A new AI framework can be used to find space physics equations in raw data. Finding physical relationships and the symbolic expressions (i.e., mathematical formulas) that describe them is one of the many potential uses for AI systems. Physicists currently have to do a lot of raw data analysis to find these formulas, so automating this process could be very beneficial.
Researchers from Tsinghua University, Peking University, and other Chinese universities have created an AI system that can automatically extract symbolic physical representations from raw data. This new model, called PhyE2E, was introduced in a paper published in Nature Machine Intelligence.
“Our goal was to push AI beyond curve-fitting and toward human-understandable discovery: returning compact, unit-consistent equations that scientists can read, test, and build on,” Yuan Zhou, co-senior author of the paper, told Phys.org.
“To confirm whether the learned equations truly correspond to nature, we first focused on space physics, where a wealth of carefully curated observational data was available. We expect the general technique to be adopted by other sciences as well.”
A model that represents physics data using symbols
Zhou and his colleagues introduced PhyE2E, a novel AI framework trained on physics data and mathematical equations. To teach the model what plausible physics formulas “look like,” they started from well-known physics equations and synthesized multiple unit-consistent variations of them, producing new training formulas.
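As a rough illustration of what “unit-consistent variations” can mean in practice, the sketch below samples simple monomial formulas over a small, hypothetical variable table and keeps only those whose SI dimension vector matches a seed formula. The variable table, the monomial representation, and the sampling scheme are assumptions made for illustration, not details taken from the paper.

```python
# Minimal sketch (an assumption about the approach, not the authors' code):
# generating unit-consistent variants of a known formula's monomial form to
# enlarge a training corpus. Dimensions are exponent vectors over SI base
# units (m, kg, s).
import random
import numpy as np

# Hypothetical variable table: name -> SI dimension exponents (m, kg, s).
DIMS = {
    "m": np.array([0, 1, 0]),    # mass
    "v": np.array([1, 0, -1]),   # velocity
    "p": np.array([1, 1, -1]),   # momentum
    "r": np.array([1, 0, 0]),    # length
}

def dims_of(monomial):
    """Dimension vector of a monomial given as {variable: exponent}."""
    return sum(exp * DIMS[var] for var, exp in monomial.items())

def normalize(monomial):
    """Drop zero exponents so equivalent monomials compare equal."""
    return tuple(sorted((v, e) for v, e in monomial.items() if e != 0))

def unit_consistent_variants(seed, n_tries=5000, max_exp=2):
    """Randomly sample monomials; keep those matching the seed's dimensions."""
    goal = dims_of(seed)
    seen, variants = {normalize(seed)}, []
    for _ in range(n_tries):
        cand = {v: random.randint(-max_exp, max_exp) for v in DIMS}
        key = normalize(cand)
        if key in seen:
            continue
        seen.add(key)
        if np.array_equal(dims_of(cand), goal):
            variants.append(dict(key))
    return variants

# Example: variants sharing the dimensions of kinetic energy, m * v**2,
# e.g. {"m": -1, "p": 2} (p**2 / m) and {"p": 1, "v": 1} (p * v).
for variant in unit_consistent_variants({"m": 1, "v": 2}):
    print(variant)
```

A filter like this keeps only dimensionally valid formulas, so the augmented training set never teaches the model physically meaningless expressions.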
“PhyE2E directly converts data into a symbolic expression and its units using a transformer,” Zhou explained.
“It uses a divide-and-conquer strategy that inspects second-order derivatives of a lightweight ‘oracle’ network to decompose a complicated problem into smaller sub-formulas. A brief MCTS/GP refinement then cleans up the structure and the constants. The result is a compact, interpretable, and dimensionally consistent equation.”
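The mathematical idea behind the divide-and-conquer step is that if the mixed second-order partial derivatives between two groups of variables vanish everywhere, the underlying function splits additively across those groups and each part can be regressed separately. The sketch below illustrates that test with finite differences on a toy stand-in for the oracle; the finite-difference scheme, tolerances, and toy function are illustrative assumptions, not the paper’s implementation, which fits a neural oracle to the data.

```python
# Minimal sketch (not PhyE2E itself): testing whether a fitted "oracle"
# function f(x) splits additively across two variable groups by checking
# its mixed second-order partial derivatives with finite differences.
# If d2f/(dxi dxj) ~ 0 for all i in group A, j in group B, then
# f(x) = g(x_A) + h(x_B), and each part can be regressed on its own.
import numpy as np

def mixed_partial(f, x, i, j, eps=1e-4):
    """Central finite-difference estimate of d2f / (dx_i dx_j) at point x."""
    x = np.asarray(x, dtype=float)
    def shift(di, dj):
        y = x.copy()
        y[i] += di
        y[j] += dj
        return f(y)
    return (shift(eps, eps) - shift(eps, -eps)
            - shift(-eps, eps) + shift(-eps, -eps)) / (4 * eps**2)

def additively_separable(f, group_a, group_b, n_points=50, dim=3, tol=1e-3):
    """Check cross-group mixed partials at random points in [0.5, 1.5]^dim."""
    rng = np.random.default_rng(0)
    for x in rng.uniform(0.5, 1.5, size=(n_points, dim)):
        for i in group_a:
            for j in group_b:
                if abs(mixed_partial(f, x, i, j)) > tol:
                    return False
    return True

# Stand-in oracle: f(x) = x0 * x1 + sin(x2), which separates additively
# into a term over {x0, x1} and a term over {x2}.
f = lambda x: x[0] * x[1] + np.sin(x[2])
print(additively_separable(f, [0, 1], [2]))   # True
print(additively_separable(f, [0], [1, 2]))   # False: x0 * x1 couples 0 and 1
```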
In their most recent study, the researchers evaluated their framework using actual astrophysical data gathered by NASA as well as synthetic data produced by a large language model (LLM).
Ultimately, the framework extracted formulas describing physical relationships in five real space-physics scenarios. Notably, the formulas it produced either matched those derived by human physicists or appeared to represent the data even more accurately.
For example, when it analyzed NASA data from 1993, the model arrived at an improved formula that mathematically describes solar cycles. It also successfully captured the relationships between magnetic fields, temperature, and solar radiation.
A potentially practical tool for scientific investigation
The new AI model developed by this research team essentially learns to break down complex physics problems into their more basic parts. It can then use existing, well-established equations to generate new formulas that accurately represent the relationship between different variables.
“Writing a lengthy expression that interpolates the data is trivial, and it may be tempting to favor very short ones, but neither ensures physical meaning—many candidate formulas even violate dimensional (unit) consistency,” Zhou said.
“The key is to learn a prior over known, unit-consistent equations. Once we refine this prior, the system proposes compact, physically plausible forms that convey genuine insight. We see this as a first step toward abstracting and extending scientific experience to enable automated discovery.”
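To make “dimensional consistency” concrete, the sketch below propagates SI dimension vectors through a small expression tree and rejects candidates that add mismatched quantities or apply transcendental functions to dimensional arguments. The operator set and variable table are hypothetical, chosen only to illustrate the kind of filter such a prior enforces.

```python
# Minimal sketch (an illustration, not the paper's code): rejecting candidate
# expressions that violate dimensional consistency. Expressions are nested
# tuples such as ("add", a, b), ("mul", a, b), ("pow", a, 2), or variable
# names; dimensions are exponent vectors over SI base units (m, kg, s, K).
import numpy as np

# Hypothetical variables for a radiation-law-style fit.
UNITS = {
    "T": np.array([0, 0, 0, 1]),    # temperature
    "r": np.array([1, 0, 0, 0]),    # distance
    "t": np.array([0, 0, 1, 0]),    # time
}

class UnitError(Exception):
    pass

def dims(expr):
    """Propagate dimension vectors through an expression tree."""
    if isinstance(expr, str):
        return UNITS[expr]
    if isinstance(expr, (int, float)):
        return np.zeros(4)               # dimensionless constant
    op, *args = expr
    if op == "add":
        a, b = dims(args[0]), dims(args[1])
        if not np.array_equal(a, b):
            raise UnitError("adding quantities with different dimensions")
        return a
    if op == "mul":
        return dims(args[0]) + dims(args[1])
    if op == "pow":
        return dims(args[0]) * args[1]   # exponent must be a plain number
    if op == "sin":
        if dims(args[0]).any():
            raise UnitError("sin() of a dimensional quantity")
        return np.zeros(4)
    raise ValueError(f"unknown operator {op}")

def is_unit_consistent(expr):
    try:
        dims(expr)
        return True
    except UnitError:
        return False

# T**4 * r**-2 is fine; T + r mixes temperature and length and is rejected.
print(is_unit_consistent(("mul", ("pow", "T", 4), ("pow", "r", -2))))  # True
print(is_unit_consistent(("add", "T", "r")))                           # False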
PhyE2E could soon be applied to other experimental and astrophysical data, potentially yielding formulas that more accurately characterize specific physical interactions or phenomena. In the future, it could also be adapted for other domains, opening the door to scientific advances across several fields.
“We’re now extending the framework to calculus-aware operators (e.g., derivatives/integrals for PDE-style laws), strengthening robustness to noisier laboratory data,” Zhou stated.
“More broadly, the main goal of our research is to develop a neuro-symbolic technique that makes the predictions of deep neural networks understandable. At the same time, we believe that treating explainability as a design principle can improve an AI system’s ability to find more precise and trustworthy scientific laws.”