

Using geometry and physics to explain feature learning in deep neural networks

A handmade folding ruler analogy, which the team found could be used to model DNN training in different regimes. Credit: Shi, Pan & Dokmanic.

Deep neural networks (DNNs), the machine learning algorithms underpinning the functioning of large language models (LLMs) and other artificial intelligence (AI) models, learn to make accurate predictions by analyzing large amounts of data. These networks are structured in layers, each of which transforms input data into 'features' that guide the analysis of the next layer.

The process through which DNNs learn features has been the topic of numerous research studies and is ultimately the key to these models' good performance on a variety of tasks. Recently, some computer scientists have started exploring the possibility of modeling feature learning in DNNs using frameworks and approaches rooted in physics.

Researchers at the University of Basel and the University of Science and Technology of China discovered a phase diagram, a graph resembling those used in thermodynamics to delineate the liquid, gaseous and solid phases of water, that represents how DNNs learn features under various conditions. Their paper, published in Physical Review Letters, models a DNN as a spring-block chain, a simple mechanical system that is often used to study interactions between linear (spring) and nonlinear (friction) forces.

"Cheng and I were at a workshop where there was an inspiring talk on 'a law of data separation,'" Ivan Dokmanić, the researcher who led the study, told Âé¶¹ÒùÔº. "The layers of a deep neural network (but also of such as the human visual cortex) process inputs by progressively distilling and simplifying them.

"The deeper you are in the network, the more regular, more geometric these representations become, which means that representations of different classes of objects (e.g., representations of cats and dogs) become more separate and easier to distinguish. There's a way to measure this separation.

"The talk showed that in well-trained neural nets it often happens that these data separation 'summary statistics' behave in a simple way, even for very complicated trained on complicated data: each layer improves separation by the same amount."

The team found that the 'law of data separation' held true in networks trained with commonly used hyperparameter settings, such as typical learning rates and noise levels, but not for other hyperparameter choices. They realized that understanding why this happens could shed light on how DNNs learn good features across models. They thus set out to find a suitable theoretical description of these intriguing findings.

"At the same time, we were involved in some projects in geophysics where people use spring-block models as phenomenological models of fault and earthquake dynamics," said Dokmanić. "The data separation phenomenology reminded us of that. We thought about many other analogies. For instance, Cheng thought that equal data separation is a bit like a retractable coat hanger; I thought it's a bit like a folding ruler.

"We spent that winter holiday exchanging pictures and videos of various 'layer-structured' household items and tools, including these coat hangers, folding rulers, etc. I remember discussing whether a certain stretch trivet is a good model for a famous deep neural net called a ResNet."

After identifying various potential theoretical models and layered mechanical systems that could be used to study how DNNs learn features, the researchers ultimately decided to focus on spring-block models. These models have already proved valuable for studying a wide range of real-world phenomena, including earthquakes and the deformation of materials.

Figure representing the team's spring-block theory of feature learning in deep neural networks. Credit: Shi, Pan & Dokmanic.

"We showed that the behavior of this data separation is eerily similar to the behavior of blocks connected by springs which are sliding on a rough surface (but also to the behavior of other mechanical systems, such as folding rulers)," explained Dokmanić.

"How much a layer simplifies the corresponds to how much a spring extends. Nonlinearity in the network corresponds to how much friction there is between the blocks and the surface. In both systems we can add noise."

When looking at the two systems in the context of the law of data separation, Dokmanić and his colleagues found that the behavior of DNNs was similar to that of spring-block chains. A DNN responds to the training loss (i.e., the pressure to explain the observed data) by separating the data layer by layer. Similarly, a spring-block chain responds to a pulling force by separating the blocks one by one.

"The more nonlinearity there is, the more discrepancy there is between the outer (deep) and inner (shallow) layers: the deep layers learn / separate more; same for springs," said Dokmanić.

"However, if we add training noise or start shaking / vibrating the spring–block system, then blocks will spend some time 'in the air,' without experiencing friction, and this will allow the springs to equalize the separation a bit. It's actually similar to 'acoustic lubrication' in process engineering, as well as to certain stick-slip phenomena in geophysics."

This recent study introduces a new theoretical approach for studying DNNs and how they learn features over time. In the future, it could help deepen the present understanding of deep learning algorithms and the processes through which they learn to reliably tackle specific tasks.

"Most existing results treat simplified networks that are missing key aspects of real deep nets used in practice—either depth or nonlinearity or something else," explained Dokmanić.

"These works study a single impact factor on a stylized model, but the success of deep nets is predicated on an accumulation of factors (depth, nonlinearity, noise, learning rate, normalization, …). In contrast, we took a top-down approach which is phenomenological, not first-principles, but we obtain a general theory, an understanding of the interplay of all these things."

The spring-block theory employed by the researchers has so far proved both simple and effective for understanding the ability of DNNs to generalize across different scenarios. In their paper, Dokmanić and his colleagues successfully used it to compute the data separation curves of DNNs during training and found that the shape of these curves is indicative of the performance of the trained network on unseen data.
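The article does not give the authors' exact diagnostic, but the idea can be illustrated with a crude shape summary: fit the logarithm of a per-layer separation statistic (such as the one sketched earlier) against depth and read off the curvature. The function name and the numbers below are invented for the example.

```python
import numpy as np

def curve_shape(per_layer_fuzziness):
    """Curvature of log(fuzziness) vs. depth. Near zero: each layer
    separates equally (the 'law'). Negative: the drop steepens with
    depth, i.e. deep layers do most of the separating; positive:
    shallow layers do. Illustrative only, not the paper's measure."""
    depth = np.arange(len(per_layer_fuzziness))
    return np.polyfit(depth, np.log(per_layer_fuzziness), deg=2)[0]

print(curve_shape([10.0, 5.0, 2.5, 1.25, 0.625]))  # ~0: equal separation
print(curve_shape([10.0, 9.0, 7.0, 3.0, 0.5]))     # < 0: deep layers dominate
```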

Amusing videos of folding ruler experiments and DNN training in different regimes. Credit: Physical Review Letters (2025). DOI: 10.1103/ys4n-2tj3

"Since we also understand how to change the shape of the data separation curve in either direction by varying noise and nonlinearity, this gives us a (potentially) powerful tool to speed up training of very large nets," said Dokmanić.

"Most people have strong intuitions about springs and blocks but not about deep neural nets. Our theory says that we can make interesting, useful, true statements about deep nets by levering our intuition about a simple mechanical system. That's great because neural nets have billions of parameters, but our spring block system only has a handful."

The theoretical model employed by this team of researchers could soon be used by both theorists and computer scientists to further investigate the underpinnings of deep learning-based algorithms. As part of their next studies, Dokmanić and his colleagues hope to also use their theoretical approach to explore feature learning from a microscopic standpoint.

"We're close to having a first-principles explanation for the spring-block phenomenology (or perhaps the folding ruler phenomenology) in deep, nonlinear networks give or take a few approximations," explained Dokmanić.

"The other direction we are pursuing is to really double down on how to operationalize this to improve deep net training, especially for very large transformer-based networks like large language models. Having a proxy for generalization that is cheap to compute at training time, and an understanding of how to steer training to improve generalization, is a sort of a holy grail, an alternative route to the currently very popular scaling laws."

By understanding how the training of DNNs can be carefully engineered to improve their ability to generalize across other tasks, the researchers could also devise a diagnostic tool for large neural networks. For instance, this tool might help to identify areas that need to be improved to boost a model's performance, similarly to how stress maps are used in structural mechanics to identify regions of concentrated stress that could compromise the safety of structures.

"By analyzing the internal load distribution in a neural net, we can find layers / regions that are overloaded which may indicate overfitting and hurt generalization, or layers that are barely used, indicating redundancy," added Dokmanić.


More information: Cheng Shi et al, Spring-Block Theory of Feature Learning in Deep Neural Networks, Physical Review Letters (2025). DOI: 10.1103/ys4n-2tj3

© 2025 Science X Network

Citation: Using geometry and physics to explain feature learning in deep neural networks (2025, August 10) retrieved 10 August 2025 from /news/2025-08-geometry-physics-feature-deep-neural.html
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
