

Using geometry and physics to explain feature learning in deep neural networks

A handmade folding ruler analogy, which the team found could be used to model DNN training in different regimes. Credit: Shi, Pan & Dokmanic.

Deep neural networks (DNNs), the machine learning algorithms underpinning the functioning of large language models (LLMs) and other artificial intelligence (AI) models, learn to make accurate predictions by analyzing large amounts of data. These networks are structured in layers, each of which transforms input data into 'features' that guide the analysis of the next layer.

The process through which DNNs learn features has been the topic of numerous research studies and is ultimately the key to these models' good performance on a variety of tasks. Recently, some computer scientists have started exploring the possibility of modeling feature learning in DNNs using frameworks and approaches rooted in physics.

Researchers at the University of Basel and the University of Science and Technology of China discovered a phase diagram, a graph resembling those used in thermodynamics to delineate the liquid, gaseous and solid phases of water, that represents how DNNs learn features under various conditions. Their paper, published in Physical Review Letters, models a DNN as a spring-block chain, a simple mechanical system that is often used to study interactions between linear (spring) and nonlinear (friction) forces.

"Cheng and I were at a workshop where there was an inspiring talk on 'a law of data separation,'" Ivan Dokmanić, the researcher who led the study, told Âé¶¹ÒùÔº. "The layers of a deep neural network (but also of such as the human visual cortex) process inputs by progressively distilling and simplifying them.

"The deeper you are in the network, the more regular, more geometric these representations become, which means that representations of different classes of objects (e.g., representations of cats and dogs) become more separate and easier to distinguish. There's a way to measure this separation.

"The talk showed that in well-trained neural nets it often happens that these data separation 'summary statistics' behave in a simple way, even for very complicated trained on complicated data: each layer improves separation by the same amount."

The team found that the 'law of data separation' held true in networks trained with commonly used hyperparameter settings, such as typical learning rates and noise levels, but not for other hyperparameter choices. They realized that understanding why this happens could shed light on how DNNs learn good features across models. They thus set out to find a suitable theoretical description of these intriguing findings.

"At the same time, we were involved in some projects in geophysics where people use spring-block models as phenomenological models of fault and earthquake dynamics," said Dokmanić. "The data separation phenomenology reminded us of that. We thought about many other analogies. For instance, Cheng thought that equal data separation is a bit like a retractable coat hanger; I thought it's a bit like a folding ruler.

"We spent that winter holiday exchanging pictures and videos of various 'layer-structured' household items and tools, including these coat hangers, folding rulers, etc. I remember discussing whether a certain stretch trivet is a good model for a famous deep neural net called a ResNet."

After identifying various potential theoretical models and layered mechanical systems that could be used to study how DNNs learn features, the researchers ultimately decided to focus on spring-block models. These models have already proved valuable for studying a wide range of real-world phenomena, including earthquakes and the deformation of materials.

Figure representing the team's spring-block theory of feature learning in deep neural networks. Credit: Shi, Pan & Dokmanic.

"We showed that the behavior of this data separation is eerily similar to the behavior of blocks connected by springs which are sliding on a rough surface (but also to the behavior of other mechanical systems, such as folding rulers)," explained Dokmanić.

"How much a layer simplifies the corresponds to how much a spring extends. Nonlinearity in the network corresponds to how much friction there is between the blocks and the surface. In both systems we can add noise."

When looking at the two systems in the context of the law of data separation, Dokmanić and his colleagues found that the behavior of DNNs was similar to that of spring-block chains. A DNN responds to the training loss (i.e., the pressure to explain the observed data) by separating the data layer by layer. Similarly, a spring-block chain responds to a pulling force by separating the blocks one by one.

"The more nonlinearity there is, the more discrepancy there is between the outer (deep) and inner (shallow) layers: the deep layers learn / separate more; same for springs," said Dokmanić.

"However, if we add training noise or start shaking / vibrating the spring–block system, then blocks will spend some time 'in the air,' without experiencing friction, and this will allow the springs to equalize the separation a bit. It's actually similar to 'acoustic lubrication' in process engineering, as well as to certain stick-slip phenomena in geophysics."

This recent study introduces a new theoretical approach for studying DNNs and how they learn features over time. In the future, it could help deepen the present understanding of deep learning algorithms and the processes through which they learn to reliably tackle specific tasks.

"Most existing results treat simplified networks that are missing key aspects of real deep nets used in practice—either depth or nonlinearity or something else," explained Dokmanić.

"These works study a single impact factor on a stylized model, but the success of deep nets is predicated on an accumulation of factors (depth, nonlinearity, noise, learning rate, normalization, …). In contrast, we took a top-down approach which is phenomenological, not first-principles, but we obtain a general theory, an understanding of the interplay of all these things."

The spring-block theory employed by the researchers has so far proved both simple and effective for understanding the ability of DNNs to generalize across different scenarios. In their paper, Dokmanić and his colleagues successfully used it to compute the data separation curves of DNNs during training and found that the shape of these curves is indicative of the performance of the trained network on unseen data.
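The article does not give the authors' exact diagnostic, but the idea can be illustrated with a crude shape summary: fit the logarithm of a per-layer separation statistic (such as the one sketched earlier) against depth and read off the curvature. The function name and the numbers below are invented for the example.

```python
import numpy as np

def curve_shape(per_layer_fuzziness):
    """Curvature of log(fuzziness) vs. depth. Near zero: each layer
    separates equally (the 'law'). Negative: the drop steepens with
    depth, i.e. deep layers do most of the separating; positive:
    shallow layers do. Illustrative only, not the paper's measure."""
    depth = np.arange(len(per_layer_fuzziness))
    return np.polyfit(depth, np.log(per_layer_fuzziness), deg=2)[0]

print(curve_shape([10.0, 5.0, 2.5, 1.25, 0.625]))  # ~0: equal separation
print(curve_shape([10.0, 9.0, 7.0, 3.0, 0.5]))     # < 0: deep layers dominate
```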

Amusing videos of folding ruler experiments and DNN training in different regimes. Credit: Physical Review Letters (2025). DOI: 10.1103/ys4n-2tj3

"Since we also understand how to change the shape of the data separation curve in either direction by varying noise and nonlinearity, this gives us a (potentially) powerful tool to speed up training of very large nets," said Dokmanić.

"Most people have strong intuitions about springs and blocks but not about deep neural nets. Our theory says that we can make interesting, useful, true statements about deep nets by levering our intuition about a simple mechanical system. That's great because neural nets have billions of parameters, but our spring block system only has a handful."

The theoretical model employed by this team of researchers could soon be used by both theorists and computer scientists to further investigate the underpinnings of deep learning-based algorithms. As part of their next studies, Dokmanić and his colleagues hope to also use their theoretical approach to explore feature learning from a microscopic standpoint.

"We're close to having a first-principles explanation for the spring-block phenomenology (or perhaps the folding ruler phenomenology) in deep, nonlinear networks give or take a few approximations," explained Dokmanić.

"The other direction we are pursuing is to really double down on how to operationalize this to improve deep net training, especially for very large transformer-based networks like large language models. Having a proxy for generalization that is cheap to compute at training time, and an understanding of how to steer training to improve generalization, is a sort of a holy grail, an alternative route to the currently very popular scaling laws."

By understanding how the training of DNNs can be carefully engineered to improve their ability to generalize across other tasks, the researchers could also devise a diagnostic tool for large neural networks. For instance, this tool might help to identify areas that need to be improved to boost a model's performance, similarly to how stress maps are used in structural mechanics to identify regions of concentrated stress that could compromise the safety of structures.

"By analyzing the internal load distribution in a neural net, we can find layers / regions that are overloaded which may indicate overfitting and hurt generalization, or layers that are barely used, indicating redundancy," added Dokmanić.


More information: Cheng Shi et al, Spring-Block Theory of Feature Learning in Deep Neural Networks, Physical Review Letters (2025). DOI: 10.1103/ys4n-2tj3

© 2025 Science X Network

Citation: Using geometry and physics to explain feature learning in deep neural networks (2025, August 10) retrieved 10 August 2025 from /news/2025-08-geometry-physics-feature-deep-neural.html
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
