New AI model for drug design brings more physics to bear in predictions

(A) Illustration of the atomic nucleus and the geometric manifold of an atom. The manifold represents the spatial boundary defined by the van der Waals radius, which sets the minimum distance between atomic nuclei. (B) Illustration of the manifold surrounding a molecule. (C) Illustration of the mesh points obtained from discretizing a manifold. (D) Pipeline of NucleusDiff. NucleusDiff performs denoising diffusion on both the nuclei and the discretized mesh points, where the distances between them approximate the van der Waals radii. Credit: Proceedings of the National Academy of Sciences (2025). DOI: 10.1073/pnas.2415666122
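Panel D describes denoising diffusion over two sets of coordinates at once. As a rough illustration only, the sketch below shows the generic forward (noising) half of a DDPM-style diffusion. It applies one shared noise schedule to atomic nuclei and manifold mesh points, and a model would then be trained to reverse this corruption. The linear beta schedule, its endpoints, and the function names (`linear_alpha_bar`, `noise_points`, `joint_forward_step`) are assumptions made for this example, not NucleusDiff's published parameterization.

```python
import math
import random

Point = tuple[float, float, float]


def linear_alpha_bar(t: int, num_steps: int,
                     beta_min: float = 1e-4, beta_max: float = 0.02) -> float:
    """Return alpha_bar_t = prod_{s=1..t} (1 - beta_s) for an assumed linear beta schedule."""
    alpha_bar = 1.0
    for s in range(1, t + 1):
        # beta grows linearly from beta_min (s = 1) to beta_max (s = num_steps).
        beta = beta_min + (beta_max - beta_min) * (s - 1) / max(num_steps - 1, 1)
        alpha_bar *= 1.0 - beta
    return alpha_bar


def noise_points(points: list[Point], alpha_bar: float, rng: random.Random) -> list[Point]:
    """Sample x_t ~ N(sqrt(alpha_bar) * x_0, (1 - alpha_bar) I), independently per coordinate."""
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [tuple(signal * c + noise * rng.gauss(0.0, 1.0) for c in p) for p in points]


def joint_forward_step(nuclei: list[Point], mesh: list[Point], t: int,
                       num_steps: int, seed: int = 0) -> tuple[list[Point], list[Point]]:
    """Corrupt nuclei and mesh points to step t with one shared schedule and random stream."""
    rng = random.Random(seed)
    alpha_bar = linear_alpha_bar(t, num_steps)
    combined = noise_points(list(nuclei) + list(mesh), alpha_bar, rng)
    # Split back into the two sets so a denoiser can predict both kinds of coordinates.
    return combined[:len(nuclei)], combined[len(nuclei):]
```

Setting `t = 0` returns the clean coordinates unchanged, while `t` close to `num_steps` leaves almost pure Gaussian noise in both the nuclei and the mesh points.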

When machine learning is used to suggest new potential scientific insights or directions, algorithms sometimes offer solutions that are not physically sound.

Take, for example, AlphaFold, the AI system that predicts the complex ways in which amino acid chains fold into 3D protein structures. The system sometimes suggests "unphysical" folds, configurations that are physically implausible, especially when asked to predict folds for chains that differ significantly from its training data.

To limit this type of unphysical result in the realm of drug design, Anima Anandkumar, Bren Professor of Computing and Mathematical Sciences at Caltech, and her colleagues have introduced a new machine learning model called NucleusDiff, which incorporates a simple physical idea into its training, greatly improving the algorithm's performance.

Anandkumar and her colleagues describe NucleusDiff in a paper that appears as part of a "Machine Learning in Chemistry" special feature published in Proceedings of the National Academy of Sciences.

The goal in structure-based drug design is to come up with molecules, called ligands, that will bind well to a biological target, typically a protein, causing some desired change in its activity. Drug-design AI models are trained on datasets containing tens of thousands of examples of such protein–ligand pairings, as well as information about how tightly they latch onto each other, an important measurement called binding affinity. NucleusDiff, however, goes a step further.

"With machine learning, the model is already learning many of the aspects of what makes for good binding, and now we throw in some simple physics to make sure we rule out all the unphysical things," Anandkumar explains.

In the case of NucleusDiff, the model ensures that atoms stay at an appropriate distance from one another, accounting for physical concepts such as the repulsive forces that prevent atoms from overlapping or colliding.
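One simple way to make "atoms too close" concrete is a steric-clash count: flag any ligand–protein atom pair whose van der Waals spheres overlap by more than a small allowance. The sketch below assumes Bondi van der Waals radii for a handful of elements and a hypothetical 0.4 Å allowance. It is an illustrative check, not the collision criterion used in the paper.

```python
import math

# Bondi van der Waals radii, in angstroms, for a few common elements.
VDW_RADIUS = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def count_clashes(ligand, protein, allowance=0.4):
    """Count ligand-protein atom pairs whose vdW spheres overlap by more than `allowance` angstroms.

    Each atom is an (element, (x, y, z)) pair; `allowance` is a hypothetical tolerance.
    """
    clashes = 0
    for elem_a, pos_a in ligand:
        for elem_b, pos_b in protein:
            distance = math.dist(pos_a, pos_b)
            # Positive overlap means the two spheres interpenetrate.
            overlap = VDW_RADIUS[elem_a] + VDW_RADIUS[elem_b] - distance
            if overlap > allowance:
                clashes += 1
    return clashes
```

The check deliberately compares atoms across the two molecules only: covalently bonded atoms within one molecule legitimately sit closer than the sum of their van der Waals radii, so applying the same rule inside a molecule would flag every bond.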

"We have some nice physical theory behind the algorithm, but it's also intuitive," Anandkumar says. "Surprisingly, without these constraints, all these AI models tend to predict that there is collision, that the atoms come too close. By adding simple physics, we increased the model's accuracy."

Rather than accounting for the distance between every single pair of atoms in a molecule (a task that would be prohibitively computationally expensive), NucleusDiff estimates a manifold, or envelope—a rough estimation of the distribution of atoms and the probable locations of electrons in the molecule. On that manifold, it then establishes main anchoring points to watch, making sure that the atoms never get too close to one another.
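The mesh-point idea can be sketched in a few lines. Assuming each atom's manifold is a sphere at its van der Waals radius, the code below samples points on each sphere with a Fibonacci lattice, discards points buried inside neighboring atoms' spheres so that what remains traces the molecular envelope, and scores a hinge penalty whenever an outside nucleus comes within a chosen gap of a surface point. The lattice, the 32-point default, the `min_gap` value, and the penalty form are illustrative assumptions, not the loss described in the paper.

```python
import math


def fibonacci_sphere(center, radius, n_points):
    """Place n_points roughly evenly on a sphere using the Fibonacci (golden-angle) lattice."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n_points):
        z = 1.0 - 2.0 * (k + 0.5) / n_points  # evenly spaced heights strictly inside (-1, 1)
        r_xy = math.sqrt(1.0 - z * z)         # radius of the horizontal circle at height z
        theta = golden_angle * k
        points.append((center[0] + radius * r_xy * math.cos(theta),
                       center[1] + radius * r_xy * math.sin(theta),
                       center[2] + radius * z))
    return points


def molecular_surface_mesh(atoms, n_points=32):
    """Union-of-spheres envelope: keep mesh points not buried inside another atom's sphere.

    atoms: list of (center, radius) pairs, one per atom.
    """
    mesh = []
    for i, (center, radius) in enumerate(atoms):
        for p in fibonacci_sphere(center, radius, n_points):
            buried = any(math.dist(p, c) < r for j, (c, r) in enumerate(atoms) if j != i)
            if not buried:
                mesh.append(p)
    return mesh


def manifold_penalty(mesh, other_nuclei, min_gap=1.0):
    """Hinge penalty: total amount by which outside nuclei come within min_gap of any mesh point."""
    return sum(max(0.0, min_gap - math.dist(m, n)) for m in mesh for n in other_nuclei)
```

Dropping buried points keeps only the outer envelope of the molecule, so the penalty reacts to nuclei approaching the molecule's surface rather than to points hidden in its interior.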

The team trained NucleusDiff on a training dataset called CrossDocked2020, which includes about 100,000 protein–ligand binding complexes. They tested it on 100 of those complexes and found that it significantly outperformed state-of-the-art models in terms of binding affinity while also reducing the number of atomic collisions to almost zero.

Next, the researchers used the new model to predict binding affinities for a newer target that was not included in the training dataset: the 3CL protease, a COVID-19 therapeutic target. Again, NucleusDiff showed increased accuracy and reduced atomic collisions by up to two-thirds compared with other leading models.

The work fits within a larger push on campus by Anandkumar and others, through an initiative called AI4Science, to integrate more physics into data-driven AI models built for a variety of topics—from climate prediction to robotics and from seismology to astrophysical modeling.

"If we rely purely on training data, we do not expect machine learning to work well on examples that are significantly different from the training data," Anandkumar says.

In fact, she says, it is a standard principle of machine learning that the outputs typically fall within the realm of the examples provided in the training data. But in many scientific domains like drug design, researchers are looking for novel results (e.g., new molecules).

"We see a lot of machine learning fail in coming up with accurate results on new examples that are different from training data, but by incorporating physics, we can make [models] more trustworthy and also [make them] work much better," says Anandkumar.

More information: Shengchao Liu et al, Manifold-constrained nucleus-level denoising diffusion model for structure-based drug design, Proceedings of the National Academy of Sciences (2025).

Citation: New AI model for drug design brings more physics to bear in predictions (2025, October 20) retrieved 20 October 2025 from /news/2025-10-ai-drug-physics.html