
June 16, 2025

Machine learning method improves accuracy of inverse protein folding for drug design

Mask-prior-guided denoising diffusion (MapDiff) for inverse protein folding. Credit: Nature Machine Intelligence (2025). DOI: 10.1038/s42256-025-01042-6

An AI approach developed by researchers from the University of Sheffield and AstraZeneca could make it easier to design proteins needed for new treatments.

In their study in the journal Nature Machine Intelligence, Sheffield computer scientists, in collaboration with AstraZeneca and the University of Southampton, have developed a new machine learning framework that has shown the potential to be more accurate at inverse protein folding than existing state-of-the-art methods.

Inverse protein folding is a critical process for creating novel proteins. It is the process of identifying sequences of amino acids, the building blocks of proteins, that fold into a desired 3D protein structure and enable the protein to perform specific functions.

Protein engineering plays a critical role in drug development by creating proteins that can bind to specific targets in the body. However, the process is challenging due to the complexity of protein folding and the difficulty of predicting how amino acid sequences will interact to form functional structures.

Scientists have turned to machine learning to more accurately predict which amino acid sequences will fold into stable, functional protein structures. These models are trained on large datasets of known protein sequences and structures to improve inverse folding predictions.


The new machine learning framework, called MapDiff, from the University of Sheffield, AstraZeneca and the University of Southampton, outperformed state-of-the-art AI methods at making successful predictions in simulated tests. The results provide a promising basis for developing the technology further. If that work succeeds, it could accelerate the design of the key proteins needed to develop new vaccines, gene therapies and other therapeutic modalities.

It also complements other recent advances such as AlphaFold, which tackles the reverse problem: AlphaFold starts from an amino acid sequence and predicts the protein's 3D structure, whereas inverse folding starts from the desired protein fold and retrieves potential amino acid sequences.

Haiping Lu, Professor of Machine Learning at the University of Sheffield and the corresponding author of the study, said, "This work represents a significant step forward in using AI to design proteins with desired structures. By learning how to generate amino acid sequences that are likely to fold into specific 3D structures, our method opens new possibilities for designing new therapeutic proteins, which can be used in various therapeutic applications. It's exciting to see AI helping us tackle such a fundamental challenge in biology."

Peizhen Bai, Senior Machine Learning Scientist at AstraZeneca, who developed the AI as part of his Ph.D. at the University of Sheffield's School of Computer Science, said, "During my Ph.D., I was motivated by the potential of AI to accelerate biological discovery. I'm proud that our method, MapDiff, helps design protein sequences that are more likely to fold into desired 3D structures—a key step towards advancing next-generation therapeutics."

More information: Peizhen Bai et al, Mask-prior-guided denoising diffusion improves inverse protein folding, Nature Machine Intelligence (2025).

Journal information: Nature Machine Intelligence

