January 21, 2025

Simplified redesign of proteins can improve ligand binding

Overview of the proposed framework. The process takes a protein amino acid sequence and a ligand SMILES string as inputs. The joint sequence and structural diffusion process includes input featurization, residual feature updates, and an equivariant denoiser, ultimately yielding novel protein sequences alongside their corresponding Cα protein backbone (gray) and ligand (red) in 3D complexes. Credit: Structural Dynamics (2024). DOI: 10.1063/4.0000271

In biology, the binding of cellular proteins to molecules called ligands produces myriad functions essential for life, including cell signaling and enzymatic action. In biotechnology and medicine, the ability of researchers to alter proteins to refine control over binding affinity and specificity can create tailored therapeutics with reduced side effects, highly sensitive diagnostic tools, efficient biocatalysis, targeted drug delivery systems and sustainable bioremediation solutions.

Existing approaches to such protein redesign have drawbacks. Traditional methods rely on time-consuming trial and error, and many models in the emerging field of computational design demand extensive information about the protein structure and the pocket where a ligand binds.

Researchers led by Truong Son Hy, Ph.D., from the University of Alabama at Birmingham, offer a simplified method they call ProteinReDiff, which uses artificial intelligence to speed the redesign of ligand-binding proteins.

ProteinReDiff stands for Protein Redesign based on Diffusion Models, and it incorporates key improvements inspired by the representation learning modules from the AlphaFold2 architecture of computer-based protein folding. These modules allow the ProteinReDiff framework to capture intricate protein–ligand interactions, improve the fidelity of binding affinity predictions and enable more precise redesigns of ligand-binding proteins.

The work is published in the journal Structural Dynamics, as part of a special topic on Artificial Intelligence and Structural Science.

"Our framework enables the design of high-affinity ligand-binding proteins without reliance on detailed structural information," said Hy, an assistant professor in the UAB Department of Computer Science. "We rely solely on initial protein sequences and ligand SMILES strings."

SMILES, the Simplified Molecular Input Line Entry System, is a longstanding specification of the structure of molecules using only computer-readable ASCII characters.
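To make the idea concrete, the short Python sketch below lists SMILES strings for three well-known molecules and runs a rough sanity check on them. The `looks_like_smiles` helper is a hypothetical illustration written for this example, not part of ProteinReDiff or of any cheminformatics library, and it is far from a real SMILES parser: it only confirms that a string is non-empty ASCII without whitespace and with balanced parentheses.

```python
# SMILES strings for three well-known molecules.
EXAMPLES = {
    "ethanol": "CCO",
    "benzene": "c1ccccc1",
    "aspirin": "CC(=O)OC1=CC=CC=C1C(=O)O",
}


def looks_like_smiles(s: str) -> bool:
    """Rough sanity check only (hypothetical helper, not a SMILES parser).

    Accepts a string that is non-empty, pure ASCII, free of whitespace,
    and has balanced parentheses (parentheses mark branches in SMILES).
    """
    if not s or not s.isascii() or any(ch.isspace() for ch in s):
        return False
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a branch closed before it was opened
                return False
    return depth == 0


for name, smiles in EXAMPLES.items():
    print(f"{name}: {smiles} -> {looks_like_smiles(smiles)}")
```

A dedicated cheminformatics toolkit would be needed to check real chemical validity; the point here is only that a whole molecule fits in one short line of plain text.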

"A key feature of our method is blind docking, which predicts how the redesigned protein interacts with its ligand without the need for predefined binding site information," Hy said. "This streamlined approach significantly reduces reliance on detailed structural data, thus expanding the scope for sequence-based exploration of protein-ligand interactions."

The researchers, including Viet Thanh Duy Nguyen of the FPT Software AI Center, Ho Chi Minh City, Vietnam, and Nhan D. Nguyen of the University of Chicago, trained the artificial intelligence framework ProteinReDiff on numerous known structures of proteins and their binding ligands. They were then able to redesign selected protein-ligand pairs through stochastic masking and equivariant denoising, which capture the joint distribution of ligand and protein complex conformations.
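As a rough illustration of what stochastic masking of a sequence can look like, the Python sketch below hides a random fraction of residues in an amino acid string. Everything in it is an assumption made for illustration: the `mask_sequence` function, the `X` mask symbol, the 30% mask ratio and the toy sequence do not come from the paper, and the article does not describe the actual masking scheme. The denoising step, in which a trained model would propose replacements for the masked positions, is not shown.

```python
import random

MASK = "X"  # hypothetical mask symbol, chosen for this illustration


def mask_sequence(seq: str, mask_ratio: float, rng: random.Random):
    """Replace a random fraction of residues with MASK.

    Returns the masked sequence and the sorted list of masked positions.
    Illustrative only; not the masking procedure used by ProteinReDiff.
    """
    n_mask = round(len(seq) * mask_ratio)
    positions = set(rng.sample(range(len(seq)), n_mask))
    masked = "".join(MASK if i in positions else aa for i, aa in enumerate(seq))
    return masked, sorted(positions)


rng = random.Random(0)   # fixed seed so the run is repeatable
toy_sequence = "MKTAYIAKQR"  # made-up 10-residue sequence
masked, positions = mask_sequence(toy_sequence, 0.3, rng)
print(masked, positions)
```

Because the masked positions are drawn at random, repeated runs with different seeds expose different parts of the sequence to redesign, which is one plausible way such a scheme could encourage sequence diversity.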

Hy and colleagues compared ProteinReDiff against eight other computational protein design models on three counts: the inputs each model requires, the outputs it produces, and how much it improves ligand binding for selected protein-ligand pairs.

With regard to input characteristics, six of the eight comparison models relied on protein structure information as one of the inputs; only ProteinReDiff and a model called DPL relied solely on protein sequence and ligand SMILES inputs. With regard to outputs, only ProteinReDiff produced new protein designs that included protein sequence, protein structure and ligand structure.

With regard to performance, redesigned proteins from selected protein-ligand pairs produced by ProteinReDiff and the eight other protein design models were compared for ligand binding affinity, amino acid sequence diversity and structure preservation. ProteinReDiff produced a greater improvement in ligand binding affinity than the other models.

"Our model excels in optimizing ligand binding affinity based solely on initial protein sequences and ligand SMILES strings, bypassing the need for detailed structural data," Hy said. "These findings open new possibilities for protein-ligand complex modeling, indicating significant potential for ProteinReDiff in various biotechnological and pharmaceutical applications."

More information: Viet Thanh Duy Nguyen et al, ProteinReDiff: Complex-based ligand-binding proteins redesign by equivariant diffusion-based generative models, Structural Dynamics (2024). DOI: 10.1063/4.0000271
