New algorithm for functional protein design outperforms traditional methods

Researchers from the University of Science and Technology of China (USTC), led by Prof. Liu Qi, in collaboration with Marinka Zitnik's lab at Harvard Medical School, have developed a novel deep generative algorithm, PocketGen. This algorithm, based on graph representation learning and protein language models, efficiently generates the sequences and spatial structures of protein pockets that bind small molecules. The study was published in Nature Machine Intelligence.
Functional protein design, particularly the design of proteins that bind small molecules, such as enzymes and biosensors, is crucial for drug discovery and biomedical applications. Traditional methods based on energy optimization and template matching are time-consuming and yield low success rates.
Meanwhile, deep learning models face challenges in modeling complex molecular–protein interactions and capturing sequence-structure dependencies. PocketGen addresses these issues, offering a high-efficiency and high-accuracy solution that adheres to physicochemical principles.
PocketGen builds on the previous works FAIR and PocketFlow and consists of two core components. The first is a dual-layer graph Transformer encoder inspired by the hierarchical structure of proteins. This module is designed to learn interaction information at different levels of granularity and to update the representations and spatial coordinates of amino acids and atoms accordingly.
The second component, illustrated in the image above, is a pre-trained protein language model: PocketGen efficiently fine-tunes the ESM2 model to assist in amino acid sequence prediction. By adapting only selected parameters, PocketGen enhances sequence-structure consistency through cross-attention mechanisms.
Experimental results demonstrated that PocketGen significantly outperforms traditional methods in affinity, structural plausibility, and computational efficiency, achieving a more than 10-fold improvement in speed. Further, in validation tasks such as designing protein pockets for small molecules like fentanyl and ibuprofen, the effectiveness of PocketGen was confirmed through comparisons with state-of-the-art generative models, including RFdiffusion and RFdiffusionAA, developed by Nobel Laureate David Baker's lab.
Additionally, the attention matrices generated by PocketGen were compared with results from first-principles-based force field simulations, demonstrating that the deep learning-based PocketGen model exhibits good interpretability.
This work advances the application of deep generative models in functional protein design, laying a foundation for further biological experimentation and providing valuable insights into protein design principles. It also highlights the potential of AI to address critical challenges in drug discovery and bioengineering.
More information: Zaixi Zhang et al, Efficient generation of protein pockets with PocketGen, Nature Machine Intelligence (2024).
Journal information: Nature Machine Intelligence
Provided by University of Science and Technology of China