Learning the language of lasso peptides to improve peptide engineering

Development of a lasso peptide-specific language model, LassoESM. (A) Lasso peptide biosynthesis requires a leader peptidase, RiPP recognition element (RRE), and lasso cyclase to tie a linear core peptide into the lariat-like knot. (B) LassoESM was built upon the ESM-2 architecture and further pre-trained on lasso peptides using a domain-adaptive approach with masked language modeling. The resulting LassoESM embeddings were utilized for three downstream tasks: predicting lasso cyclase substrate tolerance, identifying substrate compatibility between non-cognate pairs of lasso cyclases and substrate peptides, and predicting RNAP inhibitory activity (numbers indicate enrichment values, estimating RNAP inhibitory activity). Credit: Nature Communications (2025). DOI: 10.1038/s41467-025-63412-3

In the hunt for new therapeutics for cancer and infectious diseases, lasso peptides prove to be a catch. Their knot-like structures afford these molecules high stability and diverse biological activities, making them a promising avenue for new therapeutics. To better unleash their clinical potential, a team from the Carl R. Woese Institute for Genomic Biology has developed LassoESM, a new large language model for predicting lasso peptide properties.

The collaborative study was recently published in Nature Communications.

Lasso peptides are made by bacteria. To produce these peptides, bacteria use ribosomes to build chains of amino acids that are then folded by biosynthetic enzymes into a unique slip knot-like structure. Through this process, thousands of different lasso peptides are generated, many of which have demonstrated antibacterial, antiviral, and anticancer properties.

"There are striking opportunities to use lasso peptides in [...], from targeting receptors to developing stable oral therapeutics," said Doug Mitchell, the Director of the Vanderbilt Institute for Chemical Biology and co-leader of the study. "By building a dedicated language model for these molecules, we've created a tool that helps us unlock these possibilities far more efficiently."

Machine learning models have become essential tools for researchers, particularly for recognizing patterns in large data sets. This enables scientists to find new connections, while also saving months of time and effort. Protein prediction especially benefits from this technology, helping to uncover new insights into complex protein interactions and accelerate the discovery of new therapeutics. But commonly used AI platforms for protein prediction, such as AlphaFold, fall short when tasked with lasso peptides.

"Because of the unique structure of the lasso peptide, none of the current AI programs actually work in terms of doing a structure prediction," said project co-leader Diwakar Shukla (BSD/CAMBERS/MMG), a professor of chemical and biomolecular engineering and James W. Westwater Professorial Scholar at the University of Illinois Urbana-Champaign.

Similar to the large language models powering AI chatbots, protein language models are trained to learn and apply the language of proteins: their amino acid sequences, three-dimensional structures, and interactions with surrounding environments. But without lasso peptide-specific training data, these algorithms lack specificity for these molecules.

"Predicting lasso peptide properties has been challenging due to the scarcity of experimentally labeled data and the complexity of enzyme–peptide substrate interactions," said Xuenan Mi, who recently earned her Ph.D. in Shukla's research group. "We developed LassoESM, a lasso peptide-tailored protein language model, to capture peptide-specific features that are often missed by generic protein language models."

Mitchell's group first used bioinformatics methods to find thousands of lasso peptide sequences that different microorganisms produce. To improve the quality of the data, the team also manually validated any new lasso peptide sequences they discovered.

"Then, we learned the language of those lasso peptides using masked language modeling, which is where you hide part of the peptide, and then you try to predict the other half," Shukla said. "Once you have learned the language of how the lasso structure is formed in nature, then you can train efficient property prediction models based on these language model parameters."
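
The masking step Shukla describes can be illustrated with a short Python sketch. It is not the team's training code: the peptide string is an arbitrary example, the 15% masking rate is an assumed setting, and the names `mask_peptide` and `unmask` are hypothetical. The sketch only hides residues and records what was hidden; a real language model would then be trained to predict those hidden residues from the visible ones.

```python
import random

MASK = "<mask>"  # placeholder token standing in for a hidden residue


def mask_peptide(sequence, mask_rate=0.15, seed=0):
    """Hide a random subset of residues.

    Returns the masked token list and a dict {position: hidden residue}.
    The 15% default rate is an assumption, not a value from the study.
    """
    rng = random.Random(seed)
    tokens = list(sequence)
    n_masked = max(1, round(len(tokens) * mask_rate))
    positions = rng.sample(range(len(tokens)), n_masked)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]  # what the model would have to predict
        tokens[pos] = MASK          # what the model actually sees
    return tokens, targets


def unmask(tokens, targets):
    """Fill the hidden positions back in (a perfect 'prediction')."""
    return "".join(targets.get(i, tok) for i, tok in enumerate(tokens))


if __name__ == "__main__":
    peptide = "GGAGHVPEYFVGIGTPISFYG"  # arbitrary illustrative sequence
    masked, hidden = mask_peptide(peptide)
    print(" ".join(masked))
    print(hidden)
```

During training, the model's guesses at the masked positions are compared with the recorded residues, and the model is adjusted to make those guesses more accurate.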

By combining the Shukla group's machine learning knowledge with experimental data collected by Mitchell's group, the team applied LassoESM to several prediction tasks. One area of focus is the identification of compatible lasso peptide and lasso cyclase pairs to expand the clinical potential of these molecules. Lasso cyclases are the enzymes responsible for the knot-forming step of lasso peptide biosynthesis. Just as different locks require unique keys, different peptides require specific lasso cyclases to tie the characteristic knot.

"We built the models to predict which lasso cyclase could actually form a lasso peptide using only the sequence of amino acids in a peptide. If we can understand the substrate scope or we can engineer lasso cyclases, then we can potentially make any peptide into a lasso," Shukla said. Without LassoESM, these enzyme-substrate interactions are difficult to predict, highlighting the utility of this artificial intelligence tool.
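
The general idea of predicting compatibility from sequence alone can be sketched in a few lines of Python. This is a toy under stated assumptions, not the published method: the amino-acid composition vector is a crude stand-in for LassoESM embeddings, the one-nearest-neighbor rule is a stand-in for the trained classifiers, and the two labeled peptides are made up for illustration rather than taken from experimental data. All function names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def embed(sequence):
    """Stand-in embedding: fraction of each amino acid in the peptide.

    This is NOT a LassoESM embedding, only a simple numeric representation.
    """
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts) or 1
    return [c / total for c in counts]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_compatible(query, labeled):
    """Return the label of the most similar labeled peptide (1-nearest neighbor)."""
    q = embed(query)
    _, label = max(labeled, key=lambda item: cosine(q, embed(item[0])))
    return label


# Made-up labels for illustration only: True means "the cyclase accepts it".
TOY_LABELED = [
    ("GGAGHVPEYFVGIGTPISFYG", True),
    ("KKKKRRRRKKKKRRRRKKKK", False),
]

if __name__ == "__main__":
    print(predict_compatible("GGAGHVPEYFVGIGTPISFYA", TOY_LABELED))
```

In practice, the learned embeddings would replace `embed`, and a model trained on experimentally labeled peptide and cyclase pairs would replace the nearest-neighbor lookup.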

Mi said, "We demonstrated that LassoESM enables accurate prediction of various lasso peptide properties, even with limited training data. This work provides a powerful AI-driven tool to accelerate the rational design of functional lasso peptides for biomedical and industrial applications."

Moving forward, the team also aims to expand their model to accommodate new prediction capabilities, such as building tailor-made language models for other peptide natural products and engineering lasso peptides to target specific proteins.

"Thanks to access to powerful computing resources on our campus and interdisciplinary collaboration opportunities provided by the MMG theme at Carl R. Woese Institute for Genomic Biology," Shukla said. "I am grateful to Xuenan Mi and Susanna Barrett for leading the computational and experimental aspects of this study, and Professor Douglas Mitchell for providing experimental support and guidance during this investigation."

More information: Xuenan Mi et al, LassoESM a tailored language model for enhanced lasso peptide property prediction, Nature Communications (2025). DOI: 10.1038/s41467-025-63412-3

Journal information: Nature Communications

Citation: Learning the language of lasso peptides to improve peptide engineering (2025, October 16) retrieved 16 October 2025 from /news/2025-10-language-lasso-peptides-peptide.html