

Chemistry LLM developed for faster drug discovery

SwRI develops chemistry LLM called GAMES for faster drug discovery
SwRI developed a large language model called Generative Approaches for Molecular Encodings (GAMES) to generate Simplified Molecular Input Line Entry System (SMILES) strings, which offer a text-based system to represent the structure of chemical molecules. Credit: Southwest Research Institute

Southwest Research Institute scientists and engineers have developed a custom large language model (LLM) to accelerate drug design and discovery.

A multidisciplinary team developed the Generative Approaches for Molecular Encodings (GAMES) LLM to generate Simplified Molecular Input Line Entry System (SMILES) strings. SMILES is an industry-standard system that represents the structure of molecules using a short series of text characters to facilitate storage, retrieval and modeling. Researchers trained GAMES to understand and generate valid new SMILES combinations.
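
To make the notation concrete, the following is a minimal Python sketch of a purely syntactic sanity check on SMILES strings. It is an illustration only and is not part of GAMES: the helper name `looks_like_smiles` is hypothetical, and the check covers only balanced branch parentheses, balanced atom brackets and paired single-digit ring-closure labels. Judging whether a string is chemically valid (valence, aromaticity) would require a cheminformatics toolkit.

```python
def looks_like_smiles(s: str) -> bool:
    """Crude syntactic check: balanced (), balanced [], paired ring digits.

    Hypothetical helper for illustration. It does not check chemistry
    (valence, aromaticity) and ignores two-digit %nn ring closures.
    """
    if not s:
        return False
    depth = 0            # current branch nesting depth
    in_bracket = False   # True while inside a [ ... ] atom description
    open_rings = set()   # ring-closure digits seen an odd number of times
    for ch in s:
        if in_bracket:
            # Digits inside brackets are isotopes, H counts or charges,
            # not ring closures, so they are skipped here.
            if ch == "]":
                in_bracket = False
            elif ch == "[":
                return False
            continue
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            return False
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch.isdigit():
            open_rings ^= {ch}  # toggle: first use opens, second closes
    return depth == 0 and not in_bracket and not open_rings


examples = {
    "CCO": "ethanol",
    "c1ccccc1": "benzene",
    "CC(=O)Oc1ccccc1C(=O)O": "aspirin",
    "c1ccccc": "unclosed ring, rejected",
    "CC(C": "unclosed branch, rejected",
}
for smiles, note in examples.items():
    print(f"{smiles:24s} {looks_like_smiles(smiles)!s:6s} {note}")
```

A check like this can only filter out strings that are obviously malformed; a string that passes may still describe an impossible molecule.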

"This project demonstrates a systematic way to build databases and networks of molecules for AI processing and comparison using only language," said Institute Scientist Dr. Jonathan Bohmann, lead developer of SwRI's Rhodium molecular docking software designed to virtually screen drug compounds.

Rhodium software uses descriptors along with graphical processing to visualize the chemical properties of compounds. Incorporating GAMES into the Rhodium workflow offers a faster generalized approach to drug discovery and design.

"Using LLMs, we can directly apply and AI to molecules via SMILES strings, because they appear as readable text characters and don't require translation into abstract representations," Bohmann said.

SwRI trained the GAMES model with classes of carbon-based molecules and other reference compounds to validate and fine-tune the SMILES strings it generated.

Research Scientist Daniel Hinojosa, Lead Computer Scientist Michael Hartnett and Staff Scientist Dr. Jonathan Bohmann hold up a visual representation of a common molecule used for the synthesis of pharmaceuticals. The Simplified Molecular Input Line Entry System (SMILES) strings, projected onto the conference walls, correspond to the 3D molecular model. Credit: Southwest Research Institute

"This project showcases the power of training LLMs in highly technical scientific domains to focus on specific tasks," said SwRI Lead Computer Scientist Michael Hartnett. "In this case, we are working in the drug discovery domain, and our fine-tuning is focused on unlocking the most relevant knowledge."

GAMES combines LoRA (Low-Rank Adaptation) and QLoRA (Quantized LoRA) techniques to fine-tune LLMs efficiently, reducing the hardware and energy needed to run Rhodium models. The team hopes to apply this approach to other applications and domains across the Institute.
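
The savings come from how LoRA is defined in general: instead of updating a full d x k weight matrix, it trains two small matrices, B (d x r) and A (r x k), whose product is added to the frozen weights. QLoRA additionally stores the frozen base weights in 4-bit precision. The Python sketch below works through that arithmetic. The layer size and rank are illustrative assumptions, not figures from GAMES, and the function names are hypothetical.

```python
def full_params(d: int, k: int) -> int:
    """Parameters in one full d x k weight matrix."""
    return d * k


def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Parameters in the low-rank pair B (d x r) and A (r x k)."""
    return r * (d + k)


def weight_bytes(n_params: int, bits: int) -> float:
    """Storage for n_params weights at the given bit width."""
    return n_params * bits / 8


# Illustrative sizes only (assumed, not taken from the article).
d, k, r = 4096, 4096, 8

full = full_params(d, k)
lora = lora_trainable_params(d, k, r)
print(f"full matrix params : {full:,}")
print(f"LoRA trainable     : {lora:,} ({lora / full:.4%} of full)")
print(f"base weights, 16-bit: {weight_bytes(full, 16) / 2**20:.0f} MiB")
print(f"base weights, 4-bit : {weight_bytes(full, 4) / 2**20:.0f} MiB")
```

For this assumed layer, the trainable parameter count drops from about 16.8 million to 65,536, and 4-bit storage cuts the frozen weights to a quarter of their 16-bit size.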

"Using LLMs to generate accurate SMILES could transform the process, especially when trained using specific datasets," said SwRI Research Scientist Daniel Hinojosa. "The fine-tuned techniques significantly improved performance, increasing the number of valid SMILES while reducing invalid outputs. Structured datasets and specific training techniques were key to this accomplishment."

Researchers hope GAMES will offer a powerful framework for ranking compounds found in chemical libraries based on drug-likeness, a shorthand term for the combination of properties that makes a compound more likely to be approved as a safe drug. Additionally, they plan to explore chemical landscapes systematically through testing. Hinojosa and Bohmann plan to pursue additional internal funding to advance the next phase of the project.
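
The article does not say which drug-likeness criteria the team would use. As one widely known example, Lipinski's rule of five flags compounds with a molecular weight above 500 daltons, a logP above 5, more than five hydrogen-bond donors or more than 10 hydrogen-bond acceptors. The Python sketch below ranks a small library by the number of such violations. It is an assumption-laden illustration, not the GAMES method: the compounds and their descriptor values are made-up placeholders, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    name: str
    mol_weight: float   # daltons
    log_p: float        # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def rule_of_five_violations(c: Compound) -> int:
    """Count how many of Lipinski's four criteria the compound breaks."""
    return sum([
        c.mol_weight > 500,
        c.log_p > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


def rank_by_drug_likeness(library):
    """Fewest violations first; ties broken by name for stable output."""
    return sorted(library, key=lambda c: (rule_of_five_violations(c), c.name))


# Placeholder compounds with invented descriptor values.
library = [
    Compound("cmpd_A", 612.7, 6.1, 2, 9),
    Compound("cmpd_B", 180.2, 1.2, 1, 4),
    Compound("cmpd_C", 430.5, 5.4, 3, 7),
]

for c in rank_by_drug_likeness(library):
    print(c.name, rule_of_five_violations(c))
```

A production ranking would likely weigh many more properties than these four, so the violation count here should be read as a stand-in for whatever scoring the researchers adopt.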

"While we're in early stages of development, the results are already having a direct impact on ongoing research programs at SwRI," Bohmann said.

Citation: Chemistry LLM developed for faster drug discovery (2025, August 14) retrieved 18 August 2025 from /news/2025-08-chemistry-llm-faster-drug-discovery.html
