A new scale of biology: Massive datasets are aiding in the fight against superbugs

Stephanie Baum
scientific editor

Robert Egan
associate editor

Artificial intelligence relies on machine learning algorithms trained on massive datasets to make predictions—think of how ChatGPT learned language by gorging on the internet. In biology, however, scientists face a frustrating challenge—the high-quality datasets needed to train powerful artificial intelligence models are rare. Without these datasets, we can't harness machine learning to tackle our most pressing health challenges.
Calin Plesa, an assistant professor of bioengineering at the University of Oregon's Phil and Penny Knight Campus for Accelerating Scientific Impact, saw an opportunity to change that. His lab specializes in scaling up synthetic biology, moving from working with individual biological components to engineering entire systems that are orders of magnitude larger. The goal? Create massive, high-quality, biological datasets that could train machine learning systems to fight cancer, accelerate drug development, and combat disease resistance.
In research published in Science Advances, Plesa and his team have demonstrated exactly how this could work, taking aim at one of medicine's most urgent threats: antibiotic-resistant superbugs.
"We're essentially trying to upsize biological information," says Plesa. "Right now, if you want to understand how a gene works across different species, you might be able to test maybe ten or twenty different versions in a traditional lab setup. We're interested in now testing thousands simultaneously."
Scalable technology
The key to creating these enormous datasets is Plesa's innovative DropSynth technology, which can simultaneously manufacture thousands of genes, the DNA sequences that serve as biological instruction manuals for specific traits and characteristics. Traditionally, creating genes in the lab has been a slow, painstaking process, but DropSynth has changed that entirely.
The technology works by packaging all the molecular machinery needed to build a gene into microscopic oil droplets. Each droplet becomes a tiny biological factory that assembles and amplifies a unique gene. A single test tube can hold hundreds to thousands of these droplets, so hundreds to thousands of genes can be produced simultaneously, dramatically accelerating the process of generating vast libraries of genes.
When Plesa first revealed DropSynth in 2018, he knew it had broad potential. But he needed collaborators to help tackle real-world biological challenges. That expertise came through Carmen Resnick, a Dr. Seeyan Lam Knight Campus Undergraduate Scholar, and Karl Romanowicz, a postdoctoral researcher with expertise in microbial ecology.
Together, the Plesa lab set their sights on one of the most pressing threats in modern medicine: antimicrobial resistance. This occurs when bacteria evolve to resist antibiotics, creating "superbugs" that can evade our current arsenal of drugs and treatments. In 2021 alone, antimicrobial resistance killed an estimated 1.14 million people worldwide, according to a report in The Lancet.

A novel proof of principle
Traditional approaches to studying antimicrobial resistance are frustratingly limited. Researchers typically expose bacteria to antibiotics gradually and analyze the survivors, a process that's slow, restricted to species that grow well in laboratory conditions, and difficult to scale up.
Romanowicz, Resnick and Plesa envisioned something different. What if they could use DropSynth to create thousands of bacterial gene variants simultaneously, providing unprecedented insight into how resistance develops?
In their most recent publication, the team devised a simple, reductionist approach to study antimicrobial resistance at scale. Rather than attempting the logistical nightmare of growing thousands of different bacterial species, they focused on a single gene that serves as a common denominator in antibiotic resistance: DHFR (dihydrofolate reductase).
DHFR is an essential enzyme that bacteria need for folate production and cell division, and it's a frequent target of antibiotics. The team generated DHFR genes from a wide range of bacteria (including many that aren't typically studied in laboratories) and inserted them one by one into E. coli, a species that thrives in lab conditions.
The key was using a specially engineered strain of E. coli that lacks its own DHFR gene. This made the bacteria's survival completely dependent on whether the introduced DHFR gene variant could restore the enzyme's normal role in folate synthesis. This clever system allowed the researchers to test hundreds of different DHFR gene variants from across the bacterial evolutionary tree, all within a consistent, controlled experimental setup.
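To make the logic of this survival test concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that each variant is counted in a pooled library before and after growth in the DHFR-deficient strain. The article does not describe the team's actual readout or analysis, so the function names, the threshold and all the counts below are hypothetical.

```python
import math

def log2_enrichment(before, after, pseudocount=0.5):
    """Log2 change in each variant's share of the pool after selection.

    `before` and `after` map variant names to counts (hypothetical data).
    A pseudocount keeps variants that vanish from producing log(0).
    """
    n = len(before)
    total_before = sum(before.values()) + pseudocount * n
    total_after = sum(after.get(v, 0) for v in before) + pseudocount * n
    scores = {}
    for variant, n_before in before.items():
        freq_before = (n_before + pseudocount) / total_before
        freq_after = (after.get(variant, 0) + pseudocount) / total_after
        scores[variant] = math.log2(freq_after / freq_before)
    return scores

def functional_variants(scores, threshold=-1.0):
    """Variants whose share did not collapse are treated as rescuing growth.

    The threshold is an arbitrary illustrative cutoff, not a published value.
    """
    return sorted(v for v, s in scores.items() if s >= threshold)

# Made-up counts: homolog_C fails to restore DHFR function and drops out.
before = {"homolog_A": 1000, "homolog_B": 1000, "homolog_C": 1000}
after = {"homolog_A": 1500, "homolog_B": 1450, "homolog_C": 50}

scores = log2_enrichment(before, after)
print(functional_variants(scores))
```

In this toy example, the two variants that keep their share of the pool are flagged as functional, while the one that nearly disappears is not.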
Using DropSynth, Resnick, under the mentorship of lab manager Samuel Hinton, created over 1,500 versions of the DHFR gene, including variants from familiar pathogens like staph (Staphylococcus aureus) and cholera (Vibrio cholerae). Through incorporating such broad evolutionary diversity, they hoped to uncover how different versions of DHFR vary in their ability to resist antibiotics, and whether some share conserved vulnerabilities.
The results yielded an immediate surprise: Many DHFR variants from distant bacterial relatives functioned perfectly well in their E. coli test system.
"We were shocked how many sequences were basically functional when plugged into E. coli, and that it worked so nicely. It essentially became plug and play with all these DHFR variants," says Plesa. This finding alone revealed that DHFR is far more evolutionarily adaptable than scientists previously realized.
Next came the crucial test. The researchers exposed their library of engineered bacteria to varying concentrations of trimethoprim, an antibiotic that specifically targets DHFR. The results painted a detailed picture of resistance: Some DHFR variants were highly sensitive to the drug, while others remained resistant even at high doses.
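One common way to summarize this kind of dose-response data is to estimate the drug concentration at which a variant's growth falls to half of its untreated level. The Python sketch below illustrates that idea using simple interpolation on a log scale. It is not the method used in the study, and the doses (in arbitrary units) and growth values are invented for illustration.

```python
import math

def half_max_concentration(concentrations, growth):
    """Estimate the concentration where relative growth drops to 0.5.

    Interpolates linearly on log10(concentration) between the two tested
    doses that bracket 0.5. Returns None if growth never falls below 0.5
    within the tested range. Concentrations must be positive.
    """
    points = sorted(zip(concentrations, growth))
    for (c_lo, g_lo), (c_hi, g_hi) in zip(points, points[1:]):
        if g_lo >= 0.5 > g_hi:
            frac = (g_lo - 0.5) / (g_lo - g_hi)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None

# Hypothetical trimethoprim doses (arbitrary units) and relative growth.
doses = [1, 10, 100]
sensitive = [0.9, 0.1, 0.0]    # growth collapses at a low dose
resistant = [1.0, 0.95, 0.8]   # still growing at the highest dose tested

print(half_max_concentration(doses, sensitive))
print(half_max_concentration(doses, resistant))
```

A sensitive variant yields a low half-maximal concentration, while a resistant one never crosses the 50% mark in the tested range, so the function returns None for it.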
Mapping resistance patterns
Through advanced computational approaches developed by Romanowicz, the team could identify which regions of DHFR are prone to developing resistance and which remain vulnerable to antibiotics. Since their library represented such wide evolutionary diversity, Romanowicz's analysis was able to reveal how resistance is shaped by both mutation and evolutionary background. When they analyzed several known DHFR variants already linked to clinical antibiotic resistance, they found that these versions maintained their resistance for significantly longer periods.
This kind of detailed insight could revolutionize how we design next-generation antibiotics, pointing researchers toward molecular "soft spots" in bacterial genes while flagging variants that might be primed for resistance.
"As antimicrobial resistance continues to rise, our ability to study gene function across thousands of microbial species represents more than just an exciting scientific development; it could become a critical tool in our ongoing battle against superbugs," says Romanowicz, co-first author on the publication.
The Plesa lab is already working on integrating this massive DHFR dataset into machine learning algorithms that could eventually predict resistance before it emerges in clinical settings.
Beyond antibiotics: A platform for discovery
For Plesa, this project represents exactly the kind of scalable biological insight he envisioned when creating DropSynth. While this study focused specifically on DHFR and antibiotic resistance, the implications extend far beyond fighting superbugs.
"Using DropSynth, we were first able to increase the scale and create large libraries of genes and apply that to antimicrobial resistance. This was a proof of principle, but we could do this really for anything in biology," says Plesa.
The same approach could be used to study cancer-associated genes, map viral evolution, or design entirely novel enzymes and proteins. More impressively, Plesa and his team envision DropSynth as a tool to create the massive datasets needed to train high-quality machine learning algorithms to help researchers tackle big problems across biology.
The technology is also moving quickly toward commercial use. Plesa has founded SynPlexity, a startup company based in the Knight Campus Papé Family Innovation Center. The company is working to commercialize DropSynth and share high-throughput, large-scale synthetic biology datasets more broadly.
While this study focused on a single gene, it serves as a powerful proof of concept with vast potential. From understanding evolution to designing new proteins, the combination of synthetic biology and machine learning is opening new frontiers in our fight against some of humanity's greatest health challenges.
More information: Karl J. Romanowicz et al, Exploring antibiotic resistance in diverse homologs of the dihydrofolate reductase protein family through broad mutational scanning, Science Advances (2025).
Journal information: Science Advances, The Lancet
Provided by University of Oregon