Sequence alignment algorithm enables rapid search of world's microbial DNA

Lisa Lock
scientific editor

Robert Egan
associate editor

By making the world's microbial DNA easier to explore, a new sequence alignment tool, LexicMap, lets scientists search for a DNA sequence against millions of bacterial and archaeal genomes in minutes, helping researchers track outbreaks, study antibiotic resistance, and understand microbial diversity.
Open-access databases such as the European Nucleotide Archive (ENA) contain more than 2.4 million bacterial genomes, and this number continues to grow rapidly. Until now, searching these vast resources has been slow and computationally demanding, limiting scientists' ability to track antibiotic resistance, study outbreaks, or explore microbial diversity.
A new paper in the journal Nature Biotechnology introduces the algorithm. By using an innovative method to index genetic data, LexicMap enables researchers to quickly search for DNA sequences or mutations across the world's growing DNA databases. This opens up new opportunities in epidemiology, ecology, and evolutionary biology.
"Evolution gradually changes genes through mutation, so biologists often want to scan through all the world's DNA data to look for matches and how they differ through mutations," said Zamin Iqbal, professor of algorithmic and microbial genomics at the University of Bath and visiting Group Leader at EMBL-EBI. "As the data explosion has outstripped our algorithms, we have had to live with search engines that search a fraction of our data."
Breaking the scalability barrier
Over the last decade, the team behind LexicMap have been developing high-quality data resources for the research community and, in parallel, improved search algorithms for microbial DNA. They also work as part of a global consortium to assemble and annotate all 2.4 million bacterial and archaeal genomes in the ENA. LexicMap is the first alignment algorithm that can search all these data rapidly and with a low computational burden.
"Google search is a routine part of modern life, and we cannot imagine dealing with the internet without it," said Wei Shen, Associate Professor at Chongqing Medical University and former visiting scientist at EMBL-EBI. "Alignment to a DNA database is the biology equivalent of Google search, and LexicMap now makes that scalable to the full volume of global bacterial data. If you have found a new drug resistance gene, you might want to know how prevalent it is among bacteria, and now you can search through the world's data for it in just a few minutes."
Tracking microbial threats
By making microbial genomes easier to search, LexicMap opens up new possibilities for research and public health.
"Having the ability to search all publicly available bacterial genomes in minutes changes what's possible," said John Lees, Group Leader at EMBL-EBI. "If you're developing a new antibiotic and discover a resistance mutation, you need to know how common it is in the real world. Now, for the first time, you can search over 2 million genomes—the entire global collection—in minutes to find out."
The LexicMap tool has already been integrated into the AllTheBacteria project, which curates and indexes high-quality assemblies of all known bacterial genomes. This gives researchers an easy way to explore one of the largest collections of microbial DNA ever assembled.
More information: Wei Shen et al, Efficient sequence alignment against millions of prokaryotic genomes with LexicMap, Nature Biotechnology (2025).
Journal information: Nature Biotechnology
Provided by European Molecular Biology Laboratory