Recognizing rare microorganisms with an AI-based tool

Quality of ulrb clustering measured by the average Silhouette score as a function of number of samples, ASVs, and sequencing depth. Credit: Communications Biology (2025). DOI: 10.1038/s42003-025-07912-4

Identifying rare microorganisms in microbiome data just got easier. A team of researchers from Portugal and Canada has developed a new tool that uses machine learning to automatically detect the rare biosphere in ecological datasets.

The aim is to identify rare microorganisms in microbiome datasets quickly, autonomously and in an unsupervised manner. The new tool, named ulrb, responds to a long-standing challenge: distinguishing rare microorganisms from the most abundant ones in natural environments.

The new methodology and the ulrb software are described in the study "Definition of the microbial rare biosphere through unsupervised machine learning," published in the journal Communications Biology.

The paper is the result of an international collaboration among the Interdisciplinary Center for Marine and Environmental Research (CIIMAR), the Faculty of Sciences of the University of Porto and the Institute of Bioengineering and Biosciences (iBB) of the Instituto Superior Técnico of the University of Lisbon, in Portugal, together with the School of Electrical Engineering and Computer Science (EECS) of the University of Ottawa and the Faculty of Computer Science of Dalhousie University, both in Canada.

This is a product of the Ph.D. project of CIIMAR student Francisco Pascoal under the supervision of CIIMAR researcher Catarina Magalhães and the co-supervision of researchers Rodrigo Costa (iBB) and Paula Branco (EECS).

This new software will increase not only the accuracy of ecological analyses of different microbiomes and ecosystems, but also the depth at which these analyses are carried out, ultimately improving our understanding of microbial diversity and its role in ecosystem resilience.

What is the rare biosphere?

Microbial communities normally follow a pattern in which only a few species are highly abundant, while the vast majority of diversity is low in abundance and belongs to the so-called "rare biosphere." In fact, there are thousands of species of prokaryotic microorganisms that can inhabit 1 liter of seawater. However, only 2% to 5% of these species are abundant, while the rest are rare and very difficult to detect and identify due to methodological limitations.

Although they are not very abundant, rare species contain the greatest genetic diversity on the planet and give ecosystems great resilience. If the most abundant species come under threat, other species can take over and ensure the functions of the microbiome, keeping the ecosystem stable, explains Pascoal.

The rare biosphere therefore plays a very important role in ecosystem responses to major changes in the environment, such as the effects of climate change. Studying rare organisms allows us to understand the resilience of ecosystems to these changes and to study their reaction to environmental alterations.

The innovation of ulrb

By employing unsupervised techniques, ulrb allows researchers to quickly and reliably identify rare microorganisms in a community. A major advantage of this method is its adaptability to different methodological contexts, i.e., the algorithm "learns" the patterns present in the data itself, regardless of its origin.

"The possibility of identifying rare microorganisms arose with the development of high-throughput DNA sequencing technologies, but even with this data it was never clear among peers how to identify rare microorganisms, as they were overshadowed by the abundant ones. Thus, many researchers limited themselves to establishing random levels of abundance, which was an insufficient approach since it was not supported by biological justification.

"With this new method, we were able to use sequencing data to automatically distinguish which microorganisms are rare, based on the information provided in each sample," says Pascoal, first author of the study.

To automate the process, an algorithm was created that groups together the microorganisms that are most similar to each other in terms of their abundance in a given sample. As it is based on the relative distance between them, it can be automated and applied to databases of any size, and produces a result with rigorous and uniform ecological and biological value.

"Basically, the algorithm 'learns' what the abundance groups in a community are and matches them up with an abundance classification, which makes it possible to distinguish microorganisms that are rare from those that are abundant," says Pascoal.
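The grouping step described above can be illustrated with a small sketch. It is not the ulrb implementation and does not use the ulrb API. It assumes a plain one-dimensional k-medoids clustering of abundance counts within a single sample, and the function name `classify_by_abundance`, the three labels and the toy counts are all hypothetical choices made for illustration.

```python
def classify_by_abundance(abundances,
                          labels=("rare", "undetermined", "abundant"),
                          max_iter=100):
    """Group abundance values by similarity and label each group.

    Assumption for illustration only: a simple 1-D k-medoids loop.
    The labels are hypothetical and ordered from lowest to highest abundance.
    """
    k = len(labels)
    values = sorted(set(abundances))
    if k < 2 or len(values) < k:
        raise ValueError("need at least 2 labels and k distinct abundance values")

    # Deterministic start: medoids spread evenly over the sorted distinct values.
    medoids = [values[i * (len(values) - 1) // (k - 1)] for i in range(k)]

    def nearest(x):
        # Index of the medoid closest to x (ties go to the lower index).
        return min(range(k), key=lambda j: abs(x - medoids[j]))

    for _ in range(max_iter):
        # Assignment step: each value joins the cluster of its closest medoid.
        clusters = [[] for _ in range(k)]
        for x in abundances:
            clusters[nearest(x)].append(x)

        # Update step: the new medoid is the member with the smallest
        # total distance to the other members of its cluster.
        new_medoids = []
        for j, members in enumerate(clusters):
            if not members:
                new_medoids.append(medoids[j])
            else:
                new_medoids.append(
                    min(members, key=lambda m: sum(abs(m - y) for y in members))
                )

        if new_medoids == medoids:
            break
        medoids = new_medoids

    # Match clusters to labels by rank: lowest medoid gets the first label.
    order = sorted(range(k), key=lambda j: medoids[j])
    label_of = {j: labels[rank] for rank, j in enumerate(order)}
    return [label_of[nearest(x)] for x in abundances]


# Toy read counts for ten taxa in one sample (made-up numbers).
counts = [1, 2, 1, 3, 2, 50, 60, 55, 1000, 1200]
for count, label in zip(counts, classify_by_abundance(counts)):
    print(count, label)
```

With these toy counts, the five lowest values are labeled rare, the three intermediate ones undetermined and the two largest abundant. Because the grouping depends only on the relative distances between abundance values in the sample, no fixed abundance threshold has to be chosen in advance.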

Possible applications

ulrb can be applied to data derived from common microbial ecology protocols and could be useful for studying emerging diseases and biological invasions. Because the method also works on non-microbial data, it could help determine which animal or plant species are at risk in certain contexts, which is relevant for environmental monitoring.

If you are a researcher and want to apply this tool to your own data, ulrb is available as an open source R package. The research team has also created learning materials to encourage its use.

More information: Francisco Pascoal et al, Definition of the microbial rare biosphere through unsupervised machine learning, Communications Biology (2025). DOI: 10.1038/s42003-025-07912-4

Journal information: Communications Biology

Provided by Interdisciplinary Centre of Marine and Environmental Research

Citation: Recognizing rare microorganisms with an AI-based tool (2025, April 9) retrieved 24 May 2025 from /news/2025-04-rare-microorganisms-ai-based-tool.html