

Complex deep learning models are no better at understanding genetic perturbation than simple baseline ones, study finds

Double perturbation prediction. Credit: Nature Methods (2025). DOI: 10.1038/s41592-025-02772-6

Deep learning models have shown great potential in predicting and engineering functional enzymes and proteins. Does this prowess extend to other fields of biology as well?

Contrary to expectations, a recent study found that deep-learning-based foundation models do not outperform simple baseline methods in predicting how genetic perturbations (alterations in gene expression or function) affect the transcriptome, the gene expression profile of cells. For double perturbations, where two genes are altered simultaneously, the prediction error was higher for the deep-learning models than for the additive baseline, which uses no complex machine learning and simply adds together the effects of the individual gene changes.
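To make the additive baseline concrete, here is a minimal sketch in Python. It assumes that each perturbation's effect is measured as the difference from unperturbed control cells on a log-expression scale; the function name and the three-gene values are hypothetical and do not come from the study.

```python
def additive_prediction(control, single_a, single_b):
    """Predict a double perturbation by summing the two single effects.

    Each argument is a list of per-gene (log) expression values.
    The effect of a perturbation is taken to be its difference
    from the control profile (an assumption made for this sketch).
    """
    return [
        c + (a - c) + (b - c)
        for c, a, b in zip(control, single_a, single_b)
    ]


# Hypothetical values for three genes, for illustration only.
control = [2.0, 5.0, 1.0]    # unperturbed cells
single_a = [3.0, 5.0, 0.5]   # gene A perturbed alone
single_b = [2.0, 4.0, 1.5]   # gene B perturbed alone

predicted_double = additive_prediction(control, single_a, single_b)
print(predicted_double)
```

A baseline of this kind cannot capture genetic interactions, where the combined effect differs from the sum of the parts, which is why it serves as a useful bar for more complex models to clear.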

Foundation models are deep learning models trained on enormous amounts of data. In this context, the term refers to single-cell models trained on recently published transcriptomics data covering millions of cells.

Published in Nature Methods, this study tapped into publicly available single-cell CRISPR perturbation datasets to benchmark five prominent foundation models, including scGPT and scFoundation, alongside two other deep learning models, against four deliberately simple baselines.

Recent research on deep-learning-based foundation models aims to revolutionize the understanding of biology by training on massive amounts of data, with the expectation that the models will gain a general understanding of how cells work instead of just memorizing specific experimental outcomes. This capability would allow outcomes to be predicted without having to perform experiments, significantly accelerating disease research.

Single perturbation prediction. Credit: Nature Methods (2025). DOI: 10.1038/s41592-025-02772-6

However, biology is a deeply complex science, where the behavior of cells, genes and organisms depends on numerous factors—many of which remain undiscovered. The models being developed to comprehend these complexities are extremely computationally expensive as they require time, energy, and powerful machines. Before pouring further resources into building such models, it is crucial to pause and ask: Are they truly effective, and do they outperform the models we already have?

While previous studies have carried out benchmark experiments, most of them pitted one deep learning model against another and lacked a comparison with a simple model. The researchers of this study set out to change this by comparing simple, interpretable baseline models with complex ones.

They discovered that none of the complex models consistently outperformed simple baselines, such as no-change, mean or linear model-based predictions, in predicting the effect of single or double perturbations on the transcriptome. Most models also struggled to accurately predict complex genetic interactions.
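The two simplest baselines named above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's code: "no-change" is taken to mean predicting the control profile, "mean" is taken to mean the per-gene average over perturbations seen during training, and mean squared error stands in for whatever error measure the authors used. All names and numbers are hypothetical.

```python
from statistics import fmean


def no_change_baseline(control):
    """Predict that the perturbation leaves expression at control levels."""
    return list(control)


def mean_baseline(training_profiles):
    """Predict the per-gene average over the training perturbations."""
    return [fmean(gene_values) for gene_values in zip(*training_profiles)]


def mean_squared_error(predicted, observed):
    """Average squared per-gene difference (a stand-in error metric)."""
    return fmean((p - o) ** 2 for p, o in zip(predicted, observed))


# Hypothetical log-expression values for three genes.
control = [2.0, 5.0, 1.0]
training_profiles = [
    [3.0, 5.0, 0.5],   # perturbation seen in training
    [2.0, 4.0, 1.5],   # another perturbation seen in training
]
observed_held_out = [2.5, 4.0, 1.0]  # perturbation to be predicted

mean_prediction = mean_baseline(training_profiles)
error_no_change = mean_squared_error(no_change_baseline(control), observed_held_out)
error_mean = mean_squared_error(mean_prediction, observed_held_out)
print(error_no_change, error_mean)
```

Any model proposed for this task can be scored with the same error function, so comparing it against these baselines adds almost no extra work to a benchmark.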

These findings make it evident that higher cost and complexity do not necessarily translate into better performance than simpler, less resource-intensive methods. They also underline the importance of rigorous testing and benchmarking of newer models against existing ones.

The researchers concluded that the ambitious goal of foundation models to learn a generalizable understanding of cellular states and predict outcomes based on this knowledge is still out of reach.


More information: Constantin Ahlmann-Eltze et al, Deep-learning-based gene perturbation effect prediction does not yet outperform simple linear baselines, Nature Methods (2025).

Journal information: Nature Methods

© 2025 Science X Network

Citation: Complex deep learning models are no better at understanding genetic perturbation than simple baseline ones, study finds (2025, August 15) retrieved 15 August 2025 from /news/2025-08-complex-deep-genetic-perturbation-simple.html
