Complex deep learning models are no better at understanding genetic perturbation than simple baseline ones, study finds

August 15, 2025 report

Double perturbation prediction. Credit: *Nature Methods* (2025). DOI: 10.1038/s41592-025-02772-6

Deep learning models have shown great potential in predicting and engineering functional enzymes and proteins. Does this prowess extend to other fields of biology as well?

Contrary to expectations, a recent study found that deep-learning-based foundation models do not outperform simple baseline methods at predicting how genetic perturbations (alterations in gene expression or function) affect the transcriptome, the gene expression profile of cells. For double perturbations, in which two genes are altered simultaneously, the deep learning models had higher prediction error than an additive baseline, which uses no machine learning at all and simply sums the effects observed when each gene is perturbed on its own.
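
The additive baseline is simple enough to state in a few lines. The Python sketch that follows is an illustration under stated assumptions, not the study's code. It assumes each condition has already been summarized as a vector of mean log-expression values, one entry per gene (a hypothetical pseudo-bulk profile), so subtracting the control gives log-fold changes that can be added. The function names and toy numbers are made up for illustration.

```python
import numpy as np

def additive_double_prediction(control, single_a, single_b):
    """Predict the profile after perturbing genes A and B together.

    Hypothetical inputs: 1-D arrays of mean log-expression per gene for the
    unperturbed control and for each single-gene perturbation.
    """
    lfc_a = single_a - control  # change caused by perturbing A alone
    lfc_b = single_b - control  # change caused by perturbing B alone
    # Additive assumption: the two changes simply sum on the log scale.
    return control + lfc_a + lfc_b

def interaction_residual(observed_double, predicted_double):
    """How far an observed double perturbation departs from the additive expectation."""
    return observed_double - predicted_double

# Toy profiles for three genes (illustrative numbers only).
control = np.array([1.0, 2.0, 3.0])
single_a = np.array([2.0, 2.0, 3.0])
single_b = np.array([1.0, 1.0, 3.0])
observed = np.array([2.0, 1.0, 5.0])

pred = additive_double_prediction(control, single_a, single_b)
print(pred)                                  # [2. 1. 3.]
print(interaction_residual(observed, pred))  # [0. 0. 2.]
```

A nonzero residual, as for the third gene here, marks a genetic interaction: the combined effect is not the sum of its parts, so the additive baseline is wrong by construction at exactly the spot where a learned model would need to add value.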

Foundation models are deep learning models trained on enormous amounts of data. In this context, the term refers to single-cell models trained on recently published transcriptomics data covering millions of cells.

Published in Nature Methods, this study drew on publicly available single-cell CRISPR perturbation datasets to benchmark five prominent foundation models, including scGPT and scFoundation, and two other deep learning models against four deliberately simple baselines.

Research on deep-learning-based foundation models aims to revolutionize our understanding of biology by training on massive amounts of data, in the expectation that the models will gain a general understanding of how cells work rather than merely memorizing specific experimental outcomes. Such a capability would allow outcomes to be predicted without performing the experiments, significantly accelerating drug discovery and disease research.

However, biology is a deeply complex science in which the behavior of cells, genes and organisms depends on numerous factors, many of which remain undiscovered. The models being developed to capture this complexity are computationally expensive, demanding substantial time, energy and powerful hardware. Before pouring further resources into building them, it is crucial to pause and ask: Are they truly effective, and do they outperform the models we already have?

While previous studies have carried out benchmarks, most pitted one deep learning model against another and lacked a comparison with simple models. The researchers of this study set out to fill that gap by comparing simple, interpretable baseline models with complex ones.

They found that none of the complex models consistently outperformed simple baselines, such as predicting no change, predicting the mean, or fitting a linear model, when predicting the effects of single or double perturbations on gene expression. Most models also struggled to accurately predict genetic interactions, cases in which the combined effect of two perturbations deviates from the sum of their individual effects.
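
The other simple baselines named above can be sketched the same way. This illustrative Python sketch is not the study's implementation, and several choices in it are assumptions: each perturbation comes with a small numeric embedding (a hypothetical stand-in for whatever gene features a real linear model would use), "mean" is read as the average profile across training perturbations, the linear model is plain ridge regression on the change from control, and accuracy is scored as the Euclidean (L2) distance between predicted and observed profiles. All names, the ridge penalty and the data are made up.

```python
import numpy as np

# Hypothetical setup: each perturbation has a small embedding vector, and each
# profile is a vector of log-expression values with one entry per gene.

def no_change_baseline(control, n_test):
    """Predict that every perturbation leaves expression at the control level."""
    return np.tile(control, (n_test, 1))

def mean_baseline(train_profiles, n_test):
    """Predict the average profile across all training perturbations."""
    return np.tile(train_profiles.mean(axis=0), (n_test, 1))

def linear_baseline(train_emb, train_profiles, test_emb, control, lam=1.0):
    """Ridge regression from perturbation embedding to change from control."""
    delta = train_profiles - control  # expression change per training perturbation
    k = train_emb.shape[1]
    # Closed-form ridge solution: W = (X^T X + lam * I)^-1 X^T Y
    W = np.linalg.solve(train_emb.T @ train_emb + lam * np.eye(k), train_emb.T @ delta)
    return control + test_emb @ W

def l2_error(pred, truth):
    """Euclidean distance between predicted and observed profiles, per perturbation."""
    return np.linalg.norm(pred - truth, axis=1)

# Synthetic, noise-free data in which the true effects are linear in the embedding.
rng = np.random.default_rng(0)
n_genes, k = 50, 4
control = rng.normal(size=n_genes)
true_W = rng.normal(size=(k, n_genes))
train_emb = rng.normal(size=(30, k))
test_emb = rng.normal(size=(10, k))
train_profiles = control + train_emb @ true_W
test_profiles = control + test_emb @ true_W

n_test = len(test_emb)
errors = {
    "no change": l2_error(no_change_baseline(control, n_test), test_profiles).mean(),
    "mean": l2_error(mean_baseline(train_profiles, n_test), test_profiles).mean(),
    "linear": l2_error(
        linear_baseline(train_emb, train_profiles, test_emb, control, lam=1e-3),
        test_profiles,
    ).mean(),
}
for name, err in errors.items():
    print(f"{name:>9}: {err:.3f}")
```

Because the synthetic effects are exactly linear and noise-free, the linear baseline wins here by construction. The toy data only illustrates the mechanics of such a comparison and says nothing about how foundation models or these baselines perform on real CRISPR screens.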

These findings show that higher cost and complexity do not necessarily translate into better performance than simpler, less resource-intensive methods. They also underscore the importance of rigorously benchmarking new models against existing ones.

The researchers concluded that the ambitious goal of foundation models to learn a generalizable understanding of cellular states and predict outcomes based on this knowledge is still out of reach.

More information: Constantin Ahlmann-Eltze et al, Deep-learning-based gene perturbation effect prediction does not yet outperform simple linear baselines, Nature Methods (2025).

Journal information: Nature Methods, Nature

© 2025 Science X Network

Citation: Complex deep learning models are no better at understanding genetic perturbation than simple baseline ones, study finds (2025, August 15) retrieved 8 October 2025 from /news/2025-08-complex-deep-genetic-perturbation-simple.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
