New multi-task deep learning framework integrates large-scale single-cell proteomics and transcriptomics data

The exponential progress in single-cell multi-omics technologies has led to the accumulation of large and diverse multi-omics datasets. However, the integration of single-cell proteomics and transcriptomics (or epigenomics) data poses a significant challenge to existing methods. Several transformer-based models, such as Geneformer, have significantly changed the paradigm of single-cell transcriptome analysis. However, these methods place significant demands on computational resources.
To address these challenges, researchers at the Wuhan Botanical Garden of the Chinese Academy of Sciences have developed scmFormer, a method that integrates large-scale single-cell proteomics and transcriptomics data using a multi-task transformer. The study, titled "scmFormer Integrates Large-Scale Single-Cell Proteomics and Transcriptomics Data by Multi-Task Transformer," was published in Advanced Science.
The researchers presented a comprehensive evaluation and several case studies of the method. The results showed that scmFormer was proficient at harmonizing large-scale single-cell transcriptomics and proteomics datasets at both the cell-type level and the finer-scale cell level, even with limited computing resources.
In addition, scmFormer can integrate multiple single-cell paired multimodal datasets, which offers the dual benefit of lower cost and improved biological insight.
Moreover, scmFormer shows an outstanding ability to eliminate technical differences between different omics modalities while preserving the underlying biological information inherent in the data, spanning both cell types and experimental conditions.
The application of scmFormer for the integration of two COVID-19 datasets with 1.48 million cells further demonstrated the distinct advantage of scmFormer for handling large datasets on regular laptops.
More information: Jing Xu et al, scmFormer Integrates Large-Scale Single-Cell Proteomics and Transcriptomics Data by Multi-Task Transformer, Advanced Science (2024).
Journal information: Advanced Science
Provided by Chinese Academy of Sciences