Predicting gene expression levels from DNA sequences and post-transcriptional information with transformers. (October 2022)
- Record Type:
- Journal Article
- Title:
- Predicting gene expression levels from DNA sequences and post-transcriptional information with transformers. (October 2022)
- Main Title:
- Predicting gene expression levels from DNA sequences and post-transcriptional information with transformers
- Authors:
- Pipoli, Vittorio
Cappelli, Mattia
Palladini, Alessandro
Peluso, Carlo
Lovino, Marta
Ficarra, Elisa
- Abstract:
- Highlights: Predicting gene expression levels is crucial due to its clinical applications. Post-transcriptional processes are essential in understanding the gene expression regulatory mechanisms. Previous models do not include post-transcriptional information. We present Transformer DeepLncLoc to predict gene expression levels from DNA sequences and transcription factors. Transformer DeepLncLoc reached an R² of 0.76, outperforming existing methods. Transcription factor post-transcriptional regulation resulted in a massive performance boost.
Abstract: Background and objectives: In recent years, the prediction of gene expression levels has been crucial due to its potential clinical applications. In this context, Xpresso and other methods based on Convolutional Neural Networks and Transformers were first proposed for this purpose. However, all these methods embed data with a standard one-hot encoding algorithm, resulting in very sparse matrices. In addition, post-transcriptional regulation processes, which are of utmost importance in the gene expression process, are not considered in the model. Methods: This paper presents Transformer DeepLncLoc, a novel method to predict the abundance of mRNA (i.e., gene expression levels) by processing gene promoter sequences, treating the problem as a regression task. The model exploits a transformer-based architecture, introducing the DeepLncLoc method to perform the data embedding. Since DeepLncLoc is based on the word2vec algorithm, it avoids the sparse matrix problem. Results: Post-transcriptional information related to mRNA stability and transcription factors is included in the model, leading to significantly improved performance compared to state-of-the-art works. Transformer DeepLncLoc reached an R² of 0.76, compared with 0.74 for Xpresso. Conclusion: The Multi-Headed Attention mechanism that characterizes the transformer methodology is suitable for modeling the interactions between DNA locations, surpassing recurrent models. Finally, the integration of the transcription factor data in the pipeline leads to impressive gains in predictive power.
- Is Part Of:
- Computer methods and programs in biomedicine. Volume 225(2022)
- Journal:
- Computer methods and programs in biomedicine
- Issue:
- Volume 225(2022)
- Issue Display:
- Volume 225, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 225
- Issue:
- 2022
- Issue Sort Value:
- 2022-0225-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-10
- Subjects:
- Attention -- DNA -- Gene-expression -- Prediction -- Transcription-factors -- Transformers
Medicine -- Computer programs -- Periodicals
Biology -- Computer programs -- Periodicals
Computers -- Periodicals
Medicine -- Periodicals
Médecine -- Logiciels -- Périodiques
Biologie -- Logiciels -- Périodiques
Biology -- Computer programs
Medicine -- Computer programs
Periodicals
Electronic journals
610.28
- Journal URLs:
- http://www.sciencedirect.com/science/journal/01692607
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.cmpb.2022.107035
- Languages:
- English
- ISSNs:
- 0169-2607
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.095000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 24039.xml