MinimapR: A parallel alignment tool for the analysis of large-scale third-generation sequencing data. (August 2022)
- Record Type:
- Journal Article
- Title:
- MinimapR: A parallel alignment tool for the analysis of large-scale third-generation sequencing data. (August 2022)
- Main Title:
- MinimapR: A parallel alignment tool for the analysis of large-scale third-generation sequencing data
- Authors:
- Wang, Zihang
Cui, Yingbo
Peng, Shaoliang
Liao, Xiangke
Yu, Yangbo - Abstract:
- Abstract: The development of third-generation sequencing technology has brought significant changes and influences on genomics. Compared to the second-generation sequencing methods, the third-generation technologies produce around 100 times longer reads to reveal new genomic variations that complete long-term gaps in the human reference genome. However, these reads' excessive length and high error rate severely increase the amount of data and alignment cost. The traditional data analysis platform and serial sequence alignment method can not effectively deal with large-scale long read alignment. There is a critical need for a novel data analysis platform that can deliver fast alignment of large-scale sequences to solve the problem of long read alignment. High-performance computing platforms and efficient, scalable algorithms based on these platforms have significant potential to impact sequence analysis approaches. This paper presented minimapR, a multi-level parallel long-read alignment tool based on minimap2, a popular third-generation read aligner. MinimapR is developed based on the new high-performance distributed framework Ray. Ray fully integrates with the Python environment and can be easily installed with pip. MinimapR can utilize the power of multiple computing nodes, significantly accelerating alignment speeds without sacrificing sensitivity. The minimapR tool was tested on 64 nodes and demonstrated a 50 fold increase in speed with 78 % parallel efficiency. TheAbstract: The development of third-generation sequencing technology has brought significant changes and influences on genomics. Compared to the second-generation sequencing methods, the third-generation technologies produce around 100 times longer reads to reveal new genomic variations that complete long-term gaps in the human reference genome. However, these reads' excessive length and high error rate severely increase the amount of data and alignment cost. The traditional data analysis platform and serial sequence alignment method can not effectively deal with large-scale long read alignment. There is a critical need for a novel data analysis platform that can deliver fast alignment of large-scale sequences to solve the problem of long read alignment. High-performance computing platforms and efficient, scalable algorithms based on these platforms have significant potential to impact sequence analysis approaches. This paper presented minimapR, a multi-level parallel long-read alignment tool based on minimap2, a popular third-generation read aligner. MinimapR is developed based on the new high-performance distributed framework Ray. Ray fully integrates with the Python environment and can be easily installed with pip. MinimapR can utilize the power of multiple computing nodes, significantly accelerating alignment speeds without sacrificing sensitivity. The minimapR tool was tested on 64 nodes and demonstrated a 50 fold increase in speed with 78 % parallel efficiency. The source code and user manual of minimapR are freely available at https://github.com/Geehome/minimapR . … (more)
- Is Part Of:
- Computational biology and chemistry. Volume 99(2022)
- Journal:
- Computational biology and chemistry
- Issue:
- Volume 99(2022)
- Issue Display:
- Volume 99, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 99
- Issue:
- 2022
- Issue Sort Value:
- 2022-0099-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-08
- Subjects:
- Sequence alignment -- Parallel processing -- Ray -- Minimap2
Chemistry -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
Biochemistry -- Data processing
Biology -- Data processing
Molecular biology -- Data processing
Periodicals
Electronic journals
542.85 - Journal URLs:
- http://www.sciencedirect.com/science/journal/14769271 ↗
http://www.elsevier.com/journals ↗ - DOI:
- 10.1016/j.compbiolchem.2022.107735 ↗
- Languages:
- English
- ISSNs:
- 1476-9271
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - 3390.576700
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store - Ingest File:
- 22692.xml