rHAT: fast alignment of noisy long reads with regional hashing. (14th November 2015)
- Record Type:
- Journal Article
- Title:
- rHAT: fast alignment of noisy long reads with regional hashing. (14th November 2015)
- Main Title:
- rHAT: fast alignment of noisy long reads with regional hashing
- Authors:
- Liu, Bo
Guan, Dengfeng
Teng, Mingxiang
Wang, Yadong
- Abstract:
- Motivation: Single Molecule Real-Time (SMRT) sequencing has been widely applied in cutting-edge genomic studies. However, it is still an expensive task to align the noisy long SMRT reads to reference genome by state-of-the-art aligners, which is becoming a bottleneck in applications with SMRT sequencing. Novel approach is on demand for improving the efficiency and effectiveness of SMRT read alignment.
Results: We propose Regional Hashing-based Alignment Tool (rHAT), a seed-and-extension-based read alignment approach specifically designed for noisy long reads. rHAT indexes reference genome by regional hash table (RHT), a hash table-based index which describes the short tokens within local windows of reference genome. In the seeding phase, rHAT utilizes RHT for efficiently calculating the occurrences of short token matches between partial read and local genomic windows to find highly possible candidate sites. In the extension phase, a sparse dynamic programming-based heuristic approach is used for reducing the cost of aligning read to the candidate sites. By benchmarking on the real and simulated datasets from various prokaryote and eukaryote genomes, we demonstrated that rHAT can effectively align SMRT reads with outstanding throughput.
Availability and implementation: rHAT is implemented in C++; the source code is available at https://github.com/HIT-Bioinformatics/rHAT
Contact: ydwang@hit.edu.cn
Supplementary information: Supplementary data are available at Bioinformatics online.
- Is Part Of:
- Bioinformatics. Volume 32:Number 11(2016)
- Journal:
- Bioinformatics
- Issue:
- Volume 32:Number 11(2016)
- Issue Display:
- Volume 32, Issue 11 (2016)
- Year:
- 2016
- Volume:
- 32
- Issue:
- 11
- Issue Sort Value:
- 2016-0032-0011-0000
- Page Start:
- 1625
- Page End:
- 1631
- Publication Date:
- 2015-11-14
- Subjects:
- Bioinformatics -- Periodicals
Genomics -- Data processing -- Periodicals
Computational biology -- Periodicals
572.80285
- Journal URLs:
- http://bioinformatics.oxfordjournals.org
http://firstsearch.oclc.org
http://ukcatalogue.oup.com/
- DOI:
- 10.1093/bioinformatics/btv662
- Languages:
- English
- ISSNs:
- 1367-4803
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 2072.348000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 12734.xml
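
The seeding idea summarised in the abstract (index the short tokens found in local reference windows, then count token matches between part of a read and each window) can be illustrated with a minimal Python sketch. This is not rHAT's implementation, which is written in C++. The function names, the window size, the step between windows and the token length `k` are all hypothetical choices made for illustration, and the sketch counts distinct shared tokens per window as a simplification.

```python
from collections import Counter, defaultdict


def build_regional_index(reference, window=32, step=16, k=4):
    """Map each k-mer token to the ids of the reference windows containing it.

    window, step and k are illustrative values, not rHAT's defaults.
    """
    index = defaultdict(set)
    last_start = max(len(reference) - k, 0)
    for win_id, start in enumerate(range(0, last_start + 1, step)):
        region = reference[start:start + window]
        for i in range(len(region) - k + 1):
            index[region[i:i + k]].add(win_id)
    return index


def candidate_windows(index, read_part, k=4, top=3):
    """Rank windows by the number of distinct read tokens they contain."""
    tokens = {read_part[i:i + k] for i in range(len(read_part) - k + 1)}
    hits = Counter()
    for token in tokens:
        for win_id in index.get(token, ()):
            hits[win_id] += 1
    # Highest-scoring windows are the candidate sites for the extension phase.
    return hits.most_common(top)
```

A window that fully contains the read segment shares every token with it and so receives the maximum possible score. With a noisy read some tokens are lost to errors, but the true window would typically still score well above unrelated windows, which is why a counting step of this kind can serve as a cheap filter before a more expensive alignment.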