ConLSH: Context based Locality Sensitive Hashing for mapping of noisy SMRT reads. (April 2020)
- Record Type:
- Journal Article
- Title:
- ConLSH: Context based Locality Sensitive Hashing for mapping of noisy SMRT reads. (April 2020)
- Main Title:
- ConLSH: Context based Locality Sensitive Hashing for mapping of noisy SMRT reads
- Authors:
- Chakraborty, Angana
Bandyopadhyay, Sanghamitra
- Abstract:
- Single Molecule Real-Time (SMRT) sequencing is a recent advancement of Next Gen technology developed by Pacific Bio (PacBio). It comes with an explosion of long and noisy reads demanding cutting edge research to get the most out of it. To deal with the high error probability of SMRT data, a novel contextual Locality Sensitive Hashing (conLSH) based algorithm is proposed in this article, which can effectively align the noisy SMRT reads to the reference genome. Here, sequences are hashed together based not only on their closeness, but also on similarity of context. The algorithm has $O(n^{\rho+1})$ space requirement, where $n$ is the number of sequences in the corpus and $\rho$ is a constant. The indexing time and querying time are bounded by $O\left(\frac{n^{\rho+1} \cdot \ln n}{\ln \frac{1}{P_2}}\right)$ and $O(n^{\rho})$ respectively, where $P_2 > 0$ is a probability value. This algorithm is particularly useful for retrieving similar sequences, a widely used task in biology. The proposed conLSH based aligner is compared with rHAT, popularly used for aligning SMRT reads, and is found to comprehensively beat it in speed as well as in memory requirements. In particular, it takes approximately 24.2% less processing time, while saving about 70.3% in peak memory requirement for the H. sapiens PacBio dataset.
- Is Part Of:
- Computational biology and chemistry. Volume 85(2020)
- Journal:
- Computational biology and chemistry
- Issue:
- Volume 85(2020)
- Issue Display:
- Volume 85, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 85
- Issue:
- 2020
- Issue Sort Value:
- 2020-0085-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-04
- Subjects:
- Locality Sensitive Hashing -- Sequence analysis -- Single Molecule Real-Time (SMRT) sequencing -- Sequence alignment -- PacBio dataset -- Algorithm
Chemistry -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
Biochemistry -- Data processing
Biology -- Data processing
Molecular biology -- Data processing
Periodicals
Electronic journals
542.85
- Journal URLs:
- http://www.sciencedirect.com/science/journal/14769271
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiolchem.2020.107206
- Languages:
- English
- ISSNs:
- 1476-9271
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3390.576700
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 13616.xml