A new algorithm for "the LCS problem" with application in compressing genome resequencing data. (August 2016)
- Record Type:
- Journal Article
- Title:
- A new algorithm for "the LCS problem" with application in compressing genome resequencing data. (August 2016)
- Main Title:
- A new algorithm for "the LCS problem" with application in compressing genome resequencing data
- Authors:
- Beal, Richard
Afrin, Tazin
Farheen, Aliya
Adjeroh, Donald - Abstract:
- Abstract Background The longest common subsequence (LCS) problem is a classical problem in computer science, and forms the basis of the current best-performing reference-based compression schemes for genome resequencing data. Methods First, we present a new algorithm for the LCS problem. Using the generalized suffix tree, we identify the common substrings shared between the two input sequences. Using the maximal common substrings, we construct a directed acyclic graph (DAG), based on which we determine the LCS as the longest path in the DAG. Then, we introduce an LCS-motivated reference-based compression scheme using the components of the LCS, rather than the LCS itself. Results Our basic scheme compressed theHomo sapiens genome (with an original size of 3, 080, 436, 051 bytes) to 15, 460, 478 bytes. An improvement on the basic method further reduced this to 8, 556, 708 bytes, or an overall compression ratio of 360. This can be compared to the previous state-of-the-art compression ratios of 157 (Wang and Zhang, 2011) and 171 (Pinho, Pratas, and Garcia, 2011). Conclusion We propose a new algorithm to address the longest common subsequence problem. Motivated by our LCS algorithm, we introduce a new reference-based compression scheme for genome resequencing data. Comparative results against state-of-the-art reference-based compression algorithms demonstrate the performance of the proposed method.
- Is Part Of:
- BMC genomics. Volume 17:Number 4(2016)
- Journal:
- BMC genomics
- Issue:
- Volume 17:Number 4(2016)
- Issue Display:
- Volume 17, Issue 4 (2016)
- Year:
- 2016
- Volume:
- 17
- Issue:
- 4
- Issue Sort Value:
- 2016-0017-0004-0000
- Page Start:
- 369
- Page End:
- 381
- Publication Date:
- 2016-08
- Subjects:
- Longest common subsequence -- LCS -- Longest previous factor -- LPF -- Compression -- Biology -- Genome resequencing
Genomes -- Periodicals
Gene mapping -- Periodicals
Genomics -- Periodicals
Base Sequence -- Periodicals
Chromosome Mapping -- Periodicals
Genetic Techniques -- Periodicals
Sequence Analysis, DNA -- Periodicals
572.8605 - Journal URLs:
- http://www.biomedcentral.com/bmcgenomics/ ↗
http://www.pubmedcentral.nih.gov/tocrender.fcgi?journal=32 ↗
http://link.springer.com/ ↗ - DOI:
- 10.1186/s12864-016-2793-0 ↗
- Languages:
- English
- ISSNs:
- 1471-2164
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store - Ingest File:
- 10042.xml