HTSFinder: Powerful Pipeline of DNA Signature Discovery by Parallel and Distributed Computing. (January 2016)
- Record Type:
- Journal Article
- Title:
- HTSFinder: Powerful Pipeline of DNA Signature Discovery by Parallel and Distributed Computing. (January 2016)
- Main Title:
- HTSFinder: Powerful Pipeline of DNA Signature Discovery by Parallel and Distributed Computing
- Authors:
- Karimi, Ramin
Hajdu, Andras
- Abstract:
- Comprehensive efforts toward low-cost sequencing over the past few years have led to the growth of complete genome databases. In parallel with this effort, and in response to a strong need, fast and cost-effective methods and applications have been developed to accelerate sequence analysis. Identification is the very first step of this task. Because alignment-based approaches are difficult, costly, and computationally challenging, an alternative universal identification method is much needed. As an alignment-free approach, DNA signatures have provided new opportunities for the rapid identification of species. In this paper, we present an effective pipeline, HTSFinder (high-throughput signature finder), with a corresponding k-mer generator, GkmerG (genome k-mers generator). Using this pipeline, we determine the frequency of k-mers from the available complete genome databases to detect extensive DNA signatures in a reasonably short time. Our application can detect both unique and common signatures in arbitrarily selected target and nontarget databases. The pipeline uses Hadoop and MapReduce as parallel and distributed computing tools on commodity hardware. This approach brings the power of high-performance computing to ordinary desktop personal computers for discovering DNA signatures in large databases such as bacterial genomes. The considerable number of unique and common DNA signatures detected in the target database creates opportunities to improve the identification process not only for polymerase chain reaction and microarray assays but also for more complex scenarios such as metagenomics and next-generation sequencing analysis.
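The abstract describes counting k-mer frequencies with Hadoop and MapReduce (the subject terms name WordCount) and separating unique from common signatures between target and nontarget databases. The following is a minimal, single-machine Python sketch of that WordCount-style idea. It is not HTSFinder's or GkmerG's actual code. It rests on several assumptions: only the forward strand is read, windows containing non-ACGT characters are skipped, a "unique" signature is taken to be a k-mer found in the target genomes but in no nontarget genome, and a "common" signature is a target k-mer that also occurs in the nontarget set. These definitions and all function names (`map_kmers`, `reduce_counts`, `count_kmers`, `classify_signatures`) are hypothetical.

```python
from collections import Counter
from itertools import chain
from typing import Dict, Iterable, Iterator, Set, Tuple

# Assumption: only unambiguous nucleotides form valid k-mers.
VALID_BASES = set("ACGT")


def map_kmers(seq: str, k: int) -> Iterator[Tuple[str, int]]:
    """Map step (WordCount-style): emit (k-mer, 1) for each length-k window.

    Hypothetical helper; reads the forward strand only and skips windows
    that contain N or other ambiguity codes.
    """
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= VALID_BASES:
            yield kmer, 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce step: group pairs by k-mer key and sum their counts."""
    counts = Counter()
    for kmer, n in pairs:
        counts[kmer] += n
    return dict(counts)


def count_kmers(genomes: Iterable[str], k: int) -> Dict[str, int]:
    """Map every genome, concatenate mapper outputs, then reduce."""
    return reduce_counts(chain.from_iterable(map_kmers(g, k) for g in genomes))


def classify_signatures(
    target: Iterable[str], nontarget: Iterable[str], k: int
) -> Tuple[Set[str], Set[str]]:
    """Split target k-mers into 'unique' and 'common' signatures.

    Assumed definitions (illustrative only): unique = present in the target
    set and absent from every nontarget genome; common = present in both.
    """
    target_counts = count_kmers(target, k)
    nontarget_counts = count_kmers(nontarget, k)
    unique = {km for km in target_counts if km not in nontarget_counts}
    common = {km for km in target_counts if km in nontarget_counts}
    return unique, common


if __name__ == "__main__":
    unique, common = classify_signatures(["ACGTAC"], ["GTACGG"], k=3)
    print(sorted(unique))  # ['CGT']
    print(sorted(common))  # ['ACG', 'GTA', 'TAC']
```

In a real Hadoop job the map and reduce functions would run on separate nodes, and the framework's shuffle phase would group identical k-mer keys before reduction; in this sketch `chain` simply concatenates the mapper outputs and the `Counter` in `reduce_counts` performs the grouping.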
- Is Part Of:
- Evolutionary bioinformatics online. Volume 12(2016)
- Journal:
- Evolutionary bioinformatics online
- Issue:
- Volume 12(2016)
- Issue Display:
- Volume 12, Issue 2016 (2016)
- Year:
- 2016
- Volume:
- 12
- Issue:
- 2016
- Issue Sort Value:
- 2016-0012-2016-0000
- Page Start:
- Page End:
- Publication Date:
- 2016-01
- Subjects:
- DNA signature -- k-mers -- Hadoop -- WordCount -- MapReduce -- Hive
Bioinformatics -- Periodicals
Evolutionary computation -- Periodicals
Genetic programming (Computer science) -- Periodicals
Computational Biology
Evolution, Molecular
Bioinformatics
Electronic journals
Periodicals
Fulltext
Internet Resources
576.8
- Journal URLs:
- http://insights.sagepub.com/journal-evolutionary-bioinformatics-j17
http://www.uk.sagepub.com/home.nav
http://www.la-press.com/evolutionary-bioinformatics-journal-j17
http://bibpurl.oclc.org/web/38943
- DOI:
- 10.4137/EBO.S35545
- Languages:
- English
- ISSNs:
- 1176-9343
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 10677.xml