Discovering regulatory motifs of genetic networks using the indexing-tree based algorithm: a parallel implementation. Issue 1 (25th June 2020)
- Record Type:
- Journal Article
- Title:
- Discovering regulatory motifs of genetic networks using the indexing-tree based algorithm: a parallel implementation. Issue 1 (25th June 2020)
- Main Title:
- Discovering regulatory motifs of genetic networks using the indexing-tree based algorithm: a parallel implementation
- Authors:
- Almomany, Abedalmuhdi
Al-Omari, Ahmad M.
Jarrah, Amin
Tawalbeh, Mohammad
- Abstract:
- Purpose: Motif discovery has become a significant challenge in the era of big data, where hundreds of genomes require annotation. The importance of motifs has led many researchers to develop tools and algorithms for finding them. The purpose of this paper is to propose a new algorithm that improves the speed and accuracy of the motif discovery process, the main weaknesses of existing motif discovery algorithms.
Design/methodology/approach: All motifs are organized in a tree-based indexing structure, where each motif is built from a combination of the nucleotides 'A', 'C', 'T' and 'G'. The full motif is discovered by extending the search around 4-mer nucleotides in both directions, left and right. The resulting motifs may be identical or degenerate, and of various lengths.
Findings: The developed implementation discovers conserved string motifs in DNA without prior information about the motifs. Even for a large data set containing millions of nucleotides and thousands of very long sequences, the entire process completes in a few seconds.
Originality/value: Experimental results demonstrate the efficiency of the proposed implementation: for a real sequence of 1,270,000 nucleotides spread across 2,000 samples, the overall discovery process takes 5.9 s on an Intel Core i7-6700 @ 3.4 GHz machine and 26.7 s on an Intel Xeon X5670 @ 2.93 GHz machine. In addition, the authors improved computational performance by parallelizing the implementation to run on multi-core machines using the OpenMP framework. The speedup achieved by parallelization scales in proportion to the number of processors, with an efficiency close to 100%.
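The Design/methodology/approach summary above (a tree index over 4-mer seeds, extended left and right) can be illustrated with a minimal, serial Python sketch. It is not the authors' implementation: the trie layout, the function names (`build_index`, `iter_seeds`, `support`, `extend`, `discover`), the `quorum` parameter (minimum number of sequences a motif must occur in) and the greedy extension rule are all hypothetical. The sketch finds exact motifs only (no degenerate positions) and does not use OpenMP.

```python
# Hypothetical sketch, not the authors' code: a 4-ary trie of 4-mer seeds,
# greedy exact-match extension to the right and left, serial execution only.
from collections import defaultdict

K = 4  # seed length, following the 4-mer seeds described in the abstract


class TrieNode:
    __slots__ = ("children", "hits")

    def __init__(self):
        self.children = {}  # base ('A', 'C', 'G', 'T') -> child node
        self.hits = []      # (sequence index, start offset) of k-mers ending here


def build_index(seqs, k=K):
    """Insert every k-mer of every sequence into a trie keyed by nucleotide."""
    root = TrieNode()
    for s_idx, seq in enumerate(seqs):
        for start in range(len(seq) - k + 1):
            node = root
            for base in seq[start:start + k]:
                node = node.children.setdefault(base, TrieNode())
            node.hits.append((s_idx, start))
    return root


def iter_seeds(node, prefix=""):
    """Yield (k-mer, hits) for every node that terminates a stored k-mer."""
    if node.hits:
        yield prefix, node.hits
    for base, child in sorted(node.children.items()):
        yield from iter_seeds(child, prefix + base)


def support(hits):
    """Number of distinct sequences that contain an occurrence."""
    return len({s for s, _ in hits})


def extend(seqs, motif, hits, quorum):
    """Grow a seed base by base, rightwards then leftwards, while the
    best-supported neighbouring base occurs in at least `quorum` sequences."""
    while True:  # right: the next base sits just past the current motif
        by_base = defaultdict(list)
        for s, start in hits:
            end = start + len(motif)
            if end < len(seqs[s]):
                by_base[seqs[s][end]].append((s, start))
        best = max(by_base.items(), key=lambda kv: support(kv[1]), default=None)
        if best is None or support(best[1]) < quorum:
            break
        motif, hits = motif + best[0], best[1]
    while True:  # left: the previous base sits just before the motif start
        by_base = defaultdict(list)
        for s, start in hits:
            if start > 0:
                by_base[seqs[s][start - 1]].append((s, start - 1))
        best = max(by_base.items(), key=lambda kv: support(kv[1]), default=None)
        if best is None or support(best[1]) < quorum:
            break
        motif, hits = best[0] + motif, best[1]
    return motif, hits


def discover(seqs, quorum, k=K):
    """Return distinct motifs grown from k-mer seeds shared by >= quorum sequences."""
    root = build_index(seqs, k)
    found = set()
    for seed, hits in iter_seeds(root):
        if support(hits) >= quorum:
            motif, _ = extend(seqs, seed, hits, quorum)
            found.add(motif)
    return sorted(found)


seqs = ["TTGACGTCAA", "CCGACGTCGG", "AGACGTCTTA"]
print(discover(seqs, quorum=3))  # ['GACGTC']
```

In this toy input the seeds GACG, ACGT and CGTC all grow into the same conserved motif GACGTC. Each seed is extended independently of the others, so the loop in `discover` is one plausible place where a multi-core version could divide work among threads; the record itself does not say how the authors partitioned the work under OpenMP.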
- Is Part Of:
- Engineering computations. Volume 38:Issue 1 (2021)
- Journal:
- Engineering computations
- Issue:
- Volume 38:Issue 1 (2021)
- Issue Display:
- Volume 38, Issue 1 (2021)
- Year:
- 2021
- Volume:
- 38
- Issue:
- 1
- Issue Sort Value:
- 2021-0038-0001-0000
- Page Start:
- 354
- Page End:
- 370
- Publication Date:
- 2020-06-25
- Subjects:
- Optimization -- OpenMP -- Parallel processing -- Genetic network -- Multi-core -- Regulation motif
Computer-aided engineering -- Periodicals
Computer graphics -- Periodicals
620.00285
- Journal URLs:
- http://info.emeraldinsight.com/products/journals/journals.htm?id=ec
http://www.emeraldinsight.com/journals.htm?issn=0264-4401
http://www.emeraldinsight.com/0264-4401.htm
http://www.emeraldinsight.com/
http://firstsearch.oclc.org
- DOI:
- 10.1108/EC-02-2020-0108
- Languages:
- English
- ISSNs:
- 0264-4401
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3758.580800
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 22323.xml