RefSelect: a reference sequence selection algorithm for planted (l, d) motif search. Issue 9 (July 2016)
- Record Type:
- Journal Article
- Title:
- RefSelect: a reference sequence selection algorithm for planted (l, d) motif search. Issue 9 (July 2016)
- Main Title:
- RefSelect: a reference sequence selection algorithm for planted (l, d) motif search
- Authors:
- Yu, Qiang
Huo, Hongwei
Zhao, Ruixing
Feng, Dazheng
Vitter, Jeffrey
Huan, Jun - Abstract:
- Abstract Background The planted (l, d ) motif search (PMS) is an important yet challenging problem in computational biology. Pattern-driven PMS algorithms usually usek out oft input sequences as reference sequences to generate candidate motifs, and they can find all the (l, d ) motifs in the input sequences. However, most of them simply take the firstk sequences in the input as reference sequences without elaborate selection processes, and thus they may exhibit sharp fluctuations in running time, especially for large alphabets. Results In this paper, we build the reference sequence selection problem and propose a method named RefSelect to quickly solve it by evaluating the number of candidate motifs for the reference sequences. RefSelect can bring a practical time improvement of the state-of-the-art pattern-driven PMS algorithms. Experimental results show that RefSelect (1) makes the tested algorithms solve the PMS problem steadily in an efficient way, (2) particularly, makes them achieve a speedup of up to about 100× on the protein data, and (3) is also suitable for large data sets which contain hundreds or more sequences. Conclusions The proposed algorithm RefSelect can be used to solve the problem that many pattern-driven PMS algorithms present execution time instability. RefSelect requires a small amount of storage space and is capable of selecting reference sequences efficiently and effectively. Also, the parallel version of RefSelect is provided for handling large dataAbstract Background The planted (l, d ) motif search (PMS) is an important yet challenging problem in computational biology. Pattern-driven PMS algorithms usually usek out oft input sequences as reference sequences to generate candidate motifs, and they can find all the (l, d ) motifs in the input sequences. However, most of them simply take the firstk sequences in the input as reference sequences without elaborate selection processes, and thus they may exhibit sharp fluctuations in running time, especially for large alphabets. Results In this paper, we build the reference sequence selection problem and propose a method named RefSelect to quickly solve it by evaluating the number of candidate motifs for the reference sequences. RefSelect can bring a practical time improvement of the state-of-the-art pattern-driven PMS algorithms. Experimental results show that RefSelect (1) makes the tested algorithms solve the PMS problem steadily in an efficient way, (2) particularly, makes them achieve a speedup of up to about 100× on the protein data, and (3) is also suitable for large data sets which contain hundreds or more sequences. Conclusions The proposed algorithm RefSelect can be used to solve the problem that many pattern-driven PMS algorithms present execution time instability. RefSelect requires a small amount of storage space and is capable of selecting reference sequences efficiently and effectively. Also, the parallel version of RefSelect is provided for handling large data sets. … (more)
- Is Part Of:
- BMC bioinformatics. Volume 17:Issue 9(2016)
- Journal:
- BMC bioinformatics
- Issue:
- Volume 17:Issue 9(2016)
- Issue Display:
- Volume 17, Issue 9 (2016)
- Year:
- 2016
- Volume:
- 17
- Issue:
- 9
- Issue Sort Value:
- 2016-0017-0009-0000
- Page Start:
- 37
- Page End:
- 51
- Publication Date:
- 2016-07
- Subjects:
- Planted (l, d) motif search -- Pattern-driven -- Reference sequences
Bioinformatics -- Periodicals
Computational biology -- Periodicals
570.285 - Journal URLs:
- http://www.biomedcentral.com/bmcbioinformatics/ ↗
http://www.pubmedcentral.nih.gov/tocrender.fcgi?journal=13 ↗
http://link.springer.com/ ↗ - DOI:
- 10.1186/s12859-016-1130-6 ↗
- Languages:
- English
- ISSNs:
- 1471-2105
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - Digital store
British Library HMNTS - ELD Digital store - Ingest File:
- 10045.xml