NSAMD: A new approach to discover structured contiguous substrings in sequence datasets using Next-Symbol-Array. (October 2016)

Record Type:: Journal Article
Title:: NSAMD: A new approach to discover structured contiguous substrings in sequence datasets using Next-Symbol-Array. (October 2016)
Main Title:: NSAMD: A new approach to discover structured contiguous substrings in sequence datasets using Next-Symbol-Array
Authors:: Pari, Abdolvahed
Baraani, Ahmad
Parseh, Saeed
Abstract:: Highlights: We present NSAMD, a solution for extracting unknown structured motifs. A new data structure for indexing the dataset is presented. NSAMD uses much less memory than Flame (the competing solution), about 99% less. NSAMD is about 51% faster than Flame at extracting structured motifs, but it is slower than Flame at finding simple motifs. Abstract: In many sequence data mining applications, the goal is to find frequent substrings. Some of these applications, such as extracting motifs in protein and DNA sequences, look for frequently occurring approximate contiguous substrings called simple motifs. By approximate we mean that some mismatches are allowed when testing similarity between substrings, which helps to discover unknown patterns. Structured motifs in DNA sequences are frequent structured contiguous substrings that contain two or more simple motifs. Existing work on finding simple motifs suffers from problems such as low scalability, high execution time, no guarantee of finding all patterns, and low flexibility in adapting to other applications. Flame is the only algorithm that can find all unknown structured patterns in a dataset and has solved most of these problems, but its scalability for very large sequences is still weak. In this research, a new approach named Next-Symbol-Array based Motif Discovery (NSAMD) is presented to improve scalability in extracting all unknown simple and structured …
Is Part Of:: Computational biology and chemistry. Volume 64(2016)
Journal:: Computational biology and chemistry
Issue:: Volume 64(2016)
Issue Display:: Volume 64, Issue 2016 (2016)
Year:: 2016
Volume:: 64
Issue:: 2016
Issue Sort Value:: 2016-0064-2016-0000
Page Start:: 384
Page End:: 395
Publication Date:: 2016-10
Subjects:: Data mining -- Motif -- String -- Substring -- Support
Chemistry -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
Biochemistry -- Data processing
Biology -- Data processing
Molecular biology -- Data processing
Periodicals
Electronic journals
542.85
Journal URLs:: http://www.sciencedirect.com/science/journal/14769271
http://www.elsevier.com/journals
DOI:: 10.1016/j.compbiolchem.2016.09.001
Languages:: English
ISSNs:: 1476-9271
Deposit Type:: Legaldeposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 3390.576700
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
Ingest File:: 840.xml