A multiple sequence alignment method with sequence vectorization. (2014)
- Record Type:
- Journal Article
- Title:
- A multiple sequence alignment method with sequence vectorization. (2014)
- Main Title:
- A multiple sequence alignment method with sequence vectorization
- Authors:
- Other Names:
- Special Editor.
- Abstract:
- Purpose – The time complexity of most multiple sequence alignment algorithms is O(N^2) or O(N^3), where N is the number of sequences. In addition, with the development of biotechnology, the number of biological sequences is growing significantly. Traditional methods have difficulty handling large‐scale sequence data. The proposed LemK_MSA method aims to reduce the time complexity, especially for large‐scale sequences, while keeping accuracy similar to that of traditional methods.
Design/methodology/approach – LemK_MSA converts multiple sequence alignment into the corresponding 10D vector alignment using ten types of copy modes based on Lempel‐Ziv. It then uses the k‐means algorithm and the NJ algorithm to divide the sequences into several groups and calculate a guide tree for each group. A complete guide tree for multiple sequence alignment can be constructed by merging the guide trees of all groups. Moreover, for large‐scale multiple sequence alignment, LemK_MSA proposes a GPU‐based parallel method for distance matrix calculation.
Findings – Under this approach, the time efficiency of multiple sequence alignment can be improved. High‐throughput mouse antibody sequences are used to validate the proposed method. Compared to ClustalW, MAFFT and Mbed, LemK_MSA is more than ten times as efficient while maintaining alignment accuracy.
Originality/value – This paper proposes a novel method with sequence vectorization for multiple sequence alignment based on Lempel‐Ziv. A GPU‐based parallel method has been designed for large‐scale distance matrix calculation. It provides a new approach for multiple sequence alignment research.
Acknowledgements: This project was funded by the National Natural Science Foundation of China (Nos 61174161, 61201358 and 61203176), the Natural Science Foundation of Fujian Province of China (No. 2012J01154), the Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20120121120038), the Key Research Project of Xiamen City of China (No. 3502Z20123014), the Fundamental Research Funds for the Central Universities in China (Xiamen University: Nos 2011121047, 201112G018 and 201212G005), and the Fundamental Research Fund for the university student Creative and Entrepreneurship training program in China (Xiamen University: No. XDDC201210384063).
- Is Part Of:
- Engineering computations. Volume 31(2014)Supplement
- Journal:
- Engineering computations
- Issue:
- Volume 31(2014)Supplement
- Issue Display:
- Volume 31, Issue 2014 (2014)
- Year:
- 2014
- Volume:
- 31
- Issue:
- 2014
- Issue Sort Value:
- 2014-0031-2014-0000
- Page Start:
- 283
- Page End:
- 296
- Publication Date:
- 2014
- Subjects:
- k‐means -- LemK_MSA -- Lempel‐Ziv -- Multiple sequence alignment
Computer-aided engineering -- Periodicals
Computer graphics -- Periodicals
620.00285
- Journal URLs:
- http://info.emeraldinsight.com/products/journals/journals.htm?id=ec
http://www.emeraldinsight.com/journals.htm?issn=0264-4401
http://www.emeraldinsight.com/0264-4401.htm
http://www.emeraldinsight.com/
http://firstsearch.oclc.org
- DOI:
- 10.1108/EC-01-2013-0026
- Languages:
- English
- ISSNs:
- 0264-4401
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3758.580800
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 4402.xml
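
Note on the abstract: the grouping step it describes (split the sequences into groups, build a guide tree per group, then merge) suggests why the amount of distance-matrix work can shrink. The sketch below is only an arithmetic illustration, not the LemK_MSA implementation. It assumes that pairwise distances are needed only within each group, which the abstract does not state explicitly. The function names, the sequence count and the group sizes are all hypothetical.

```python
from typing import List


def pairwise_count(n: int) -> int:
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def grouped_pairwise_count(group_sizes: List[int]) -> int:
    """Total number of within-group pairs when items are split into groups.

    Assumption (not from the abstract): distances are only computed
    between members of the same group.
    """
    return sum(pairwise_count(size) for size in group_sizes)


if __name__ == "__main__":
    n_sequences = 1000          # hypothetical number of sequences
    groups = [250] * 4          # hypothetical: four equal-sized groups

    full = pairwise_count(n_sequences)
    grouped = grouped_pairwise_count(groups)

    print("pairs in one full distance matrix:", full)
    print("pairs summed over per-group matrices:", grouped)
```

Under these made-up sizes the per-group matrices cover roughly a quarter of the pairs of the single full matrix. The cost of the clustering step and of merging the group guide trees is not counted here.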