A multiple sequence alignment method with sequence vectorization. Issue 2 (25th February 2014)
- Record Type:
- Journal Article
- Title:
- A multiple sequence alignment method with sequence vectorization. Issue 2 (25th February 2014)
- Main Title:
- A multiple sequence alignment method with sequence vectorization
- Authors:
- Ji, Guoli
Zeng, Yong
Yang, Zijiang
Ye, Congting
Yao, Jingci
- Editors:
- Hsieh, Wen-Hsiang
- Abstract:
- Purpose: The time complexity of most multiple sequence alignment algorithms is O(N^2) or O(N^3), where N is the number of sequences. In addition, with the development of biotechnology, the number of biological sequences is growing significantly, and traditional methods have difficulty handling large-scale sequence sets. The proposed LemK_MSA method aims to reduce the time complexity, especially for large-scale sequences, while keeping accuracy similar to that of the traditional methods.
Design/methodology/approach: LemK_MSA converts multiple sequence alignment into a corresponding 10D vector alignment using ten types of copy modes based on Lempel-Ziv. It then uses the k-means algorithm and the NJ algorithm to divide the sequences into several groups and calculate a guide tree for each group. A complete guide tree for the multiple sequence alignment can be constructed by merging the guide trees of all groups. Moreover, for large-scale sequence sets, LemK_MSA proposes a GPU-based parallel approach to distance matrix calculation.
Findings: Under this approach, the time efficiency of multiple sequence alignment can be improved. High-throughput mouse antibody sequences are used to validate the proposed method. Compared with ClustalW, MAFFT and Mbed, LemK_MSA is more than ten times as efficient while maintaining alignment accuracy.
Originality/value: This paper proposes a novel method with sequence vectorization for multiple sequence alignment based on Lempel-Ziv. A GPU-based parallel method has been designed for large-scale distance matrix calculation. It provides a new approach to multiple sequence alignment research.
- Is Part Of:
- Engineering computations. Volume 31:Issue 2(2014)
- Journal:
- Engineering computations
- Issue:
- Volume 31:Issue 2(2014)
- Issue Display:
- Volume 31, Issue 2 (2014)
- Year:
- 2014
- Volume:
- 31
- Issue:
- 2
- Issue Sort Value:
- 2014-0031-0002-0000
- Page Start:
- 283
- Page End:
- 296
- Publication Date:
- 2014-02-25
- Subjects:
- k-means -- LemK_MSA -- Lempel-Ziv -- Multiple sequence alignment
Computer-aided engineering -- Periodicals
Computer graphics -- Periodicals
620.00285
- Journal URLs:
- http://info.emeraldinsight.com/products/journals/journals.htm?id=ec
http://www.emeraldinsight.com/journals.htm?issn=0264-4401
http://www.emeraldinsight.com/0264-4401.htm
http://www.emeraldinsight.com/
http://firstsearch.oclc.org
- DOI:
- 10.1108/EC-01-2013-0026
- Languages:
- English
- ISSNs:
- 0264-4401
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3758.580800
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 8264.xml
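
The vectorize-then-compare idea described in the abstract's Design/methodology/approach section can be illustrated with a minimal Python sketch. This is not the authors' LemK_MSA implementation: the record does not describe the ten Lempel-Ziv copy modes, so the sketch substitutes a hypothetical 5D feature vector (per-symbol frequencies plus a normalized LZ78 phrase count) for the paper's 10D vector. The function names, the default `ACGT` alphabet, and the choice of Euclidean distance are all assumptions made for illustration. The resulting distance matrix is the kind of input that a clustering step such as k-means, or a tree-building step such as NJ, would consume.

```python
from math import sqrt


def lz78_phrase_count(seq):
    """Count the phrases in an LZ78-style parse of seq (a simple LZ complexity proxy)."""
    seen = set()
    phrase = ""
    count = 0
    for ch in seq:
        phrase += ch
        if phrase not in seen:
            # New phrase: record it and start the next one.
            seen.add(phrase)
            count += 1
            phrase = ""
    if phrase:
        # Trailing phrase that was already in the dictionary.
        count += 1
    return count


def vectorize(seq, alphabet="ACGT"):
    """Hypothetical stand-in for the paper's 10D vector: symbol frequencies + LZ feature."""
    n = len(seq) or 1  # avoid division by zero for empty input
    freqs = [seq.count(c) / n for c in alphabet]
    return freqs + [lz78_phrase_count(seq) / n]


def distance_matrix(vectors):
    """Pairwise Euclidean distances; O(N^2) pairs, each independent of the others."""
    n = len(vectors)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = sqrt(sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j])))
            dist[i][j] = dist[j][i] = d
    return dist


if __name__ == "__main__":
    seqs = ["ACGTACGT", "ACGTACGA", "AAAAAAAA"]
    matrix = distance_matrix([vectorize(s) for s in seqs])
    for row in matrix:
        print(["%.3f" % x for x in row])
```

Because every entry of the matrix depends on only one pair of vectors, the double loop is the natural candidate for parallel execution, which is consistent with the GPU-based distance matrix calculation the abstract mentions.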