Clustering Highly Divergent Homologous Proteins: An Alignment‐Free Method. Issue 2 (21st February 2023)
- Record Type:
- Journal Article
- Title:
- Clustering Highly Divergent Homologous Proteins: An Alignment‐Free Method. Issue 2 (21st February 2023)
- Main Title:
- Clustering Highly Divergent Homologous Proteins: An Alignment‐Free Method
- Authors:
- Muñoz‐Baena, Laura
Poon, Art F. Y.
- Abstract:
- The comparative analysis of amino acid sequences is an important tool in molecular biology that often requires multiple sequence alignments. In comparisons between less closely related genomes, however, it becomes more difficult to accurately align protein‐coding sequences, or even to identify homologous regions in different genomes. In this article, we describe an alignment‐free method for the classification of homologous protein‐coding regions from different genomes. This methodology was originally developed for comparing genomes within virus families, but may be adapted for other organisms. We quantify sequence homology from the overlap (intersection distance) of the k‐mer (word) frequency distributions for different protein sequences. Next, we extract groups of homologous sequences from the resulting distance matrix using a combination of dimensionality reduction and hierarchical clustering methods. Finally, we demonstrate how to generate visualizations of the composition of clusters with respect to protein annotations, and by coloring protein‐coding regions of genomes by cluster assignments. These provide a useful means to quickly assess the reliability of the clustering results based on the distribution of homologous genes among genomes. © 2023 Wiley Periodicals LLC.
Basic Protocol 1: Data collection and processing
Basic Protocol 2: Calculating k‐mer distances
Basic Protocol 3: Extracting clusters of homology
Support Protocol: Genome plot based on clustering results
- Is Part Of:
- Current protocols. Volume 3:Issue 2(2023)
- Journal:
- Current protocols
- Issue:
- Volume 3:Issue 2(2023)
- Issue Display:
- Volume 3, Issue 2 (2023)
- Year:
- 2023
- Volume:
- 3
- Issue:
- 2
- Issue Sort Value:
- 2023-0003-0002-0000
- Page Start:
- n/a
- Page End:
- n/a
- Publication Date:
- 2023-02-21
- Subjects:
- alignment‐free methods -- bioinformatics -- protein clustering -- Python -- R
Life sciences -- Laboratory manuals -- Periodicals
Biology -- Laboratory manuals -- Periodicals
Life sciences -- Technique -- Periodicals
Biology -- Technique -- Periodicals
570.028
- Journal URLs:
- https://currentprotocols.onlinelibrary.wiley.com/journal/26911299
http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/cpz1.666
- Languages:
- English
- ISSNs:
- 2691-1299
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 26063.xml
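
The abstract describes quantifying homology from the overlap (intersection distance) of k‐mer frequency distributions. The Python sketch below illustrates that idea only; it is not the authors' implementation. The normalization used here (1 minus the sum of the smaller relative frequency for each shared k‐mer), the word length k=3, the function names, and the toy sequences are all assumptions made for illustration. The dimensionality reduction and hierarchical clustering steps are not shown.

```python
from collections import Counter
from itertools import combinations


def kmer_freqs(seq, k=3):
    """Relative frequencies of overlapping k-mers in one protein sequence."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    # A sequence shorter than k has no k-mers; return an empty distribution.
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def intersection_distance(f1, f2):
    """1 minus the overlap of two k-mer distributions (assumed form)."""
    shared = set(f1) & set(f2)
    return 1.0 - sum(min(f1[w], f2[w]) for w in shared)


def distance_matrix(seqs, k=3):
    """Pairwise distances for a dict of {name: sequence}, keyed by name pairs."""
    names = list(seqs)
    freqs = {name: kmer_freqs(seqs[name], k) for name in names}
    dist = {(name, name): 0.0 for name in names}
    for a, b in combinations(names, 2):
        d = intersection_distance(freqs[a], freqs[b])
        dist[(a, b)] = dist[(b, a)] = d
    return dist


# Hypothetical toy sequences, not taken from the article.
toy = {
    "protA": "MKVLAAGIVG",
    "protB": "MKVLAAGLVG",  # one substitution relative to protA
    "protC": "WWPPHHCCDD",  # shares no 3-mers with protA
}
matrix = distance_matrix(toy, k=3)
```

A distance of 0 means the two k‐mer distributions are identical and 1 means they share no k‐mers. A matrix of this kind is the sort of input that the clustering step described in the abstract would consume.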