Selecting the Right Similarity‐Scoring Matrix. Issue 1 (16th February 2018)
- Record Type:
- Journal Article
- Title:
- Selecting the Right Similarity‐Scoring Matrix. Issue 1 (16th February 2018)
- Main Title:
- Selecting the Right Similarity‐Scoring Matrix
- Authors:
- Pearson, William R.
- Editors:
- Bateman, Alex
Pearson, William R.
Stein, Lincoln D.
Stormo, Gary D.
Yates, John R.
- Abstract:
- Protein sequence similarity searching programs like BLASTP, SSEARCH, and FASTA use scoring matrices that are designed to identify distant evolutionary relationships (BLOSUM62 for BLAST, BLOSUM50 for SSEARCH and FASTA). Different similarity scoring matrices are most effective at different evolutionary distances. "Deep" scoring matrices like BLOSUM62 and BLOSUM50 target alignments with 20% to 30% identity, while "shallow" scoring matrices (e.g., VTML10 to VTML80) target alignments that share 90% to 50% identity, reflecting much less evolutionary change. While "deep" matrices provide very sensitive similarity searches, they also require longer sequence alignments and can sometimes produce alignment overextension into nonhomologous regions. Shallower scoring matrices are more effective when searching for short protein domains, or when the goal is to limit the scope of the search to sequences that are likely to be orthologous between recently diverged organisms. Likewise, in DNA searches, the match and mismatch parameters set evolutionary look‐back times and domain boundaries. In this unit, we will discuss the theoretical foundations that drive practical choices of protein and DNA similarity scoring matrices and gap penalties. Deep scoring matrices (BLOSUM62 and BLOSUM50) should be used for sensitive searches with full‐length protein sequences, but short domains or restricted evolutionary look‐back require shallower scoring matrices. Curr. Protoc. Bioinform. 43:3.5.1‐3.5.9. © 2013 by John Wiley & Sons, Inc.
- Is Part Of:
- Current protocols in bioinformatics. Volume 43:Issue 1(2013)
- Journal:
- Current protocols in bioinformatics
- Issue:
- Volume 43:Issue 1(2013)
- Issue Display:
- Volume 43, Issue 1 (2013)
- Year:
- 2013
- Volume:
- 43
- Issue:
- 1
- Issue Sort Value:
- 2013-0043-0001-0000
- Page Start:
- 3.5.1
- Page End:
- 3.5.9
- Publication Date:
- 2018-02-16
- Subjects:
- similarity scoring matrices -- PAM matrices -- BLOSUM matrices -- sequence alignment
Bioinformatics -- Laboratory manuals
Nucleotide sequence -- Laboratory manuals
Amino acid sequence -- Laboratory manuals
Base Sequence
Amino Acid Sequence
Computational Biology -- methods
Databases, Genetic
Proteins -- analysis
Sequence Analysis -- methods
Sequence Homology
Amino acid sequence
Bioinformatics
Nucleotide sequence
Laboratory Manuals
Laboratory manuals
570.285
- Journal URLs:
- https://currentprotocols.onlinelibrary.wiley.com/journal/1934340x
http://www3.interscience.wiley.com/cgi-bin/mrwhome/104554769/HOME
http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/0471250953.bi0305s43
- Languages:
- English
- ISSNs:
- 1934-3396
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 20878.xml
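
The guidance in the abstract above (deep matrices for sensitive searches with full-length proteins, shallower matrices for short domains or a restricted evolutionary look-back) can be illustrated with a minimal Python sketch. The function name `suggest_matrix_class`, its parameters, and the 50% cut-off are hypothetical choices made for this illustration; the abstract only states that deep matrices target 20% to 30% identity and shallow matrices target 90% to 50% identity, so the handling of the 30% to 50% range is an assumption and not the author's method.

```python
# Illustrative sketch only: names and the 50% threshold are assumptions,
# not part of the catalogued article.

DEEP_MATRICES = ("BLOSUM62", "BLOSUM50")   # named in the abstract as "deep"
SHALLOW_MATRICES = "VTML10 to VTML80"      # range named in the abstract as "shallow"


def suggest_matrix_class(target_identity_pct: float, short_domain: bool = False) -> str:
    """Return 'deep' or 'shallow' for a target percent identity.

    Assumption: 50% identity (the lower bound the abstract gives for
    shallow matrices) is used as the dividing line; the abstract does not
    say which class to use between 30% and 50%.
    """
    if not 0 <= target_identity_pct <= 100:
        raise ValueError("target_identity_pct must be between 0 and 100")
    # Short domains call for shallower matrices regardless of identity.
    if short_domain:
        return "shallow"
    # High expected identity implies a restricted evolutionary look-back.
    if target_identity_pct >= 50:
        return "shallow"
    return "deep"


if __name__ == "__main__":
    for pct, short in [(25, False), (90, False), (25, True)]:
        print(pct, short, suggest_matrix_class(pct, short_domain=short))
```

A caller would still have to pick a specific matrix within the suggested class (for example BLOSUM62 or BLOSUM50 for the deep case) and matching gap penalties; the sketch deliberately stops short of that choice because the abstract does not specify it.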