A Simulation-Based Approach to Statistical Alignment. (15th September 2018)
- Record Type:
- Journal Article
- Title:
- A Simulation-Based Approach to Statistical Alignment. (15th September 2018)
- Main Title:
- A Simulation-Based Approach to Statistical Alignment
- Authors:
- Levy Karin, Eli
Ashkenazy, Haim
Hein, Jotun
Pupko, Tal
- Editors:
- Bryant, David
- Abstract:
- Classic alignment algorithms utilize scoring functions which maximize similarity or minimize edit distances. These scoring functions account for both insertion–deletion (indel) and substitution events. In contrast, alignments based on stochastic models aim to explicitly describe the evolutionary dynamics of sequences by inferring relevant probabilistic parameters from input sequences. Despite advances in stochastic modeling during the last two decades, scoring-based methods are still dominant, partially due to slow running times of probabilistic approaches. Alignment inference using stochastic models involves estimating the probability of events, such as the insertion or deletion of a specific number of characters. In this work, we present SimBa-SAl, a simulation-based approach to statistical alignment inference, which relies on an explicit continuous time Markov model for both indels and substitutions. SimBa-SAl has several advantages. First, using simulations, it decouples the estimation of event probabilities from the inference stage, which allows the introduction of accelerations to the alignment inference procedure. Second, it is general and can accommodate various stochastic models of indel formation. Finally, it allows computing the maximum-likelihood alignment, the probability of a given pair of sequences integrated over all possible alignments, and sampling alternative alignments according to their probability. We first show that SimBa-SAl allows accurate estimation of parameters of the long-indel model previously developed by Miklós et al. (2004). We next show that SimBa-SAl is more accurate than previously developed pairwise alignment algorithms, when analyzing simulated as well as empirical data sets. Finally, we study the goodness-of-fit of the long-indel and TKF91 models. We show that although the long-indel model fits the data sets better than TKF91, there is still room for improvement concerning the realistic modeling of evolutionary sequence dynamics.
- Is Part Of:
- Systematic biology. Volume 68:Number 2(2019)
- Journal:
- Systematic biology
- Issue:
- Volume 68:Number 2(2019)
- Issue Display:
- Volume 68, Issue 2 (2019)
- Year:
- 2019
- Volume:
- 68
- Issue:
- 2
- Issue Sort Value:
- 2019-0068-0002-0000
- Page Start:
- 252
- Page End:
- 266
- Publication Date:
- 2018-09-15
- Subjects:
- Long-indel model -- pairwise alignment -- sequence simulations -- SimBa-SAl -- statistical alignment
Biology -- Classification -- Periodicals
Biology -- Periodicals
Biologie -- Classification -- Périodiques
Biologie -- Périodiques
578.012
- Journal URLs:
- http://ukcatalogue.oup.com/
- DOI:
- 10.1093/sysbio/syy059
- Languages:
- English
- ISSNs:
- 1063-5157
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 8589.180700
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 11995.xml