A Penalized-Likelihood Method to Estimate the Distribution of Selection Coefficients from Phylogenetic Data. Issue 1 (1st May 2014)
- Record Type:
- Journal Article
- Title:
- A Penalized-Likelihood Method to Estimate the Distribution of Selection Coefficients from Phylogenetic Data. Issue 1 (1st May 2014)
- Main Title:
- A Penalized-Likelihood Method to Estimate the Distribution of Selection Coefficients from Phylogenetic Data
- Authors:
- Tamuri, Asif U
Goldman, Nick
dos Reis, Mario
- Abstract:
- We develop a maximum penalized-likelihood (MPL) method to estimate the fitnesses of amino acids and the distribution of selection coefficients (S = 2Ns) in protein-coding genes from phylogenetic data. This improves on a previous maximum-likelihood method. Various penalty functions are used to penalize extreme estimates of the fitnesses, thus correcting overfitting by the previous method. Using a combination of computer simulation and real data analysis, we evaluate the effect of the various penalties on the estimation of the fitnesses and the distribution of S. We show the new method regularizes the estimates of the fitnesses for small, relatively uninformative data sets, but it can still recover the large proportion of deleterious mutations when present in simulated data. Computer simulations indicate that as the number of taxa in the phylogeny or the level of sequence divergence increases, the distribution of S can be more accurately estimated. Furthermore, the strength of the penalty can be varied to study how informative a particular data set is about the distribution of S. We analyze three protein-coding genes (the chloroplast rubisco protein, mammal mitochondrial proteins, and an influenza virus polymerase) and show the new method recovers a large proportion of deleterious mutations in these data, even under strong penalties, confirming the distribution of S is bimodal in these real data. We recommend the use of the new MPL approach for the estimation of the distribution of S in species phylogenies of protein-coding genes.
- Is Part Of:
- Genetics. Volume 197:Issue 1(2014)
- Journal:
- Genetics
- Issue:
- Volume 197:Issue 1(2014)
- Issue Display:
- Volume 197, Issue 1 (2014)
- Year:
- 2014
- Volume:
- 197
- Issue:
- 1
- Issue Sort Value:
- 2014-0197-0001-0000
- Page Start:
- 257
- Page End:
- 271
- Publication Date:
- 2014-05-01
- Subjects:
- fitness effects -- selection coefficient -- penalized likelihood -- mitochondria -- chloroplast -- influenza
Genetics -- Periodicals
576.5
- Journal URLs:
- http://www.oxfordjournals.org/
- DOI:
- 10.1534/genetics.114.162263
- Languages:
- English
- ISSNs:
- 0016-6731
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 25208.xml