GeneRax: A Tool for Species-Tree-Aware Maximum Likelihood-Based Gene Family Tree Inference under Gene Duplication, Transfer, and Loss. (5th June 2020)
- Record Type:
- Journal Article
- Title:
- GeneRax: A Tool for Species-Tree-Aware Maximum Likelihood-Based Gene Family Tree Inference under Gene Duplication, Transfer, and Loss. (5th June 2020)
- Main Title:
- GeneRax: A Tool for Species-Tree-Aware Maximum Likelihood-Based Gene Family Tree Inference under Gene Duplication, Transfer, and Loss
- Authors:
- Morel, Benoit
Kozlov, Alexey M
Stamatakis, Alexandros
Szöllősi, Gergely J
- Editors:
- Nielsen, Rasmus
- Abstract:
- Inferring phylogenetic trees for individual homologous gene families is difficult because alignments are often too short, and thus contain insufficient signal, while substitution models inevitably fail to capture the complexity of the evolutionary processes. To overcome these challenges, species-tree-aware methods also leverage information from a putative species tree. However, only few methods are available that implement a full likelihood framework or account for horizontal gene transfers. Furthermore, these methods often require expensive data preprocessing (e.g., computing bootstrap trees) and rely on approximations and heuristics that limit the degree of tree space exploration. Here, we present GeneRax, the first maximum likelihood species-tree-aware phylogenetic inference software. It simultaneously accounts for substitutions at the sequence level as well as gene level events, such as duplication, transfer, and loss relying on established maximum likelihood optimization algorithms. GeneRax can infer rooted phylogenetic trees for multiple gene families, directly from the per-gene sequence alignments and a rooted, yet undated, species tree. We show that compared with competing tools, on simulated data GeneRax infers trees that are the closest to the true tree in 90% of the simulations in terms of relative Robinson–Foulds distance. On empirical data sets, GeneRax is the fastest among all tested methods when starting from aligned sequences, and it infers trees with the highest likelihood score, based on our model. GeneRax completed tree inferences and reconciliations for 1,099 Cyanobacteria families in 8 min on 512 CPU cores. Thus, its parallelization scheme enables large-scale analyses. GeneRax is available under GNU GPL at https://github.com/BenoitMorel/GeneRax (last accessed June 17, 2020).
- Is Part Of:
- Molecular biology and evolution. Volume 37:Number 9(2020)
- Journal:
- Molecular biology and evolution
- Issue:
- Volume 37:Number 9(2020)
- Issue Display:
- Volume 37, Issue 9 (2020)
- Year:
- 2020
- Volume:
- 37
- Issue:
- 9
- Issue Sort Value:
- 2020-0037-0009-0000
- Page Start:
- 2763
- Page End:
- 2774
- Publication Date:
- 2020-06-05
- Subjects:
- gene family tree -- reconciliation -- maximum likelihood -- gene duplication -- horizontal gene transfer
Molecular biology -- Periodicals
Molecular evolution -- Periodicals
Evolution, Molecular -- Periodicals
Molecular Biology -- Periodicals
572.8
- Journal URLs:
- http://mbe.oxfordjournals.org/
http://www.molbiolevol.org/
http://ukcatalogue.oup.com/
http://firstsearch.oclc.org
http://firstsearch.oclc.org/journal=0737-7038;screen=info;ECOIP
- DOI:
- 10.1093/molbev/msaa141
- Languages:
- English
- ISSNs:
- 0737-4038
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 5900.782000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 15144.xml