Distinguishing among complex evolutionary models using unphased whole‐genome data through random forest approximate Bayesian computation. (25th October 2020)
- Record Type:
- Journal Article
- Title:
- Distinguishing among complex evolutionary models using unphased whole‐genome data through random forest approximate Bayesian computation. (25th October 2020)
- Main Title:
- Distinguishing among complex evolutionary models using unphased whole‐genome data through random forest approximate Bayesian computation
- Authors:
- Ghirotto, Silvia
Vizzari, Maria Teresa
Tassi, Francesca
Barbujani, Guido
Benazzo, Andrea
- Other Names:
- Fountain‐Jones Nicholas M. guestEditor.
Smith Megan L. guestEditor.
Austerlitz Frédéric guestEditor.
- Abstract:
- Inferring past demographic histories is crucial in population genetics, and the amount of complete genomes now available should in principle facilitate this inference. In practice, however, the available inferential methods suffer from severe limitations. Although hundreds of complete genomes can be simultaneously analysed, complex demographic processes can easily exceed computational constraints, and the procedures to evaluate the reliability of the estimates contribute to increase the computational effort. Here we present an approximate Bayesian computation framework based on the random forest algorithm (ABC‐RF), to infer complex past population processes using complete genomes. To this aim, we propose to summarize the data by the full genomic distribution of the four mutually exclusive categories of segregating sites (FDSS), a statistic fast to compute from unphased genome data and that does not require the ancestral state of alleles to be known. We constructed an efficient ABC pipeline and tested how accurately it allows one to recognize the true model among models of increasing complexity, using simulated data and taking into account different sampling strategies in terms of number of individuals analysed, number and size of the genetic loci considered. We also compared the FDSS with the unfolded and folded site frequency spectrum (SFS), and for these statistics we highlighted the experimental conditions maximizing the inferential power of the ABC‐RF procedure. We finally analysed real data sets, testing models on the dispersal of anatomically modern humans out of Africa and exploring the evolutionary relationships of the three species of Orangutan inhabiting Borneo and Sumatra.
- Is Part Of:
- Molecular ecology resources. Volume 21:Number 8(2021)
- Journal:
- Molecular ecology resources
- Issue:
- Volume 21:Number 8(2021)
- Issue Display:
- Volume 21, Issue 8 (2021)
- Year:
- 2021
- Volume:
- 21
- Issue:
- 8
- Issue Sort Value:
- 2021-0021-0008-0000
- Page Start:
- 2614
- Page End:
- 2628
- Publication Date:
- 2020-10-25
- Subjects:
- Molecular ecology -- Periodicals
572.8
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)1755-0998
http://onlinelibrary.wiley.com/
- DOI:
- 10.1111/1755-0998.13263
- Languages:
- English
- ISSNs:
- 1755-098X
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 5900.817368
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 20036.xml