Causes and analytical impacts of missing data in RADseq phylogenetics: Insights from an African frog (Afrixalus). (21st January 2019)
- Record Type:
- Journal Article
- Title:
- Causes and analytical impacts of missing data in RADseq phylogenetics: Insights from an African frog (Afrixalus). (21st January 2019)
- Main Title:
- Causes and analytical impacts of missing data in RADseq phylogenetics: Insights from an African frog (Afrixalus)
- Authors:
- Crotti, Marco
Barratt, Christopher D.
Loader, Simon P.
Gower, David J.
Streicher, Jeffrey W.
- Abstract:
- Restriction site‐associated DNA sequencing (RADseq) has emerged as a useful tool in systematics and population genomics. A common feature of RADseq data sets is that they contain missing data that arise from multiple sources including genealogical sampling bias, assembly methodology and sequencing error. Many RADseq studies have demonstrated that allowing sites (single nucleotide polymorphisms, SNPs) with missing data can increase support for phylogenetic hypotheses. Two non‐mutually exclusive explanations for this observation are that (a) larger data sets contain more phylogenetic information; and (b) excluding missing data disproportionally removes sites with the highest mutation rates, causing the exclusion of characters that are likely variable and informative. Using a RADseq data set derived from the East African banana frog, Afrixalus fornasini (up to 1.1 million SNPs), we found that missing data thresholds were positively correlated with the proportion of parsimony‐informative sites and mean branch support. Using three proxies for estimating site‐specific rate, we found that the most conservative missing data strategies excluded rapidly evolving sites, with four‐state sites present only when allowing ≥60% missing data per SNP. Topological similarity among estimated phylogenies was highest for the data sets with ≥60% missing data per SNP. Our results suggest that several desirable phylogenetic qualities were observed when allowing ≥60% missing data per SNP. However, at the highest missing data thresholds (80% and 90% missing data per SNP), we observed differences in performance between high‐ and mixed‐weight DNA extraction samples, which may indicate there are trade‐offs to consider when using degraded genomic template with RADseq protocols. …
- Is Part Of:
- Zoologica scripta. Volume 48:Number 2(2019)
- Journal:
- Zoologica scripta
- Issue:
- Volume 48:Number 2(2019)
- Issue Display:
- Volume 48, Issue 2 (2019)
- Year:
- 2019
- Volume:
- 48
- Issue:
- 2
- Issue Sort Value:
- 2019-0048-0002-0000
- Page Start:
- 157
- Page End:
- 167
- Publication Date:
- 2019-01-21
- Subjects:
- Zoology -- Periodicals
590.5
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)1463-6409
http://onlinelibrary.wiley.com/
- DOI:
- 10.1111/zsc.12335
- Languages:
- English
- ISSNs:
- 0300-3256
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 9519.300000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 9591.xml