A bioinformatic platform to integrate target capture and whole genome sequences of various read depths for phylogenomics. Issue 23 (31st October 2021)
- Record Type:
- Journal Article
- Title:
- A bioinformatic platform to integrate target capture and whole genome sequences of various read depths for phylogenomics. Issue 23 (31st October 2021)
- Main Title:
- A bioinformatic platform to integrate target capture and whole genome sequences of various read depths for phylogenomics
- Authors:
- Ribeiro, Pedro G.
Torres Jiménez, María Fernanda
Andermann, Tobias
Antonelli, Alexandre
Bacon, Christine D.
Matos‐Maraví, Pável
- Other Names:
- Jensen, Evelyn L. (guest editor)
Taylor, Rebecca S. (guest editor)
Coltman, David W. (guest editor)
Foote, Andrew D. (guest editor)
Lamichhaney, Sangeet (guest editor)
- Abstract:
- The increasing availability of short‐read whole genome sequencing (WGS) provides unprecedented opportunities to study ecological and evolutionary processes. Although loci of interest can be extracted from WGS data and combined with target sequence data, this requires suitable bioinformatic workflows. Here, we test different assembly and locus extraction strategies and implement them into secapr, a pipeline that processes short‐read data into multilocus alignments for phylogenetics and molecular ecology analyses. We integrate the processing of data from low‐coverage WGS (<30×) and target sequence capture into a flexible framework, while optimizing de novo contig assembly and loci extraction. Specifically, we test different assembly strategies by contrasting their ability to recover loci from targeted butterfly protein‐coding genes, using four data sets: a WGS data set across different average coverages (10×, 5× and 2×) and a data set for which these loci were enriched prior to sequencing via target sequence capture. Using the resulting de novo contigs, we account for potential errors within contigs and infer phylogenetic trees to evaluate the ability of each assembly strategy to recover species relationships. We demonstrate that using multiple k‐mer sizes simultaneously for assembly results in the highest yield of extracted loci from de novo assembled contigs, while data sets derived from sequencing read depths as low as 5× recover the expected species relationships in phylogenetic trees. By making the tested assembly approaches available in the secapr pipeline, we hope to inspire future studies to incorporate complementary data and make an informed choice on the optimal assembly strategy.
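One common rationale for combining several k‐mer sizes at low read depth can be illustrated with two textbook relations, not taken from this record: the expected k‐mer coverage C_k = C·(L − k + 1)/L for average base coverage C and read length L, and the Lander–Waterman (Poisson) expectation that a fraction 1 − e^(−C) of positions is sampled at least once. The sketch below is hypothetical and is not secapr code. The 150 bp read length and the k values 21, 33, 55 and 77 are assumptions chosen for illustration, and the idealized model ignores sequencing errors, repeats and coverage bias.

```python
import math

# Hypothetical parameters for illustration only; not taken from the study.
READ_LENGTH = 150             # assumed read length in bp
K_VALUES = (21, 33, 55, 77)   # assumed multi-k set
DEPTHS = (10, 5, 2)           # average base coverages named in the abstract


def kmer_coverage(base_coverage: float, read_length: int, k: int) -> float:
    """Expected k-mer coverage C_k = C * (L - k + 1) / L.

    Each read of length L yields only L - k + 1 k-mers, so larger k
    lowers the effective coverage seen by a de Bruijn graph assembler.
    """
    if not 0 < k <= read_length:
        raise ValueError("k must be between 1 and the read length")
    return base_coverage * (read_length - k + 1) / read_length


def fraction_covered(coverage: float) -> float:
    """Idealized Lander-Waterman (Poisson) share of positions hit at least once."""
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    for depth in DEPTHS:
        cells = []
        for k in K_VALUES:
            ck = kmer_coverage(depth, READ_LENGTH, k)
            cells.append(f"k={k}: C_k={ck:.2f}, covered={fraction_covered(ck):.3f}")
        print(f"{depth}x  " + "; ".join(cells))
```

Under these assumptions, at 2× depth k = 21 gives C_k ≈ 1.73 while k = 77 gives C_k ≈ 0.99. Small k therefore keeps low‐depth regions connected, and large k helps span short repeats. Combining several values is one way to get both benefits, though the study's own conclusion rests on its empirical comparisons.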
- Is Part Of:
- Molecular ecology. Volume 30, Issue 23 (2021)
- Journal:
- Molecular ecology
- Issue:
- Volume 30, Issue 23 (2021)
- Issue Display:
- Volume 30, Issue 23 (2021)
- Year:
- 2021
- Volume:
- 30
- Issue:
- 23
- Issue Sort Value:
- 2021-0030-0023-0000
- Page Start:
- 6021
- Page End:
- 6035
- Publication Date:
- 2021-10-31
- Subjects:
- de novo assembly -- loci extraction -- low‐coverage whole genome sequencing -- secapr -- target sequence capture
Molecular ecology -- Periodicals
Molecular population biology -- Periodicals
576
- Journal URLs:
- http://www.blackwell-synergy.com/servlet/useragent?func=showIssues&code=mec&close=1999#C1999
http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)1365-294X
http://onlinelibrary.wiley.com/
- DOI:
- 10.1111/mec.16240
- Languages:
- English
- ISSNs:
- 0962-1083
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 5900.817360
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 24451.xml