Using taxonomic consistency with semi‐automated data pre‐processing for high quality DNA barcodes. Issue 12 (29th June 2017)
- Record Type:
- Journal Article
- Title:
- Using taxonomic consistency with semi‐automated data pre‐processing for high quality DNA barcodes. Issue 12 (29th June 2017)
- Main Title:
- Using taxonomic consistency with semi‐automated data pre‐processing for high quality DNA barcodes
- Authors:
- Rulik, Björn
Eberle, Jonas
von der Mark, Laura
Thormann, Jana
Jung, Manfred
Köhler, Frank
Apfel, Wolfgang
Weigel, Andreas
Kopetz, Andreas
Köhler, Jonas
Fritzlar, Frank
Hartmann, Matthias
Hadulla, Karl
Schmidt, Joachim
Hörren, Thomas
Krebs, Detlef
Theves, Florian
Eulitz, Ute
Skale, André
Rohwedder, Dirk
Kleeberg, Andreas
Astrin, Jonas J.
Geiger, Matthias F.
Wägele, J. Wolfgang
Grobe, Peter
Ahrens, Dirk
- Editors:
- Yu, Douglas
- Abstract:
- In recent years, large‐scale DNA barcoding campaigns have generated an enormous amount of COI barcodes, which are usually stored in NCBI's GenBank and the official Barcode of Life database (BOLD). BOLD data are generally associated with more detailed and better curated meta‐data, because a great proportion is based on expert‐verified and vouchered material, accessible in public collections.
In the course of the German Barcode of Life initiative, data were generated for the reference library of 2,846 species of Coleoptera from 13,516 individuals. Confronted with the high effort associated with identification, verification and data validation, a bioinformatic pipeline, "TaxCI", was developed that (1) identifies taxonomic inconsistencies in a given tree topology (optionally including a reference dataset), (2) discriminates between different cases of incongruence in order to identify contamination or misidentified specimens, and (3) graphically marks those cases in the tree, which can then be checked again and, if needed, corrected or removed from the dataset. For this, "TaxCI" may use DNA‐based species delimitations from other approaches (e.g. mPTP) or may perform its implemented threshold‐based clustering. The data‐processing pipeline was tested on a newly generated set of barcodes, using the available BOLD records as a reference. After a data revision based on the first run of the TaxCI tool, the second TaxCI analysis yielded a taxonomic match ratio very similar to the one recorded from the reference set (92% vs. 94%). The revised dataset improved by nearly 20% through this procedure compared to the original, uncorrected one. Overall, the new processing pipeline for DNA barcode data allows for the rapid and easy identification of inconsistencies in large datasets, which can be dealt with before submitting them to public data repositories like BOLD or GenBank. Ultimately, this will increase the quality of submitted data and the speed of data submission, while primarily avoiding the deterioration of the accuracy of the data repositories due to ambiguously identified or contaminated specimens.
- Is Part Of:
- Methods in ecology and evolution. Volume 8:Issue 12(2017)
- Journal:
- Methods in ecology and evolution
- Issue:
- Volume 8:Issue 12(2017)
- Issue Display:
- Volume 8, Issue 12 (2017)
- Year:
- 2017
- Volume:
- 8
- Issue:
- 12
- Issue Sort Value:
- 2017-0008-0012-0000
- Page Start:
- 1878
- Page End:
- 1887
- Publication Date:
- 2017-06-29
- Subjects:
- Coleoptera -- data quality -- DNA barcoding -- German Barcode of Life -- Germany -- reference library -- species identification
Ecology -- Periodicals
Evolution -- Periodicals
577
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)2041-210X
http://onlinelibrary.wiley.com/
- DOI:
- 10.1111/2041-210X.12824
- Languages:
- English
- ISSNs:
- 2041-210X
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 17486.xml