To compare the performance of prokaryotic taxonomy classifiers using curated 16S full-length rRNA sequences. (June 2022)
- Record Type:
- Journal Article
- Title:
- To compare the performance of prokaryotic taxonomy classifiers using curated 16S full-length rRNA sequences. (June 2022)
- Main Title:
- To compare the performance of prokaryotic taxonomy classifiers using curated 16S full-length rRNA sequences
- Authors:
- Hung, Yuan-Mao
Lyu, Wei-Ni
Tsai, Ming-Lin
Liu, Chiang-Lin
Lai, Liang-Chuan
Tsai, Mong-Hsun
Chuang, Eric Y.
- Abstract:
- Background: Taxonomic assignment is a vital step in the analytic pipeline of bacterial 16S ribosomal RNA (rRNA) sequencing. Over the past decade, most research in this field used next-generation sequencing technology to target the V3-V4 regions to analyze bacterial composition. However, focusing on only one or two hypervariable regions limited taxonomic resolution at the species level. In recent years, third-generation sequencing technology has allowed researchers to easily access full-length prokaryotic 16S sequences and presented an opportunity to attain greater taxonomic depth. However, the accuracy of current taxonomic classifiers in analyzing full-length 16S sequences remains unclear.
Objective: The purpose of this study is to compare the accuracy of several widely used 16S sequence classifiers and to indicate the most suitable 16S training dataset for each classifier.
Methods: Both curated 16S full-length sequences and cross-validation datasets were used to validate the performance of seven classifiers: QIIME2, mothur, SINTAX, SPINGO, Ribosomal Database Project (RDP), IDTAXA, and Kraken2. Different sequence training datasets, such as SILVA, Greengenes, and RDP, were used to train the classification models.
Results: The accuracy of each classifier down to the species level was illustrated. According to the experimental results, with RDP sequences as the training data, SINTAX and SPINGO provided the highest accuracy and were recommended for the task of classifying prokaryotic 16S full-length rRNA sequences.
Conclusion: The performance of the classifiers was affected by the sequence training datasets. Therefore, each classifier should use its most suitable 16S training data to improve the accuracy and taxonomic resolution of the taxonomic assignment.
Highlights: Compare the performance of several prokaryotic 16S rRNA sequence classifiers for 16S full-length sequence classification. Indicate the most suitable 16S reference database for each prokaryotic 16S sequence classifier. Focus on the classifiers' performance at the genus and species levels. Helps 16S metagenomics researchers optimize their analytical pipelines for third-generation sequencing data analysis.
- Is Part Of:
- Computers in biology and medicine. Volume 145(2022)
- Journal:
- Computers in biology and medicine
- Issue:
- Volume 145(2022)
- Issue Display:
- Volume 145, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 145
- Issue:
- 2022
- Issue Sort Value:
- 2022-0145-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-06
- Subjects:
- Metagenomics -- Sequence classifier -- 16S full-Length -- Taxonomic assignment -- Third generation sequencing
Medicine -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
610.285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00104825/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiomed.2022.105416
- Languages:
- English
- ISSNs:
- 0010-4825
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.880000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 21569.xml