Assessing the factors influencing the performance of machine learning for classifying haplogroups from Y-STR haplotypes. (November 2022)
- Record Type:
- Journal Article
- Title:
- Assessing the factors influencing the performance of machine learning for classifying haplogroups from Y-STR haplotypes. (November 2022)
- Main Title:
- Assessing the factors influencing the performance of machine learning for classifying haplogroups from Y-STR haplotypes
- Authors:
- Fan, Guang-Yao
- Abstract:
- Abstract: Two distinct genetic markers, single nucleotide polymorphisms (Y-SNPs) and short tandem repeats (Y-STRs), exist simultaneously in the non-recombining portion of the Y chromosome. Because of their different rates of mutation, Y-STRs and Y-SNPs play distinct roles in forensic and evolutionary genetics. Current approaches to inferring haplogroup status rely on genotyping many Y-SNP loci. Given the relationship between the haplotype and haplogroup of a Y chromosome, the cost-effective strategy of Y-STR typing has an advantage in haplogroup prediction. Many machine learning algorithms have emerged for assigning a Y-STR haplotype to a haplogroup. However, a series of issues must be resolved before machine learning methods can be used in practice. Thus, in this study, k-nearest neighbor (kNN) classifiers were built for a range of different situations. We assessed factors that may influence the performance of the kNN prediction model for classifying haplogroups. The training set was based on a diverse ground-truth data set comprising Y-STR haplotypes and corresponding Y-SNP haplogroups. Our results showed that combining different levels of haplogroups into the observations, or transracial prediction, was impractical. Moreover, using more slowly mutating Y-STR loci in the category helps improve classification accuracy. The preconditions for effective and accurate haplogroup assignment by the kNN classifier were revealed.
Highlights: The factors influencing the performance of kNN algorithms for classifying haplogroups were assessed. Combining all the levels of haplogroups into the observations is inappropriate. Transracial prediction proved impractical. Classification accuracy under the SM group of Y-STR loci was higher than under the RM group. The kNN classifier can be effectively used for accurate haplogroup assignment.
- Is Part Of:
- Forensic science international. Volume 340(2022)
- Journal:
- Forensic science international
- Issue:
- Volume 340(2022)
- Issue Display:
- Volume 340, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 340
- Issue:
- 2022
- Issue Sort Value:
- 2022-0340-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-11
- Subjects:
- Y-STR haplotype -- Y-SNP haplogroup -- Machine learning -- KNN -- Prediction performance
Medical jurisprudence -- Periodicals
Chemistry, Forensic -- Periodicals
Forensic Medicine -- Periodicals
Médecine légale -- Périodiques
Chimie légale -- Périodiques
Gerechtelijke geneeskunde
Gerechtelijke chemie
Gerechtelijke psychiatrie
Chemistry, Forensic
Medical jurisprudence
Electronic journals
Periodicals
614.1
- Journal URLs:
- http://www.clinicalkey.com.au/dura/browse/journalIssue/03790738
http://www.clinicalkey.com/dura/browse/journalIssue/03790738
http://www.sciencedirect.com/science/journal/03790738
http://infotrac.galegroup.com/itw/infomark/1/1/1/purl=rc18_EAIM_0__jn+%22Forensic+Science+International%22?sw_aep=stand
http://www.elsevier.com/homepage/elecserv.htt
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.forsciint.2022.111466
- Languages:
- English
- ISSNs:
- 0379-0738
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3987.764000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 24017.xml