Predicting haplogroups using a versatile machine learning program (PredYMaLe) on a new mutationally balanced 32 Y-STR multiplex (CombYplex): Unlocking the full potential of the human STR mutation rate spectrum to estimate forensic parameters. (September 2020)
- Record Type:
- Journal Article
- Title:
- Predicting haplogroups using a versatile machine learning program (PredYMaLe) on a new mutationally balanced 32 Y-STR multiplex (CombYplex): Unlocking the full potential of the human STR mutation rate spectrum to estimate forensic parameters. (September 2020)
- Main Title:
- Predicting haplogroups using a versatile machine learning program (PredYMaLe) on a new mutationally balanced 32 Y-STR multiplex (CombYplex): Unlocking the full potential of the human STR mutation rate spectrum to estimate forensic parameters
- Authors:
- Bouakaze, Caroline
Delehelle, Franklin
Saenz-Oyhéréguy, Nancy
Moreira, Andreia
Schiavinato, Stéphanie
Croze, Myriam
Delon, Solène
Fortes-Lima, Cesar
Gibert, Morgane
Bujan, Louis
Huyghe, Eric
Bellis, Gil
Calderon, Rosario
Hernández, Candela Lucia
Avendaño-Tamayo, Efren
Bedoya, Gabriel
Salas, Antonio
Mazières, Stéphane
Chiaroni, Jacques
Migot-Nabias, Florence
Ruiz-Linares, Andres
Dugoujon, Jean-Michel
Thèves, Catherine
Mollereau-Manaute, Catherine
Noûs, Camille
Poulet, Nicolas
King, Turi
D'Amato, Maria Eugenia
Balaresque, Patricia
- Abstract:
- Highlights: 32 Y-STR well-balanced mutation rate (CombYplex) and machine-learning program (PredYMaLe). Y-STR-based haplogroup prediction. Best predictions using SVM and Random Forest classifiers. Assignation accuracy scores (or prediction scores) using SVM: 97 %. Heterogeneous haplogroup predictions among classes. Confounding factors: small sample sizes, gene conversion.
Abstract: We developed a new mutationally well-balanced 32 Y-STR multiplex (CombYplex) together with a machine learning (ML) program, PredYMaLe, to assess the impact of STR mutability on haplogroup prediction, while respecting forensic community criteria (high DC/HD). We designed CombYplex around two sub-panels M1 and M2 characterized by average and high-mutation STR panels. Using these two sub-panels, we tested how our program PredYMaLe reacts to mutability when considering basal branches and, moving down, terminal branches. We tested first the discrimination capacity of CombYplex on 996 human samples using various forensic and statistical parameters and showed that its resolution is sufficient to separate haplogroup classes. In parallel, PredYMaLe was designed and used to test whether an ML approach can predict haplogroup classes from Y-STR profiles. Applied to our kit, SVM and Random Forest classifiers perform very well (average 97 %), better than Neural Network (average 91 %) and Bayesian methods (< 90 %). We observe heterogeneity in haplogroup assignation accuracy among classes, with most haplogroups having high prediction scores (99–100 %) and two (E1b1b and G) having lower scores (67 %). The small sample sizes of these classes explain the high tendency to misclassify the Y-profiles of these haplogroups; results were measurably improved as soon as more training data were added. We provide evidence that our ML approach is a robust method to accurately predict haplogroups when it is combined with a sufficient number of markers, well-balanced mutation rate Y-STR panels, and large ML training sets. Further research on confounding factors (such as CNV-STR or gene conversion) and ideal STR panels in regard to the branches analysed can be developed to help classifiers further optimize prediction scores.
- Is Part Of:
- Forensic science international. Volume 48(2020)
- Journal:
- Forensic science international
- Issue:
- Volume 48(2020)
- Issue Display:
- Volume 48, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 48
- Issue:
- 2020
- Issue Sort Value:
- 2020-0048-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-09
- Subjects:
- Y-STR -- Machine learning -- Assignation accuracy and haplogroup prediction (Hg prediction) -- Incremental mutation rates
Forensic genetics -- Periodicals
Génétique légale -- Périodiques
Forensic genetics
Electronic journals
Periodicals
614.1
- Journal URLs:
- http://www.clinicalkey.com.au/dura/browse/journalIssue/18724973
http://www.clinicalkey.com/dura/browse/journalIssue/18724973
http://www.sciencedirect.com/science/journal/18724973
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.fsigen.2020.102342
- Languages:
- English
- ISSNs:
- 1872-4973
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3987.764050
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 14009.xml