Ancestry may confound genetic machine learning: Candidate-gene prediction of opioid use disorder as an example. (1st December 2021)
- Record Type:
- Journal Article
- Title:
- Ancestry may confound genetic machine learning: Candidate-gene prediction of opioid use disorder as an example. (1st December 2021)
- Main Title:
- Ancestry may confound genetic machine learning: Candidate-gene prediction of opioid use disorder as an example
- Authors:
- Hatoum, Alexander S.
Wendt, Frank R.
Galimberti, Marco
Polimanti, Renato
Neale, Benjamin
Kranzler, Henry R.
Gelernter, Joel
Edenberg, Howard J.
Agrawal, Arpana
- Abstract:
- Background: Machine learning (ML) models are beginning to proliferate in psychiatry; however, ML models in psychiatric genetics have not always accounted for ancestry. Using an empirical example of a proposed genetic test for opioid use disorder (OUD), and exploring a similar test for tobacco dependence and a simulated binary phenotype, we show that genetic prediction using ML is vulnerable to ancestral confounding.
Methods: We use five ML algorithms trained with 16 brain reward-derived "candidate" SNPs proposed for commercial use and examine their ability to predict OUD versus ancestry in an out-of-sample test set (N = 1000, stratified into four equal groups of n = 250: cases and controls of European ancestry, and cases and controls of African ancestry). We rerun the analyses with 8 random sets of allele-frequency-matched SNPs and contrast the findings with 11 genome-wide significant variants for tobacco smoking. To examine generalizability, we generate and test a random phenotype.
Results: None of the 5 ML algorithms predicted OUD better than chance when ancestry was balanced, but their predictions were confounded with ancestry in an out-of-sample test. In addition, the algorithms preferentially predicted admixed subpopulations. Random sets of variants matched to the candidate SNPs by allele frequency produced similar bias. Genome-wide significant tobacco smoking variants were also confounded by ancestry. Finally, random SNPs predicting a random simulated phenotype showed that bias attributable to ancestral confounding could affect any ML-based genetic prediction.
Conclusions: Researchers and clinicians are encouraged to be skeptical of claims of high prediction accuracy from ML-derived genetic algorithms for polygenic traits such as addiction, particularly when those algorithms use candidate variants.
Highlights: Machine learning (ML) algorithms that use genomic data for disease prediction are becoming increasingly common. ML algorithms trained on candidate variants did not predict opioid use disorder. ML algorithms were more likely to identify genomic ancestry regardless of the variants specified or the phenotype under study. Machine learning analyses of genomic data are susceptible to confounds that misclassify admixed individuals.
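The confounding mechanism described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' pipeline: it uses 50 hypothetical SNPs whose allele frequencies differ between two simulated ancestries (labelled A and B), a phenotype with no genetic effect at all, and a single binomial naive Bayes classifier standing in for the five ML algorithms, which this record does not name. The 80/20 ancestry imbalance between cases and controls in the training data, the seed, and all function names are likewise illustrative assumptions.

```python
import math
import random

N_SNPS = 50  # hypothetical; the study itself used 16 candidate SNPs


def simulate_freqs(rng, n_snps):
    """Draw (p_A, p_B) allele frequencies that differ between two ancestries."""
    freqs = []
    for _ in range(n_snps):
        p_a = rng.uniform(0.1, 0.9)
        shift = rng.uniform(0.15, 0.35) * rng.choice((-1, 1))
        p_b = min(0.95, max(0.05, p_a + shift))
        freqs.append((p_a, p_b))
    return freqs


def genotype(rng, freqs, ancestry):
    """Allele counts (0/1/2) under Hardy-Weinberg; ancestry 0 = A, 1 = B."""
    return [sum(rng.random() < f[ancestry] for _ in range(2)) for f in freqs]


def make_sample(rng, freqs, n_per_cell):
    """Build (genotype, phenotype, ancestry) tuples; phenotype has no genetic effect."""
    sample = []
    for (pheno, anc), n in n_per_cell.items():
        for _ in range(n):
            sample.append((genotype(rng, freqs, anc), pheno, anc))
    return sample


def fit_naive_bayes(train):
    """Estimate per-SNP allele frequencies in cases (1) and controls (0), Laplace-smoothed."""
    n_snps = len(train[0][0])
    counts = {0: [1.0] * n_snps, 1: [1.0] * n_snps}
    totals = {0: 2.0, 1: 2.0}
    for geno, pheno, _ in train:
        totals[pheno] += 2
        for j, g in enumerate(geno):
            counts[pheno][j] += g
    return {k: [c / totals[k] for c in counts[k]] for k in (0, 1)}


def predict(model, geno):
    """Return 1 (case) if the log-likelihood ratio favours cases (equal priors)."""
    score = 0.0
    for g, fc, fn in zip(geno, model[1], model[0]):
        score += g * math.log(fc / fn) + (2 - g) * math.log((1 - fc) / (1 - fn))
    return int(score > 0)


def accuracy(model, sample, target):
    """Fraction of individuals whose prediction equals target(pheno, anc)."""
    hits = sum(predict(model, geno) == target(pheno, anc) for geno, pheno, anc in sample)
    return hits / len(sample)


rng = random.Random(2021)
freqs = simulate_freqs(rng, N_SNPS)
# Confounded design (assumed): cases are 80% ancestry B, controls 80% ancestry A.
confounded = {(1, 1): 400, (1, 0): 100, (0, 0): 400, (0, 1): 100}
# Balanced design mirroring the abstract: 250 cases and 250 controls per ancestry.
balanced = {(1, 0): 250, (0, 0): 250, (1, 1): 250, (0, 1): 250}

model = fit_naive_bayes(make_sample(rng, freqs, confounded))
conf_test = make_sample(rng, freqs, confounded)
bal_test = make_sample(rng, freqs, balanced)

acc_confounded = accuracy(model, conf_test, lambda pheno, anc: pheno)
acc_balanced = accuracy(model, bal_test, lambda pheno, anc: pheno)
acc_ancestry = accuracy(model, bal_test, lambda pheno, anc: anc)
print(f"phenotype accuracy, confounded test: {acc_confounded:.2f}")
print(f"phenotype accuracy, balanced test:   {acc_balanced:.2f}")
print(f"agreement with ancestry B, balanced: {acc_ancestry:.2f}")
```

Because the simulated phenotype is independent of genotype within each ancestry, any above-chance accuracy on the confounded test set can only come from ancestry. On the balanced test set, which mirrors the 250-per-cell design in the abstract, phenotype accuracy should fall to roughly chance while the predictions continue to track ancestry.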
- Is Part Of:
- Drug and alcohol dependence. Volume 229:Part B (2021)
- Journal:
- Drug and alcohol dependence
- Issue:
- Volume 229:Part B (2021)
- Issue Display:
- Volume 229, Issue 2 (2021)
- Year:
- 2021
- Volume:
- 229
- Issue:
- 2
- Issue Sort Value:
- 2021-0229-0002-0000
- Publication Date:
- 2021-12-01
- Subjects:
- Opioid use disorder -- Machine learning -- Algorithmic bias -- Ancestry -- Candidate genes
Drug abuse -- Periodicals
Alcoholism -- Periodicals
616.86
- Journal URLs:
- http://www.sciencedirect.com/science/journal/03768716
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.drugalcdep.2021.109115
- Languages:
- English
- ISSNs:
- 0376-8716
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3627.890000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 20188.xml