A machine learning method for selection of genetic variants to increase prediction accuracy of type 2 diabetes mellitus using sequencing data. (4th April 2020)

Record Type:: Journal Article
Title:: A machine learning method for selection of genetic variants to increase prediction accuracy of type 2 diabetes mellitus using sequencing data. (4th April 2020)
Main Title:: A machine learning method for selection of genetic variants to increase prediction accuracy of type 2 diabetes mellitus using sequencing data
Authors:: Jung, Luann C.
Wang, Haiyan
Li, Xukun
Wu, Cen
Abstract:: Type 2 diabetes mellitus (T2DM) affects millions of people through its life‐altering complications. Worldwide, 3.4 million people die of diabetes annually. Studying the effect of genetic polymorphism on T2DM has been plagued by the available sample size. A 2016 Nature Reviews article summarized that the accuracy of predicting future type 2 diabetes from genetic polymorphism is very low at the population level. Innumerable associations between genes, environmental factors, and type 2 diabetes remain to be discovered. This research presents a method to identify subtle effects of genetic variants using whole genome sequencing data and improve prediction accuracy of T2DM at the population level. To achieve this, a new feature selection procedure and a classifier are proposed. The method involves (a) first applying sparse principal component analysis to genotype data to obtain orthogonal features; (b) building a new classifier using single nucleotide polymorphism (SNP)‐specific regularization parameters to reduce the false positive rate of feature selection; (c) verifying feature relevance through penalized logistic regression. After application to a dataset containing 625 597 SNPs and 23 environmental variables from each of 3326 humans, the method identified 271 genetic variants with subtle effects on T2DM prediction. These variants led to greatly improved prediction accuracy for new patients at the population level. The proposed method also has the advantage of …
Is Part Of:: Statistical analysis and data mining. Volume 13:Number 3(2020)
Journal:: Statistical analysis and data mining
Issue:: Volume 13:Number 3(2020)
Issue Display:: Volume 13, Issue 3 (2020)
Year:: 2020
Volume:: 13
Issue:: 3
Issue Sort Value:: 2020-0013-0003-0000
Page Start:: 261
Page End:: 281
Publication Date:: 2020-04-04
Subjects:: Cornish‐Fisher expansion -- feature selection -- nearest shrunken centroid -- sparse PCA
Data mining -- Statistical methods -- Periodicals
006.312
Journal URLs:: http://www3.interscience.wiley.com/journal/112701062/home
http://onlinelibrary.wiley.com/
DOI:: 10.1002/sam.11456
Languages:: English
ISSNs:: 1932-1864
Deposit Type:: Legaldeposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 8447.424100
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 13171.xml
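Step (b) of the abstract describes a classifier with SNP-specific regularization parameters, and the subject terms name the nearest shrunken centroid method. The sketch below shows one way per-feature regularization can work in that setting. It is a nearest shrunken centroid classifier in the form popularised by Tibshirani and colleagues, except that each feature gets its own soft-threshold instead of a single global one. This is not the authors' implementation. The record does not say how the per-SNP thresholds are chosen; the Cornish‐Fisher expansion in the subject terms may be involved, but that is not confirmed here. So the `thresholds` list is a hypothetical caller-supplied input. The sparse PCA step (a) and the penalized logistic regression check (c) are omitted. All function names are illustrative.

```python
import math
from statistics import median


def fit_shrunken_centroids(X, y, thresholds):
    """Nearest-shrunken-centroid fit with a separate threshold for each feature.

    X: samples as lists of floats (e.g. SNP dosages 0/1/2); y: class labels;
    thresholds: hypothetical per-feature shrinkage amounts, one per column.
    """
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    overall = [sum(row[j] for row in X) / n for j in range(p)]
    counts, centroids = {}, {}
    for k in classes:
        rows = [row for row, label in zip(X, y) if label == k]
        counts[k] = len(rows)
        centroids[k] = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
    # Pooled within-class standard deviation of each feature.
    s = [
        math.sqrt(sum((row[j] - centroids[label][j]) ** 2 for row, label in zip(X, y))
                  / (n - len(classes)))
        for j in range(p)
    ]
    s0 = median(s)  # offset that keeps low-variance features from dominating
    d, shrunk = {}, {}
    for k in classes:
        m_k = math.sqrt(1 / counts[k] - 1 / n)
        d[k], shrunk[k] = [], []
        for j in range(p):
            scale = m_k * (s[j] + s0)
            dkj = (centroids[k][j] - overall[j]) / scale
            # Soft-threshold the standardized difference with feature j's own threshold.
            dkj = math.copysign(max(abs(dkj) - thresholds[j], 0.0), dkj)
            d[k].append(dkj)
            shrunk[k].append(overall[j] + scale * dkj)
    priors = {k: counts[k] / n for k in classes}
    return {"classes": classes, "s": s, "s0": s0, "d": d,
            "shrunk": shrunk, "priors": priors}


def selected_features(model):
    """Indices of features whose shrunken centroid moves away from the overall mean for some class."""
    return [j for j in range(len(model["s"]))
            if any(model["d"][k][j] != 0.0 for k in model["classes"])]


def predict(model, x):
    """Pick the class minimizing standardized squared distance minus 2*log(prior)."""
    best, best_score = None, math.inf
    for k in model["classes"]:
        score = sum((x[j] - c) ** 2 / (model["s"][j] + model["s0"]) ** 2
                    for j, c in enumerate(model["shrunk"][k]))
        score -= 2 * math.log(model["priors"][k])
        if score < best_score:
            best, best_score = k, score
    return best


# Toy data: feature 0 separates the classes, feature 1 has equal class means.
X = [[0, 1], [0, 2], [1, 0], [2, 0], [2, 2], [1, 1]]
y = [0, 0, 0, 1, 1, 1]
model = fit_shrunken_centroids(X, y, thresholds=[0.5, 0.5])
print(selected_features(model))                       # [0]
print(predict(model, [0, 1]), predict(model, [2, 1]))  # 0 1
```

A feature drops out of the classifier when every class's standardized centroid difference is shrunk to zero. A larger threshold on a given feature therefore makes that feature harder to select. This is the mechanism by which feature-specific thresholds can trade sensitivity against false positives. With `thresholds=[2.0, 2.0]`, for example, the informative feature in the toy data is also discarded.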