Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests. (December 2015)

Record Type:: Journal Article
Title:: Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests. (December 2015)
Main Title:: Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests
Authors:: Nguyen, Thanh-Tung
Huang, Joshua
Wu, Qingyao
Nguyen, Thuy
Li, Mark
Abstract:: Background: Single-nucleotide polymorphism (SNP) selection and identification are the most important tasks in genome-wide association data analysis. The problem is difficult because genome-wide association data are very high dimensional and a large portion of the SNPs in the data are irrelevant to the disease. Advanced machine learning methods have been successfully used in genome-wide association studies (GWAS) to identify genetic variants that have relatively large effects in some common, complex diseases. Among them, the most successful is Random Forests (RF). Despite performing well in terms of prediction accuracy on some data sets of moderate size, RF still struggles in GWAS to select informative SNPs and build accurate prediction models. In this paper, we propose a new two-stage quality-based sampling method in random forests, named ts-RF, for SNP subspace selection in GWAS. The method first applies p-value assessment to find a cut-off point that separates informative and irrelevant SNPs into two groups. The informative SNP group is further divided into two sub-groups: highly informative and weakly informative SNPs. When sampling the SNP subspace for building the trees of the forest, only SNPs from these two sub-groups are taken into account. The feature subspaces always contain highly informative SNPs when used to split a node in a tree. Results: This approach enables one to generate more accurate trees with a lower …
Is Part Of:: BMC genomics. Volume 16:Number 2(2015)
Journal:: BMC genomics
Issue:: Volume 16:Number 2(2015)
Issue Display:: Volume 16, Issue 2 (2015)
Year:: 2015
Volume:: 16
Issue:: 2
Issue Sort Value:: 2015-0016-0002-0000
Page Start:: 1
Page End:: 11
Publication Date:: 2015-12
Subjects:: Genome-wide association study -- SNPs Selection -- Random Forests -- Data mining
Genomes -- Periodicals
Gene mapping -- Periodicals
Genomics -- Periodicals
Base Sequence -- Periodicals
Chromosome Mapping -- Periodicals
Genetic Techniques -- Periodicals
Sequence Analysis, DNA -- Periodicals
572.8605
Journal URLs:: http://www.biomedcentral.com/bmcgenomics/
http://www.pubmedcentral.nih.gov/tocrender.fcgi?journal=32
http://link.springer.com/
DOI:: 10.1186/1471-2164-16-S2-S5
Languages:: English
ISSNs:: 1471-2164
Deposit Type:: Legal deposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
Ingest File:: 9828.xml
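
Note on the abstract: the two-stage subspace sampling it describes can be illustrated with a minimal Python sketch. This is not the authors' ts-RF implementation. The function names, the two threshold parameters (`cutoff`, `strong_cutoff`), and the `strong_fraction` split are hypothetical choices made for illustration only; the abstract does not say how the cut-off points are derived or how many SNPs are drawn from each sub-group.

```python
import random


def partition_snps(p_values, cutoff, strong_cutoff):
    """Split SNP indices into (highly informative, weakly informative).

    Assumption: both thresholds are supplied by the caller. SNPs with
    p >= cutoff are treated as irrelevant and excluded from both groups.
    """
    strong = [i for i, p in enumerate(p_values) if p < strong_cutoff]
    weak = [i for i, p in enumerate(p_values) if strong_cutoff <= p < cutoff]
    return strong, weak


def sample_subspace(strong, weak, mtry, rng, strong_fraction=0.5):
    """Draw up to mtry SNP indices for one node split.

    Only the two informative sub-groups are sampled, and at least one
    highly informative SNP is included whenever that group is non-empty.
    strong_fraction is a hypothetical knob, not taken from the paper.
    """
    if mtry < 1:
        raise ValueError("mtry must be at least 1")
    if not 0.0 <= strong_fraction <= 1.0:
        raise ValueError("strong_fraction must be in [0, 1]")
    n_strong = 0
    if strong:
        n_strong = min(len(strong), max(1, round(mtry * strong_fraction)))
    n_weak = min(len(weak), mtry - n_strong)
    return rng.sample(strong, n_strong) + rng.sample(weak, n_weak)


# Toy p-values for six SNPs (made-up numbers, for illustration only).
p_values = [0.001, 0.2, 0.03, 0.9, 0.0004, 0.04]
strong, weak = partition_snps(p_values, cutoff=0.05, strong_cutoff=0.01)
subspace = sample_subspace(strong, weak, mtry=3, rng=random.Random(0))
```

In this toy run, SNPs 1 and 3 fall above the cut-off and can never enter a subspace, while every sampled subspace contains at least one of the highly informative SNPs 0 and 4.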