A computational method for prediction of rSNPs in human genome. (June 2016)
- Record Type:
- Journal Article
- Title:
- A computational method for prediction of rSNPs in human genome. (June 2016)
- Main Title:
- A computational method for prediction of rSNPs in human genome
- Authors:
- Li, Rong
Han, Jiuqiang
Liu, Jun
Zheng, Jiguang
Liu, Ruiling
- Abstract:
- Highlights: A computational method for detection of rSNPs is proposed. A new ensemble method for handling unbalanced data is applied. Differences in hydroxyl radical cleavage patterns caused by SNPs are analyzed. Abstract: Regulatory single nucleotide polymorphisms (rSNPs) in human genomes are thought to be responsible for phenotypic differences, including susceptibility to diseases and treatment outcomes, even though they do not change any gene product. However, a genome-wide search for rSNPs has not been properly addressed so far. In this work, a computational method for rSNP identification is proposed. Because background SNPs far outnumber rSNPs, an ensemble method is applied to handle the imbalanced data: it first converts the unbalanced dataset into several balanced ones and then builds a model for each balanced dataset. Two major types of features are extracted: sequence-based features and allele-specific features. A random forest is then applied to build the recognition model for each balanced dataset. Finally, ensemble strategies are adopted to combine the results of the individual models. We tested our method on a set of experimentally verified rSNPs, and leave-one-out cross-validation showed that it achieves a sensitivity of 73.8%, a specificity of 71.8%, and an area under the ROC curve (AUC) of 0.756. In addition, our method is threshold-free and does not rely on data about regulatory elements, so it should adapt better to different data scenarios. The original data and the MATLAB source code are available at https://sourceforge.net/projects/rsnpdect/ .
- Is Part Of:
- Computational biology and chemistry. Volume 62(2016)
- Journal:
- Computational biology and chemistry
- Issue:
- Volume 62(2016)
- Issue Display:
- Volume 62, Issue 2016 (2016)
- Year:
- 2016
- Volume:
- 62
- Issue:
- 2016
- Issue Sort Value:
- 2016-0062-2016-0000
- Page Start:
- 96
- Page End:
- 103
- Publication Date:
- 2016-06
- Subjects:
- Regulatory SNPs -- Imbalanced data -- Random forest -- Hydroxyl radical cleavage patterns
Chemistry -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
Biochemistry -- Data processing
Biology -- Data processing
Molecular biology -- Data processing
Periodicals
Electronic journals
542.85
- Journal URLs:
- http://www.sciencedirect.com/science/journal/14769271
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiolchem.2016.04.001
- Languages:
- English
- ISSNs:
- 1476-9271
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3390.576700
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 7783.xml
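
The balancing-and-ensemble scheme summarized in the abstract (split an unbalanced dataset into several balanced ones, train one model per balanced set, then combine the models) can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation. All function names are hypothetical, the nearest-centroid learner is a simple stand-in for the random forest named in the abstract, and majority voting is an assumed combination rule, since the abstract does not say which ensemble strategies were used.

```python
import random


def balanced_subsets(positives, negatives, seed=0):
    """Pair all minority samples with disjoint, equal-sized chunks of the majority class.

    Assumes `positives` is the (non-empty) minority class. Majority samples
    left over after the last full chunk are dropped in this sketch.
    """
    rng = random.Random(seed)
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    k = len(positives)
    n_subsets = max(1, len(shuffled) // k)
    return [(positives, shuffled[i * k:(i + 1) * k]) for i in range(n_subsets)]


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def train_centroid_model(positives, negatives):
    """Stand-in base learner (NOT a random forest): nearest class centroid."""
    c_pos, c_neg = centroid(positives), centroid(negatives)

    def predict(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg))
        return 1 if d_pos <= d_neg else 0

    return predict


def train_ensemble(positives, negatives, train=train_centroid_model, seed=0):
    """Train one model per balanced subset; `train` can be any base learner."""
    return [train(p, n) for p, n in balanced_subsets(positives, negatives, seed)]


def predict_majority(models, x):
    """Combine the per-subset models by majority vote (assumed rule; ties count as positive)."""
    votes = sum(model(x) for model in models)
    return 1 if 2 * votes >= len(models) else 0


# Toy data: 3 "rSNP" vectors against 12 "background" vectors (made-up values).
rsnps = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]
background = [[0.05 * i, -0.05 * i] for i in range(12)]
ensemble = train_ensemble(rsnps, background)
print(len(ensemble), predict_majority(ensemble, [1.0, 1.0]))
```

Passing a different `train` callable, for example a wrapper around a random forest from a machine-learning library, would bring the sketch closer to the method the abstract describes while leaving the balancing and voting logic unchanged.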