An empirical study of self-training and data balancing techniques for splice site prediction. (2017)
- Record Type:
- Journal Article
- Title:
- An empirical study of self-training and data balancing techniques for splice site prediction. (2017)
- Main Title:
- An empirical study of self-training and data balancing techniques for splice site prediction
- Authors:
- Stanescu, Ana
Caragea, Doina
- Abstract:
- Thanks to Next Generation Sequencing technologies, unlabelled data is now generated easily, while the annotation process remains expensive. Semi-supervised learning represents a cost-effective alternative to supervised learning, as it can improve supervised classifiers by making use of unlabelled data. However, semi-supervised learning has not been studied much for problems with highly skewed class distributions, which are prevalent in bioinformatics. To address this limitation, we carry out a study of a semi-supervised learning algorithm, specifically self-training based on Naïve Bayes, with focus on data-level approaches for handling imbalanced class distributions. Our study is conducted on the problem of predicting splice sites and it is based on datasets for which the ratio of positive to negative examples is 1-to-99. Our results show that under certain conditions semi-supervised learning algorithms are a better choice than purely supervised classification algorithms.
- Is Part Of:
- International journal of bioinformatics research and applications. Volume 13: Number 1 (2017)
- Journal:
- International journal of bioinformatics research and applications
- Issue:
- Volume 13: Number 1 (2017)
- Issue Display:
- Volume 13, Issue 1 (2017)
- Year:
- 2017
- Volume:
- 13
- Issue:
- 1
- Issue Sort Value:
- 2017-0013-0001-0000
- Page Start:
- 40
- Page End:
- 61
- Publication Date:
- 2017
- Subjects:
- semi-supervised learning -- supervised learning -- imbalanced data -- data balancing -- under-sampling -- over-sampling -- splice sites -- self-training -- splice site prediction -- next generation sequencing -- highly skewed class distributions -- bioinformatics -- naive Bayes
Bioinformatics -- Periodicals
570.285
- Journal URLs:
- http://www.inderscience.com/browse/index.php?journalID=155
http://www.inderscience.com/
- Languages:
- English
- ISSNs:
- 1744-5485
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 8139.xml