C-PUGP: A cluster-based positive unlabeled learning method for disease gene prediction and prioritization. (October 2018)
- Record Type:
- Journal Article
- Title:
- C-PUGP: A cluster-based positive unlabeled learning method for disease gene prediction and prioritization. (October 2018)
- Main Title:
- C-PUGP: A cluster-based positive unlabeled learning method for disease gene prediction and prioritization
- Authors:
- Vasighizaker, Akram
Jalili, Saeed
- Abstract:
- Highlights: A novel PU learning method is proposed to identify and prioritize disease genes. The method is based on clustering and a one-class classification algorithm. Negative instances receive particular attention so that their labels are estimated more reliably.
Abstract: Disease gene detection is an important stage in understanding disease processes and treatment. Many machine learning methods have been used to identify candidate disease genes. Although these methods differ in the feature vectors used for genes, the way reliable negative data (non-disease genes) are selected, and the classification method, the lack of negative data is the most significant challenge for all of them. Recently, candidate disease genes have been identified by semi-supervised learning methods based on positive and unlabeled data. These methods are reasonably accurate and achieve better results than preceding methods. In this article, we propose a novel Positive Unlabeled (PU) learning technique based on clustering and a One-Class classification algorithm. Unlike existing methods, we build a more Reliable Negative (RN) set in three steps: (1) clustering the positive data, (2) learning One-Class classifier models using the clusters, and (3) selecting the intersection set of negative data as the Reliable Negative set. Next, we identify and rank the candidate disease genes using a binary classifier based on the support vector machine (SVM) algorithm. Experimental results indicate that the proposed method yields the best results: 92.8, 93.6, and 93.1 in terms of precision, recall, and F-measure, respectively. Compared with existing methods, the F-measure of the proposed method is 11.7 percent better than that of the best method. The results also show about a 6% increase in prioritization performance. …
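Illustrative sketch (not part of the catalogued record, and not the authors' implementation): the three-step Reliable Negative selection and the final ranking described in the abstract could look roughly like the following. Several simplifications are assumptions made here for brevity: a centroid-plus-radius ball stands in for the One-Class classifier, a small sub-gradient linear SVM stands in for the SVM used in the paper, genes are plain 2-D feature tuples, and every function name, parameter, and data value is hypothetical.

```python
import math
import random

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(points, k, iters=20):
    """Step 1: plain k-means with a naive, evenly spaced initialisation."""
    centers = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        centers = [centroid(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return [c for c in clusters if c]

def fit_one_class(cluster, slack=2.0):
    """Step 2 (stand-in): a ball around the cluster centroid.
    Assumption: replaces a real one-class classifier such as a one-class SVM."""
    center = centroid(cluster)
    radius = slack * max(math.dist(p, center) for p in cluster)
    return center, radius

def reliable_negatives(positives, unlabeled, k=2):
    """Step 3: keep unlabeled points rejected by every one-class model,
    i.e. the intersection of the per-model negative sets."""
    models = [fit_one_class(c) for c in kmeans(positives, k)]
    return [u for u in unlabeled
            if all(math.dist(u, center) > radius for center, radius in models)]

def train_linear_svm(X, y, lam=0.01, eta=0.01, epochs=200, seed=0):
    """Linear SVM fitted by sub-gradient descent on the regularised hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1 - eta * lam) for wj in w]  # L2 shrinkage
            if margin < 1:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def rank_candidates(positives, unlabeled, k=2):
    """Train on positives vs. reliable negatives, then rank all unlabeled
    points by decision score, highest (most disease-like) first."""
    rn = reliable_negatives(positives, unlabeled, k)
    X = list(positives) + rn
    y = [1] * len(positives) + [-1] * len(rn)
    w, b = train_linear_svm(X, y)
    return sorted(unlabeled,
                  key=lambda x: sum(wj * xj for wj, xj in zip(w, x)) + b,
                  reverse=True)

# Made-up toy data: two groups of known disease genes and a mixed unlabeled pool.
positives = [(-3.2, 2.1), (-2.9, 1.8), (-3.0, 2.2),
             (-3.1, -2.0), (-2.8, -2.2), (-3.0, -1.8)]
unlabeled = [(-3.1, 2.0), (3.0, 0.1), (2.8, -0.2),
             (-2.9, -2.1), (3.2, 0.3), (3.1, -0.1)]
print(rank_candidates(positives, unlabeled))
```

Taking the intersection in step 3 is the conservative choice: a point enters the Reliable Negative set only if no positive cluster accepts it, which trades a smaller negative set for fewer hidden positives mislabeled as negatives.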
- Is Part Of:
- Computational biology and chemistry. Volume 76(2018)
- Journal:
- Computational biology and chemistry
- Issue:
- Volume 76(2018)
- Issue Display:
- Volume 76, Issue 2018 (2018)
- Year:
- 2018
- Volume:
- 76
- Issue:
- 2018
- Issue Sort Value:
- 2018-0076-2018-0000
- Page Start:
- 23
- Page End:
- 31
- Publication Date:
- 2018-10
- Subjects:
- Candidate disease genes -- Identification -- Classification -- Clustering -- Semi-supervised learning -- Pul
Chemistry -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
Biochemistry -- Data processing
Biology -- Data processing
Molecular biology -- Data processing
Periodicals
Electronic journals
542.85
- Journal URLs:
- http://www.sciencedirect.com/science/journal/14769271
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiolchem.2018.05.022
- Languages:
- English
- ISSNs:
- 1476-9271
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3390.576700
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 23145.xml