An ensemble learning framework for potential miRNA-disease association prediction with positive-unlabeled data. (December 2021)
- Record Type:
- Journal Article
- Title:
- An ensemble learning framework for potential miRNA-disease association prediction with positive-unlabeled data. (December 2021)
- Main Title:
- An ensemble learning framework for potential miRNA-disease association prediction with positive-unlabeled data
- Authors:
- Wu, Yao
Zhu, Donghua
Wang, Xuefeng
Zhang, Shuo
- Abstract:
- Abstract: To explore the pathogenic mechanisms of microRNA (miRNA) in diverse diseases, many researchers have concentrated on discovering potential associations between miRNAs and diseases using machine learning methods. However, the prediction accuracy of supervised machine learning methods is limited by the lack of experimentally validated uncorrelated miRNA-disease pairs. Without these negative samples, training a highly accurate model is much more difficult. Unlike traditional miRNA-disease prediction models, which use randomly selected unknown samples as negative training samples, we propose an ensemble learning framework to solve this positive-unlabeled (PU) learning problem. The framework incorporates two steps: a novel semi-supervised Kmeans (SS-Kmeans) to extract reliable negative samples from unknown miRNA-disease pairs, and a subagging method to generate diverse training sample sets that make full use of those reliable negative samples for ensemble learning. Combined with an effective random vector functional link (RVFL) network as the prediction model, the proposed framework showed superior prediction accuracy compared with other popular approaches. A case study on lung and gastric neoplasms further confirms the framework's efficacy at identifying miRNA-disease associations.
Graphical Abstract: General framework of the ensemble learning framework.
Highlights: A novel ensemble learning framework is proposed to solve PU learning problems in predicting miRNA-disease associations. The proposed semi-supervised Kmeans can extract reliable negative samples. A subagging method ensures a diverse training sample set that improves prediction accuracy. RVFL networks lead to efficient and robust individual predictions. Empirical results of comparisons and a case study confirm the superiority of the proposed framework.
- Is Part Of:
- Computational biology and chemistry. Volume 95(2021)
- Journal:
- Computational biology and chemistry
- Issue:
- Volume 95(2021)
- Issue Display:
- Volume 95, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 95
- Issue:
- 2021
- Issue Sort Value:
- 2021-0095-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-12
- Subjects:
- Semi-supervised Kmeans (SS-Kmeans) -- Random vector functional link (RVFL) -- Subagging -- Ensemble learning -- MiRNA-disease association
Chemistry -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
Biochemistry -- Data processing
Biology -- Data processing
Molecular biology -- Data processing
Periodicals
Electronic journals
542.85
- Journal URLs:
- http://www.sciencedirect.com/science/journal/14769271
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiolchem.2021.107566
- Languages:
- English
- ISSNs:
- 1476-9271
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3390.576700
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 25255.xml