Taxonomy dimension reduction for colorectal cancer prediction. (December 2019)
- Record Type:
- Journal Article
- Title:
- Taxonomy dimension reduction for colorectal cancer prediction. (December 2019)
- Main Title:
- Taxonomy dimension reduction for colorectal cancer prediction
- Authors:
- Qu, Kaiyang
Gao, Feng
Guo, Fei
Zou, Quan
- Abstract:
- Graphical abstract: Highlights: Using the species and number of microorganisms for prediction CRC, experimental data can be easily obtained. Using taxonomy files to predict CRC, and the influential microorganisms were found. A variety of feature extraction methods are used to improve the prediction accuracy. Use machine learning method to improve prediction efficiency. Ensemble feature selection methods were proposed, which can use fewer features and get better results.
Abstract: A growing number of people suffer from colorectal cancer, which is one of the most common cancers. It is essential to diagnose and treat the cancer as early as possible. The disease may change the microorganism communities in the gut, and it could be an efficient method to employ gut microorganisms to predict colorectal cancer. In this study, we selected operational taxonomic units that include several kinds of microorganisms to predict colorectal cancer. To find the most important microorganisms and obtain the best prediction performance, we explore effective feature selection methods. We employ three main steps. First, we use a single method to reduce features. Next, to reduce the number of features, we integrate the dimension reduction methods correlation-based feature selection and maximum relevance–maximum distance (MRMD 1.0 and MRMD 2.0). Then, we selected the important features according to the taxonomy files. In this study, we created training and test sets to obtain a more objective evaluation. Random forest, naïve Bayes, and decision tree classifiers were evaluated. The results show that the methods proposed in this study are better than hierarchical feature engineering. The proposed method, which combines correlation-based feature selection with MRMD 2.0, performed the best on the CRC2 dataset. The dataset and methods can be found in http://lab.malab.cn/data/microdata/data.html
- Is Part Of:
- Computational biology and chemistry. Volume 83(2019)
- Journal:
- Computational biology and chemistry
- Issue:
- Volume 83(2019)
- Issue Display:
- Volume 83, Issue 2019 (2019)
- Year:
- 2019
- Volume:
- 83
- Issue:
- 2019
- Issue Sort Value:
- 2019-0083-2019-0000
- Page Start:
- Page End:
- Publication Date:
- 2019-12
- Subjects:
- Machine learning -- Microbial -- Maximum relevant Maximum distance -- Correlation-based feature selection -- Colorectal cancer
Chemistry -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
Biochemistry -- Data processing
Biology -- Data processing
Molecular biology -- Data processing
Periodicals
Electronic journals
542.85
- Journal URLs:
- http://www.sciencedirect.com/science/journal/14769271
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiolchem.2019.107160
- Languages:
- English
- ISSNs:
- 1476-9271
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3390.576700
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 23172.xml
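
Illustrative note on the abstract: the feature ranking idea behind "maximum relevance, maximum distance" can be sketched roughly as below. This is a simplified illustration, not the authors' MRMD 1.0 or MRMD 2.0 code and not correlation-based feature selection. It assumes relevance is the absolute Pearson correlation between a feature and a 0/1 label, and that distance is the mean Euclidean distance from a feature to all others, scaled to [0, 1]. The function names (`pearson`, `euclidean`, `rank_features`, `select_top`) and the toy OTU names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def euclidean(xs, ys):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)))


def rank_features(columns, labels):
    """Rank features by relevance to the label plus mean distance to other features.

    columns: dict mapping a feature name (e.g. an OTU id) to its per-sample values.
    labels:  per-sample class labels encoded as 0/1 (assumed encoding).
    Returns feature names sorted from highest to lowest score.
    """
    names = list(columns)
    relevance = {f: abs(pearson(columns[f], labels)) for f in names}
    distance = {}
    for f in names:
        others = [euclidean(columns[f], columns[g]) for g in names if g != f]
        distance[f] = sum(others) / len(others) if others else 0.0
    # Scale distances to [0, 1] so they are comparable with the correlation term.
    max_d = max(distance.values(), default=0.0) or 1.0
    score = {f: relevance[f] + distance[f] / max_d for f in names}
    return sorted(names, key=lambda f: score[f], reverse=True)


def select_top(columns, labels, k):
    """Keep only the k highest-ranked feature columns."""
    keep = rank_features(columns, labels)[:k]
    return {f: columns[f] for f in keep}


# Toy data: four samples, two controls (0) and two cases (1).
labels = [0, 0, 1, 1]
otus = {
    "otu_a": [0.0, 0.0, 1.0, 1.0],  # tracks the label exactly
    "otu_b": [0.5, 0.5, 0.5, 0.5],  # constant, carries no signal
    "otu_c": [1.0, 0.0, 1.0, 0.0],  # uncorrelated with the label
}
ranking = rank_features(otus, labels)
reduced = select_top(otus, labels, 2)
```

The reduced feature table would then be passed to a classifier such as random forest, naïve Bayes, or a decision tree and scored on a held-out test set, as the abstract describes.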