A hybrid data mining approach for identifying the temporal effects of variables associated with breast cancer survival. (January 2020)
- Record Type:
- Journal Article
- Title:
- A hybrid data mining approach for identifying the temporal effects of variables associated with breast cancer survival. (January 2020)
- Main Title:
- A hybrid data mining approach for identifying the temporal effects of variables associated with breast cancer survival
- Authors:
- Simsek, Serhat
Kursuncu, Ugur
Kibis, Eyyub
AnisAbdellatif, Musheera
Dag, Ali
- Abstract:
- Highlights: A data analytics-based methodology is applied to identify important factors. Multiple feature selection methods were employed. The novel findings can be used to complement medical practitioners' decisions. It can be used as a decision support system to improve the decision-making process.
Abstract: Predicting breast cancer survival is crucial for practitioners to determine possible outcomes and make better treatment plans for patients. In this study, a hybrid data mining-based methodology was constructed to differentiate the variables whose importance for survival changes over time. Therefore, the importance of variables was determined for three different time periods (i.e., one, five, and ten years). To conduct such an analysis, the most parsimonious models were constructed by employing one regression analysis method, the Least Absolute Shrinkage and Selection Operator (LASSO), and one metaheuristic optimization method, namely a Genetic Algorithm (GA). Due to the high imbalance between the number of survivors and deaths, two well-known resampling procedures, Random Under-sampling (RUS) and Synthetic Minority Over-sampling Technique (SMOTE), were applied to increase the performance of the classification models. In the final stage, two data mining models, namely Artificial Neural Networks (ANNs) and Logistic Regression (LR), were utilized along with 10-fold cross-validation. Sensitivity analysis (SA) was conducted for each model to identify the importance of each variable for a certain model and time period. The obtained results revealed that certain variables lose their importance over time, while others gain importance. This information can assist medical practitioners in identifying specific subsets of variables to focus on in different periods, which will in turn lead to more effective and efficient cancer care. Moreover, the study findings indicate that extremely parsimonious models can be developed by adopting a purely data-driven approach, rather than eliminating the variables manually. Such a methodology can also be applied in treating other types of cancer.
- Is Part Of:
- Expert systems with applications. Volume 139(2020)
- Journal:
- Expert systems with applications
- Issue:
- Volume 139(2020)
- Issue Display:
- Volume 139, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 139
- Issue:
- 2020
- Issue Sort Value:
- 2020-0139-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-01
- Subjects:
- Data mining -- Healthcare analytics -- Machine learning -- Medical decision making
Expert systems (Computer science) -- Periodicals
Systèmes experts (Informatique) -- Périodiques
Electronic journals
006.33
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09574174
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.eswa.2019.112863
- Languages:
- English
- ISSNs:
- 0957-4174
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3842.004220
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 11912.xml