Breast cancer data analysis for survivability studies and prediction. (March 2018)
- Record Type:
- Journal Article
- Title:
- Breast cancer data analysis for survivability studies and prediction. (March 2018)
- Main Title:
- Breast cancer data analysis for survivability studies and prediction
- Authors:
- Shukla, Nagesh
Hagenbuchner, Markus
Win, Khin Than
Yang, Jack
- Abstract:
- Highlights: Developed a robust unsupervised data analytical model to better understand the survivability of breast cancer patients. Proposed an approach that can analyse survivability in the presence of missing data. Provided insights into factors associated with patient survivability. Established cohorts of patients that share similar properties. The SEER program dataset is used in this study. Separating patients into clusters improved the overall survival prediction accuracy.
Abstract: Background: Breast cancer is the most common cancer affecting females worldwide. Predicting breast cancer survivability is a challenging and complex research task. Existing approaches use statistical methods or supervised machine learning to assess or predict the survival prospects of patients. Objective: The main objective of this paper is to develop a robust data analytical model that can assist in (i) better understanding breast cancer survivability in the presence of missing data, (ii) providing better insights into factors associated with patient survivability, and (iii) establishing cohorts of patients that share similar properties. Methods: Unsupervised data mining methods, namely the self-organising map (SOM) and density-based spatial clustering of applications with noise (DBSCAN), are used to create patient cohort clusters. These clusters, with their associated patterns, were used to train a multilayer perceptron (MLP) model for improved patient survivability analysis. A large dataset available from the SEER program is used in this study to identify patterns associated with the survivability of breast cancer patients. Information gain was computed for the purpose of variable selection. All of these methods are data-driven and require little (if any) input from users or experts. Results: SOM consolidated patients into cohorts with similar properties. From these, DBSCAN identified and extracted nine cohorts (clusters). Patients in each of the nine clusters were found to have different survival times. Separating patients into clusters improved the overall accuracy of MLP-based survival prediction and revealed intricate conditions that affect the accuracy of a prediction. Conclusions: A new, entirely data-driven approach based on unsupervised learning methods improves understanding and helps identify patterns associated with the survivability of patients.
The results of the analysis can be used to segment historical patient data into clusters or subsets that share common variable values and survivability. The survivability prediction accuracy of an MLP is improved by using the identified patient cohorts rather than the raw historical data. Analysing the variable values in each cohort provides better insights into the survivability of particular subgroups of breast cancer patients. …
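The abstract states that information gain was computed for variable selection but does not say how variables were discretised or which ones were scored. As an illustration only, the sketch below uses the standard entropy-based definition, IG(Y; X) = H(Y) - Σ_v P(X = v) H(Y | X = v), for categorical variables. The function names, the variable names (`stage`, `grade`) and the toy values are hypothetical and are not taken from the SEER data or from the authors' code.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a categorical feature."""
    n = len(labels)
    groups = {}
    # Group the class labels by the value the feature takes for each patient.
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the label within each feature value.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(records, labels):
    """Rank feature names by information gain, highest first."""
    scores = {name: information_gain(values, labels) for name, values in records.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = survived past some threshold, 0 = did not.
    survived = [1, 1, 1, 0, 0, 0]
    stage = ["I", "I", "II", "II", "III", "III"]
    grade = ["1", "2", "1", "2", "1", "2"]
    print(rank_features({"stage": stage, "grade": grade}, survived))
```

On this toy data `stage` separates the labels almost perfectly and so ranks above `grade`; in practice a cut-off on the ranked scores would decide which variables are kept. Continuous variables would first need binning, which this sketch does not attempt.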
- Is Part Of:
- Computer methods and programs in biomedicine. Volume 155 (2018)
- Journal:
- Computer methods and programs in biomedicine
- Issue:
- Volume 155 (2018)
- Issue Display:
- Volume 155, Issue 2018 (2018)
- Year:
- 2018
- Volume:
- 155
- Issue:
- 2018
- Issue Sort Value:
- 2018-0155-2018-0000
- Page Start:
- 199
- Page End:
- 208
- Publication Date:
- 2018-03
- Subjects:
- Breast cancer survivability study -- SEER data -- Machine learning
AJCC American Joint Committee on Cancer -- ALL-AML Acute Lymphocytic leukemia and Acute Myeloid Leukemia -- ANN artificial neural networks -- DBSCAN density-based clustering algorithm -- DLBCL diffuse large B cell Lymphoma -- DT decision trees -- KRBM Kent Ridge Bio-Medical -- LYMLLEUK Lymphoma of all sites and leukemia -- M Metastasis (distant spread) -- MAR multiple association rules -- MLP Multi-layer Perceptron -- N local lymph node involvement -- NB Naïve Bayes -- RBF radial basis function -- RNN recurrent neural network -- SEER Surveillance, Epidemiology, and End Results program -- SOM self-organising map -- SSL Semi-supervised Learning -- SVM support vector machine -- T1-4 Size of tumor in greatest dimension -- TNM Tumor, Node, Metastasis
Medicine -- Computer programs -- Periodicals
Biology -- Computer programs -- Periodicals
Computers -- Periodicals
Medicine -- Periodicals
Médecine -- Logiciels -- Périodiques
Biologie -- Logiciels -- Périodiques
Biology -- Computer programs
Medicine -- Computer programs
Periodicals
Electronic journals
610.28
- Journal URLs:
- http://www.sciencedirect.com/science/journal/01692607
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.cmpb.2017.12.011
- Languages:
- English
- ISSNs:
- 0169-2607
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.095000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 6024.xml