Application of data mining techniques and data analysis methods to measure cancer morbidity and mortality data in a regional cancer registry: The case of the island of Crete, Greece. (July 2017)
- Record Type:
- Journal Article
- Title:
- Application of data mining techniques and data analysis methods to measure cancer morbidity and mortality data in a regional cancer registry: The case of the island of Crete, Greece. (July 2017)
- Main Title:
- Application of data mining techniques and data analysis methods to measure cancer morbidity and mortality data in a regional cancer registry: The case of the island of Crete, Greece
- Authors:
- Varlamis, Iraklis
Apostolakis, Ioannis
Sifaki-Pistolla, Dimitra
Dey, Nilanjan
Georgoulias, Vassilios
Lionis, Christos
- Abstract:
- Highlights: The study performs an extensive report on the cancer cases and deaths, in two Cretan counties, from 1998 to 2004. A methodology for pre-processing manually collected registry data is introduced. It comprises data cleaning, duplicate identification and incident aggregation per patient. A methodology for data analysis is followed, based on statistical projections and standardization to the population size and comparison with reported results in country, European and world-wide level. Data mining techniques are applied in order to uncover hidden knowledge and provide insights for further research.
Abstract: Background and Objective: Micro or macro-level mapping of cancer statistics is a challenging task that requires long-term planning, prospective studies and continuous monitoring of all cancer cases. The objective of the current study is to present how cancer registry data could be processed using data mining techniques in order to improve the statistical analysis outcomes.
Methods: Data were collected from the Cancer Registry of Crete in Greece (counties of Rethymno and Lasithi) for the period 1998–2004. Data collection was performed on paper forms and manually transcribed to a single data file, thus introducing errors and noise (e.g. missing and erroneous values, duplicate entries etc.). Data were pre-processed and prepared for analysis using data mining tools and algorithms. Feature selection was applied to evaluate the contribution of each collected feature in predicting patients' survival. Several classifiers were trained and evaluated for their ability to predict survival of patients. Finally, statistical analysis of cancer morbidity and mortality rates in the two regions was performed in order to validate the initial findings.
Results: Several critical points in the process of data collection, preprocessing and analysis of cancer data were derived from the results, while a road-map for future population data studies was developed. In addition, increased morbidity rates were observed in the counties of Crete (Age Standardized Morbidity/Incidence Rates ASIR = 396.45 ± 2.89 and 274.77 ± 2.48 for men and women, respectively) compared to European and world averages (ASIR = 281.6 and 207.3 for men and women in Europe and 203.8 and 165.1 in world level). Significant variation in cancer types between sexes and age groups (the ratio between deaths and reported cases for young patients, less than 34 years old, is at 0.055 when the respective ratio for patients over 75 years old is 0.366) was also observed.
Conclusions: This study introduced a methodology for preprocessing and analyzing cancer data, using a combination of data mining techniques that could be a useful tool for other researchers and further enhancement of the cancer registries.
- Is Part Of:
- Computer methods and programs in biomedicine. Volume 145(2017)
- Journal:
- Computer methods and programs in biomedicine
- Issue:
- Volume 145(2017)
- Issue Display:
- Volume 145, Issue 2017 (2017)
- Year:
- 2017
- Volume:
- 145
- Issue:
- 2017
- Issue Sort Value:
- 2017-0145-2017-0000
- Page Start:
- 73
- Page End:
- 83
- Publication Date:
- 2017-07
- Subjects:
- Cancer data -- Data mining -- Feature selection -- Crete -- Greece
Medicine -- Computer programs -- Periodicals
Biology -- Computer programs -- Periodicals
Computers -- Periodicals
Medicine -- Periodicals
Médecine -- Logiciels -- Périodiques
Biologie -- Logiciels -- Périodiques
Biology -- Computer programs
Medicine -- Computer programs
Periodicals
Electronic journals
610.28
- Journal URLs:
- http://www.sciencedirect.com/science/journal/01692607
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.cmpb.2017.04.011
- Languages:
- English
- ISSNs:
- 0169-2607
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.095000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 579.xml