Comparative approaches for classification of diabetes mellitus data: Machine learning paradigm. (December 2017)
- Record Type:
- Journal Article
- Title:
- Comparative approaches for classification of diabetes mellitus data: Machine learning paradigm. (December 2017)
- Main Title:
- Comparative approaches for classification of diabetes mellitus data: Machine learning paradigm
- Authors:
- Maniruzzaman, Md.
Kumar, Nishith
Menhazul Abedin, Md.
Shaykhul Islam, Md.
Suri, Harman S.
El-Baz, Ayman S.
Suri, Jasjit S.
- Abstract:
- Highlights: Introduce Gaussian process classification (GPC) technique for diabetes classification in a machine learning framework. GPC uses three kernel types, namely linear, polynomial and radial basis kernels. Comparison of GPC with existing classification techniques such as LDA, QDA and NB. The GPC-based model gave the highest accuracy, sensitivity, specificity and other performance parameters. Machine learning systems are very useful for classifying data on diabetes, one of the deadliest diseases in the world.
Abstract: Background and objective: Diabetes is a silent killer. The main cause of this disease is the presence of excessive amounts of metabolites such as glucose. There were about 387 million diabetic people all over the world in 2014. The financial burden of this disease has been calculated to be about $13,700 per year. According to the World Health Organization (WHO), these figures will more than double by the year 2030. This cost would be reduced dramatically if diabetes could be predicted statistically on the basis of some covariates. Although several classification techniques are available, it is very difficult to classify diabetes. The main objectives of this paper are as follows: (i) Gaussian process classification (GPC), (ii) comparative classifier for diabetes data classification, (iii) data analysis using the cross-validation approach, (iv) interpretation of the data analysis and (v) benchmarking our method against others.
Methods: To classify diabetes, several classification techniques are used, such as linear discriminant analysis (LDA), quadratic discriminant analysis (QDA) and Naive Bayes (NB). However, most medical data show non-normality, non-linearity and an inherent correlation structure. So in this paper we adapted a Gaussian process (GP)-based classification technique using three kernels, namely linear, polynomial and radial basis kernels. We also investigate the performance of the GP-based classification technique in comparison to existing techniques such as LDA, QDA and NB. Performance is evaluated using accuracy (ACC), sensitivity (SE), specificity (SP), positive predictive value (PPV), negative predictive value (NPV) and receiver-operating characteristic (ROC) curves.
Results: The Pima Indian diabetes dataset is taken as part of the study. It consists of 768 patients, of which 268 are diabetic and 500 are controls. Our machine learning system shows the performance of the GP-based model as: ACC 81.97%, SE 91.79%, SP 63.33%, PPV 84.91% and NPV 62.50%, which are higher than those of the other methods. …
- Is Part Of:
- Computer methods and programs in biomedicine. Volume 152(2017)
- Journal:
- Computer methods and programs in biomedicine
- Issue:
- Volume 152(2017)
- Issue Display:
- Volume 152, Issue 2017 (2017)
- Year:
- 2017
- Volume:
- 152
- Issue:
- 2017
- Issue Sort Value:
- 2017-0152-2017-0000
- Page Start:
- 23
- Page End:
- 34
- Publication Date:
- 2017-12
- Subjects:
- Diabetes -- Machine learning -- GPC -- LDA -- QDA -- NB
Medicine -- Computer programs -- Periodicals
Biology -- Computer programs -- Periodicals
Computers -- Periodicals
Medicine -- Periodicals
Médecine -- Logiciels -- Périodiques
Biologie -- Logiciels -- Périodiques
Biology -- Computer programs
Medicine -- Computer programs
Periodicals
Electronic journals
610.28
- Journal URLs:
- http://www.sciencedirect.com/science/journal/01692607
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.cmpb.2017.09.004
- Languages:
- English
- ISSNs:
- 0169-2607
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.095000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 4896.xml
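
The abstract reports classifier performance as accuracy (ACC), sensitivity (SE), specificity (SP), positive predictive value (PPV) and negative predictive value (NPV). As a reading aid, the sketch below shows how these five measures are conventionally derived from the counts of a binary confusion matrix. The class `ConfusionCounts`, the function `classification_metrics` and the example counts are hypothetical illustrations and are not taken from the article or its dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of a binary confusion matrix (hypothetical helper type)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a measure is undefined.
    return numerator / denominator if denominator else float("nan")


def classification_metrics(c: ConfusionCounts) -> dict:
    """Compute the five measures named in the abstract, as fractions in [0, 1]."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "ACC": _ratio(c.tp + c.tn, total),   # share of all cases classified correctly
        "SE": _ratio(c.tp, c.tp + c.fn),     # share of actual positives detected
        "SP": _ratio(c.tn, c.tn + c.fp),     # share of actual negatives detected
        "PPV": _ratio(c.tp, c.tp + c.fp),    # share of predicted positives that are correct
        "NPV": _ratio(c.tn, c.tn + c.fn),    # share of predicted negatives that are correct
    }


# Illustrative counts only; these are NOT results from the article.
counts = ConfusionCounts(tp=40, fp=10, tn=45, fn=5)
metrics = classification_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Which class is treated as "positive" changes SE, SP, PPV and NPV, so that choice should be stated alongside any reported values.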