Hybrid Type-2 Diabetes Prediction Model Using SMOTE, K-means Clustering, PCA, and Logistic Regression

Authors

  • Atul Kumar Ramotra, Department of Computer Science and IT, University of Jammu, Jammu, Jammu and Kashmir, India
  • Vibhakar Mansotra, Department of Computer Science and IT, University of Jammu, Jammu, Jammu and Kashmir, India

DOI:

https://doi.org/10.21276/apjhs.2021.8.3.23

Keywords:

Classification, Clustering, Data mining, Diabetes prediction, Principal component analysis, Synthetic Minority Over-sampling Technique

Abstract

Early prediction of diabetes is important because the disease can become life-threatening in its later stages. In this paper, a hybrid framework for the prediction of type-2 diabetes is developed. In the first step, the imbalanced dataset is balanced using the Synthetic Minority Over-sampling Technique (SMOTE). Next, k-means clustering is applied, and all incorrectly clustered entries and outliers are removed. Principal component analysis (PCA) is then used to reduce the dimensionality of the dataset. In the final step, classification is performed using logistic regression (LR), naïve Bayes, support vector machine, and k-nearest neighbors classifiers. Experimental analysis shows that the proposed hybrid model achieves an accuracy of 98.96% using LR. The results are validated using 10-fold cross-validation.
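
The four steps named in the abstract can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the data are synthetic (generated with scikit-learn's make_classification), the oversampler is a simplified SMOTE-style interpolation written by hand, the k-means step drops only rows whose cluster disagrees with their label (no separate outlier rule is applied), and the function names smote_like, drop_misclustered, and run_pipeline, as well as all parameter values, are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import NearestNeighbors
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def smote_like(X, y, k=5, seed=0):
    """Simplified SMOTE for binary labels: interpolate minority points
    towards randomly chosen minority-class neighbours."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    X_min = X[y == minority]
    n_new = counts.max() - counts.min()
    # Ask for k + 1 neighbours because each point is its own nearest neighbour.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neigh = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_new)
    partner = neigh[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    X_new = X_min[base] + gap * (X_min[partner] - X_min[base])
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority)])


def drop_misclustered(X, y, seed=0):
    """Keep rows whose k-means cluster (k=2) agrees with their 0/1 label."""
    clusters = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(X)
    # Cluster ids are arbitrary, so flip them if that matches more labels.
    if np.mean(clusters == y) < 0.5:
        clusters = 1 - clusters
    keep = clusters == y
    return X[keep], y[keep]


def run_pipeline(seed=0):
    # Assumption: synthetic imbalanced data stands in for the diabetes dataset.
    X, y = make_classification(
        n_samples=600, n_features=8, n_informative=5, n_clusters_per_class=1,
        class_sep=1.5, weights=[0.65, 0.35], random_state=seed,
    )
    X = StandardScaler().fit_transform(X)
    X_bal, y_bal = smote_like(X, y, seed=seed)                      # step 1
    X_clean, y_clean = drop_misclustered(X_bal, y_bal, seed=seed)   # step 2
    # Steps 3 and 4: PCA keeping 95% of the variance, then logistic regression.
    model = make_pipeline(PCA(n_components=0.95), LogisticRegression(max_iter=1000))
    scores = cross_val_score(model, X_clean, y_clean, cv=10)        # 10-fold CV
    return X_bal, y_bal, X_clean, y_clean, scores


if __name__ == "__main__":
    *_, scores = run_pipeline()
    print(f"mean 10-fold accuracy: {scores.mean():.4f}")
```

The sketch follows the order given in the abstract, so oversampling and cluster-based filtering both happen before cross-validation. Accuracy measured this way describes the resampled and filtered data, and is not directly comparable to accuracy on the untouched dataset.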

Published

2021-07-16

How to Cite

Atul Kumar Ramotra, & Vibhakar Mansotra. (2021). Hybrid Type-2 Diabetes Prediction Model Using SMOTE, K-means Clustering, PCA, and Logistic Regression. Asian Pacific Journal of Health Sciences, 8(3), 137–140. https://doi.org/10.21276/apjhs.2021.8.3.23