Hybrid Type-2 Diabetes Prediction Model Using SMOTE, K-means Clustering, PCA, and Logistic Regression
DOI: https://doi.org/10.21276/apjhs.2021.8.3.23
Keywords: Classification, Clustering, Data mining, Diabetes prediction, Principal component analysis, Synthetic Minority Over-sampling Technique
Abstract
Early prediction of diabetes is important because the disease can become life-threatening in its later stages. In this paper, a hybrid framework for predicting type-2 diabetes is developed. In the first step, the imbalanced dataset is balanced using the Synthetic Minority Over-sampling Technique (SMOTE). Next, k-means clustering is applied, and all incorrectly clustered entries and outliers are removed. Principal component analysis (PCA) is then used to reduce the dimensionality of the dataset. In the final step, classification is performed using logistic regression (LR), naïve Bayes, support vector machine, and k-nearest neighbors classifiers. Experimental analysis shows that the proposed hybrid model achieves an accuracy of 98.96% using LR. The results are validated using 10-fold cross-validation.
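The abstract does not give the exact settings of the first two stages, so the following pure-Python sketch only illustrates the general techniques under stated assumptions; it is not the authors' implementation. Assumed (hypothetical) choices: SMOTE interpolates toward one of `k=5` nearest minority neighbours; k-means uses 2 clusters (one per class); an entry is "incorrectly clustered" when its class label differs from the majority label of its cluster; and an "outlier" is a point farther from its centroid than the cluster's mean distance plus `z=2` standard deviations. PCA, the classifiers, and 10-fold cross-validation are omitted. In practice these stages would more likely use imbalanced-learn's `SMOTE` and scikit-learn's `KMeans`, `PCA`, `LogisticRegression`, and `cross_val_score`.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples (needs at least 2 samples).

    Each new sample lies on the segment between a random minority sample
    and one of its k nearest minority neighbours (k=5 is an assumption).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # indices of the other minority samples, nearest first
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: euclidean(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def kmeans(points, k=2, iters=100, seed=0):
    """Plain Lloyd's k-means; returns (cluster labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                  for p in points]
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # keep the old centroid if a cluster ends up empty
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels, centroids


def kmeans_filter(X, y, k=2, z=2.0, seed=0):
    """Drop incorrectly clustered samples and distance outliers.

    Hypothetical rules: a sample is dropped if its class differs from its
    cluster's majority class, or if its distance to the centroid exceeds
    mean + z * std of the distances within that cluster.
    """
    labels, centroids = kmeans(X, k, seed=seed)
    keep_X, keep_y = [], []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if not idx:
            continue
        majority = Counter(y[i] for i in idx).most_common(1)[0][0]
        dists = [euclidean(X[i], centroids[c]) for i in idx]
        mean = sum(dists) / len(dists)
        std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
        for i, d in zip(idx, dists):
            if y[i] == majority and d <= mean + z * std:
                keep_X.append(X[i])
                keep_y.append(y[i])
    return keep_X, keep_y


if __name__ == "__main__":
    rng = random.Random(42)
    # toy imbalanced data: 40 negatives near (0, 0), 10 positives near (5, 5)
    neg = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
    pos = [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(10)]
    pos += smote(pos, n_new=30)  # balance the classes 40:40
    X, y = neg + pos, [0] * 40 + [1] * 40
    Xf, yf = kmeans_filter(X, y)
    print(len(X), "->", len(Xf), Counter(yf))
```

The filtered `Xf`, `yf` would then be passed to PCA and the classifiers. Tying the number of clusters to the number of classes is what makes the "majority label per cluster" rule meaningful; with more clusters, a different notion of "incorrectly clustered" would be needed.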
License
Copyright (c) 2021 Atul Kumar Ramotra, Vibhakar Mansotra
This work is licensed under a Creative Commons Attribution 4.0 International License.
Asian Pacific Journal of Health Sciences applies the Creative Commons Attribution (CC-BY) license to published articles. Under this license, authors retain ownership of the copyright for their content, but they allow anyone to download, reuse, reprint, modify, distribute and/or copy the content as long as the original authors and source are cited. Appropriate attribution can be provided by simply citing the original article.