Submitted on 2025-02-26, 10:22 and posted on 2025-02-26, 10:24, authored by Saleh Musleh
According to the International Diabetes Federation (IDF), diabetes has grown tenfold globally over the past two decades. Diagnosing diabetes is time-consuming, especially when it relies on a fasting blood sugar test or an oral glucose tolerance test. Motivated by the "Qatar deserves the best" vision, and in partnership with Sidra Medicine / Qatar Biobank, a phenotypic dataset from a local Qatari public cohort study was provided. Feature selection was applied to this dataset of phenotypic features and attributes to identify the biomarker attributes that can predict diabetic subjects in the cohort. Three binary classifiers, Support Vector Machine (SVM), a Quadratic classifier, and Logistic Regression, were applied to the ten best features, then to the three best, and finally to the two best features. The performance of the three classifiers was evaluated on F1-Score, Recall, Accuracy, and Precision, and benchmarked at false positive rates (FPR, i.e. false alarm rates) of 2%, 3%, and 5%. The results show that with two attributes at 2% FPR, Logistic Regression achieves the highest Recall and F1-Score, at 0.678 and 0.799 respectively, followed by the Quadratic and SVM classifiers. At 3% FPR, SVM scores highest on Recall at 0.739, with an F1-Score of 0.811, while Logistic Regression scores 0.704 on Recall with an F1-Score of 0.811, followed by the Quadratic classifier at 0.688 on Recall and an F1-Score of 0.802. Table 4.8 also shows that with two features all classifiers perform almost the same, with Logistic Regression slightly ahead at an Accuracy of 0.873, followed by SVM at 0.868 and the Quadratic classifier at 0.62. These results are verified using Receiver Operating Characteristic (ROC) curves.
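Benchmarking a classifier "at 2%, 3% and 5% FPR" means fixing an operating point on its ROC curve and reading off the other metrics there. The abstract does not say how the thresholds were chosen, so the following is only a minimal sketch under that assumption. It takes continuous classifier scores (for example, predicted probabilities from Logistic Regression), picks the threshold that lets at most the target fraction of non-diabetic subjects be flagged, and computes Recall, Precision, F1-Score, and Accuracy at that point. The function name `metrics_at_fpr` and the toy scores are hypothetical and do not come from the study's data.

```python
import math
from typing import Sequence


def metrics_at_fpr(scores: Sequence[float], labels: Sequence[int], target_fpr: float) -> dict:
    """Hypothetical helper: choose a threshold whose false positive rate does
    not exceed target_fpr, then report metrics at that operating point.

    scores: classifier scores (higher = more likely diabetic)
    labels: 1 for diabetic, 0 for non-diabetic
    """
    negatives = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    if not negatives or not any(y == 1 for y in labels):
        raise ValueError("need at least one positive and one negative sample")

    # At most k negatives may be flagged positive; the epsilon guards float error.
    k = math.floor(target_fpr * len(negatives) + 1e-9)
    # Predict positive only for scores strictly above the (k+1)-th highest
    # negative score, so no more than k negatives exceed the threshold.
    threshold = negatives[k] if k < len(negatives) else -math.inf

    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted_positive = s > threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1

    return {
        "threshold": threshold,
        "fpr": fp / (fp + tn),
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Toy scores for illustration only (10 non-diabetic, 5 diabetic subjects).
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95,
          0.85, 0.9, 0.97, 0.99, 0.5]
labels = [0] * 10 + [1] * 5
for fpr in (0.02, 0.03, 0.05, 0.10):
    print(fpr, metrics_at_fpr(scores, labels, fpr))
```

Sweeping `target_fpr` from 0 to 1 and plotting the resulting (FPR, Recall) pairs traces the ROC curve. This is why fixed-FPR results can be cross-checked against ROC curves. With very few negatives, as in this toy example, the 2%, 3% and 5% targets all round down to zero allowed false positives. In a cohort-sized dataset they give distinct operating points.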