International Journal of Innovation and Applied Studies
ISSN: 2028-9324     CODEN: IJIABO     OCLC Number: 828807274     ZDB-ID: 2703985-7
 
 
Thursday 02 October 2025

In Press: Explainable Diabetes Risk Assessment Using Optimized Stacked Machine Learning: XGBoost-MLP-Random Forest Ensemble with Cross-Cohort Validation



                 

Adlès Francis Kouassi¹, Tanon Lambert Kadjo², K. Yablé Didier³, and Olivier Asseu⁴

¹ ESATIC, Côte d’Ivoire
² INPHB, Côte d’Ivoire
³ ESATIC, Côte d’Ivoire
⁴ ESATIC, Côte d’Ivoire

Original language: English

Copyright © 2025 ISSR Journals. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract


Early detection of type 2 diabetes is a public health priority because of the disease's high prevalence and the severe complications it can cause. Traditional machine learning approaches, however, face several limitations, particularly in model optimization, class-imbalance handling, and clinical interpretability. We propose an optimized machine learning approach that combines advanced preprocessing, optimization, and modeling techniques. The methodology rests on four components: (i) feature engineering guided by medical knowledge (e.g., Glucose/BMI, Age×BMI), (ii) adaptive class rebalancing using SMOTEENN, (iii) Bayesian hyperparameter optimization with Optuna for the XGBoost and MLP (Multilayer Perceptron) models, and (iv) an ensemble stacking strategy integrating Random Forest, XGBoost, and MLP, with logistic regression as the meta-learner. The approach was validated on the PIMA Indians and Frankfurt Hospital datasets. It reached an accuracy of 94.05% on PIMA, 99.27% on Frankfurt, and 99.71% on the merged data, with an AUC of up to 99.99%. SHAP analysis highlights the greater importance of insulin in PIMA and of the Age×BMI interaction in Frankfurt, while confirming the stability of universal markers such as glucose and BMI. Beyond its predictive performance, the approach provides cohort-specific interpretability, paving the way for more personalized and equitable predictive medicine.

Author Keywords: Machine Learning, Diabetes, Stacking Ensemble, Bayesian Optimization, Feature Engineering, SHAP, Medical Prediction.
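
As an illustration of component (i) of the abstract, the two example features it names (Glucose/BMI and Age×BMI) could be derived roughly as follows. This is a minimal sketch, not the authors' implementation: the function name `add_interaction_features`, the output keys, and the input keys `Glucose`, `BMI`, and `Age` are assumptions chosen for illustration, as is the decision to treat a zero or missing BMI as "no value".

```python
from typing import Any, Dict


def add_interaction_features(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `record` with two derived features added.

    Hypothetical helper: the key names are assumptions, not taken
    from the paper.
    """
    glucose = record["Glucose"]
    bmi = record["BMI"]
    age = record["Age"]

    enriched = dict(record)  # leave the caller's record untouched

    if bmi:
        # Glucose/BMI ratio and Age x BMI interaction, the two examples
        # named in the abstract.
        enriched["Glucose_BMI_ratio"] = glucose / bmi
        enriched["Age_x_BMI"] = age * bmi
    else:
        # Assumption: a BMI of 0 or None means the measurement is missing,
        # so both derived features are left undefined.
        enriched["Glucose_BMI_ratio"] = None
        enriched["Age_x_BMI"] = None
    return enriched


patient = {"Glucose": 120.0, "BMI": 30.0, "Age": 40}
enriched_patient = add_interaction_features(patient)
```

Returning a copy keeps the raw record available for later steps such as rebalancing or SHAP analysis, which the abstract describes but this sketch does not cover.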