Explainable Diabetes Risk Assessment Using Optimized Stacked Machine Learning: XGBoost-MLP-Random Forest Ensemble with Cross-Cohort Validation

Kouassi, Adlès Francis; Kadjo, Tanon Lambert; Didier, K. Yablé; Asseu, Olivier

Volume 46, Issue 3, September 2025, Pages 583–595

@article{IJIAS-25-238-12,
author = {Adlès Francis Kouassi and Tanon Lambert Kadjo and K. Yablé Didier and Olivier Asseu},
title = {{Explainable Diabetes Risk Assessment Using Optimized Stacked Machine Learning: XGBoost-MLP-Random Forest Ensemble with Cross-Cohort Validation}},
journal = {International Journal of Innovation and Applied Studies},
volume = {46},
year = {2025},
pages = {583--595},
issue = {3},
number = {3},
issn = {2028-9324},
url = {http://www.ijias.issr-journals.org/abstract.php?article=IJIAS-25-238-12},
abstract_html_url = {http://www.ijias.issr-journals.org/abstract.php?article=IJIAS-25-238-12},
pdf_url = {http://www.issr-journals.org/links/papers.php?journal=ijias&application=pdf&article=IJIAS-25-238-12},
document_type={Article},
source={www.issr-journals.org}
}

Adlès Francis Kouassi¹, Tanon Lambert Kadjo², K. Yablé Didier³, and Olivier Asseu⁴

¹ ESATIC, Côte d’Ivoire
² INPHB, Côte d’Ivoire
³ ESATIC, Côte d’Ivoire
⁴ ESATIC, Côte d’Ivoire

Original language: English

Copyright © 2025 ISSR Journals. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Early detection of type 2 diabetes is a public health priority because of its high prevalence and the severe complications it can cause. However, traditional machine learning approaches face several limitations, particularly in model optimization, class imbalance handling, and clinical interpretability.

We propose an optimized machine learning approach that combines advanced preprocessing, optimization, and modeling techniques. The methodology rests on four components: (i) feature engineering guided by medical knowledge (e.g., Glucose/BMI, Age×BMI), (ii) adaptive class rebalancing using SMOTEENN, (iii) Bayesian hyperparameter optimization with Optuna for the XGBoost and MLP (Multilayer Perceptron) models, and (iv) an ensemble stacking strategy integrating Random Forest, XGBoost, and MLP, with logistic regression as the meta-learner.

The approach was validated on the PIMA Indians and Frankfurt Hospital datasets. It achieves an accuracy of 94.05% on PIMA, 99.27% on Frankfurt, and 99.71% on the merged data, with an AUC reaching 99.99%.

SHAP analysis highlights the increased importance of insulin in PIMA and of the Age×BMI interaction in Frankfurt, while confirming the stability of universal markers such as glucose and BMI. The approach thus combines strong predictive performance with differentiated interpretability across cohorts, paving the way for more personalized and equitable predictive medicine.

Author Keywords: Machine Learning, Diabetes, Stacking Ensemble, Bayesian Optimization, Feature Engineering, SHAP, Medical Prediction.
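As an illustration of components (i) and (iv) of the abstract, the following Python sketch builds the two engineered features and a stacked ensemble with a logistic regression meta-learner. It is not the authors' implementation. Several things are assumptions made only for the example: the data is synthetic, the column and function names (`Glucose_per_BMI`, `Age_x_BMI`, `add_ratio_features`, `make_synthetic_cohort`, `build_stack`) are hypothetical, all hyperparameters are arbitrary, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so that the sketch needs only scikit-learn. SMOTEENN rebalancing, Optuna tuning, and SHAP analysis are left out.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def add_ratio_features(df):
    """Append the two engineered features named in the abstract."""
    out = df.copy()
    # Treat a BMI of 0 as missing so the ratio does not divide by zero.
    out["Glucose_per_BMI"] = out["Glucose"] / out["BMI"].replace(0, np.nan)
    out["Age_x_BMI"] = out["Age"] * out["BMI"]
    # Simplification for the sketch: undefined ratios become 0.
    return out.fillna(0.0)


def make_synthetic_cohort(n=400, seed=0):
    """Generate toy data; the distributions and label rule are arbitrary."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "Glucose": rng.normal(120, 30, n).clip(50, 250),
        "BMI": rng.normal(32, 7, n).clip(15, 60),
        "Age": rng.integers(21, 80, n).astype(float),
    })
    score = (0.03 * (df["Glucose"] - 120)
             + 0.08 * (df["BMI"] - 32)
             + rng.normal(0, 1, n))
    return df, (score > 0.5).astype(int)


def build_stack(seed=0):
    """Three base learners feeding a logistic regression meta-learner."""
    base = [
        ("rf", RandomForestClassifier(n_estimators=100, random_state=seed)),
        # Stand-in for XGBoost (assumption made to avoid an extra dependency).
        ("gb", GradientBoostingClassifier(random_state=seed)),
        # The MLP is sensitive to feature scale, so it gets its own scaler.
        ("mlp", make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                          random_state=seed),
        )),
    ]
    # cv=5: the meta-learner is trained on out-of-fold base predictions.
    return StackingClassifier(
        estimators=base,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )


X, y = make_synthetic_cohort()
X = add_ratio_features(X)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model = build_stack().fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {test_accuracy:.3f}")
```

The printed accuracy describes the synthetic data only and says nothing about the figures reported in the abstract. In a fuller pipeline, any resampling step such as SMOTEENN would normally be fitted on the training folds only, so that resampled points do not leak into the evaluation data.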

How to Cite this Article

Adlès Francis Kouassi, Tanon Lambert Kadjo, K. Yablé Didier, and Olivier Asseu, “Explainable Diabetes Risk Assessment Using Optimized Stacked Machine Learning: XGBoost-MLP-Random Forest Ensemble with Cross-Cohort Validation,” International Journal of Innovation and Applied Studies, vol. 46, no. 3, pp. 583–595, September 2025.
