HYBRID MACHINE LEARNING MODEL BASED ON SMOTEENN-SOFT VOTING ENSEMBLE AND SHAP ANALYSIS FOR STUNTING RISK PREDICTION | ELECTRONIC THESES AND DISSERTATION

Electronic Theses and Dissertation

Universitas Syiah Kuala

    THESES



Author

Nuwairy El Furqany - Personal Name;

Supervisors

Muhammad Subianto - 196812111994031005 - Supervisor I
Asep Rusyana - 197603172006041001 - Supervisor II



Student ID Number

2408207010021

Faculty & Study Program

Faculty of Mathematics and Natural Sciences (MIPA) / Master of Artificial Intelligence (S2) / PDDIKTI: 49302

Subject
-
Keywords
-
Publisher

Banda Aceh: Faculty of Mathematics and Natural Sciences (MIPA) (S2), 2026

Language

No Classification

-


Stunting is a global public health problem with long-term effects on the quality of human capital, and it remains a top priority in Indonesia. In West Sumatra Province, stunting prevalence in 2024 was 23.6%, still above the national target of 14%. This study aims to develop a machine-learning-based stunting risk prediction model that handles class imbalance, improves predictive accuracy and sensitivity, and provides transparent model interpretability. The data come from the 2023 Family Data Update (Pemutakhiran Pendataan Keluarga, PK) conducted by the National Population and Family Planning Board (BKKBN) of West Sumatra Province, covering 115,579 families. The research stages were data preprocessing, data splitting, class balancing of the training set with SMOTEENN, hyperparameter optimization, and classification modeling with Logistic Regression, Random Forest, Support Vector Machine, and XGBoost. A Soft Voting Ensemble (SVE) with accuracy-based weighting was then developed to combine the strengths of multiple classifiers. Performance was evaluated using accuracy, precision, recall (sensitivity), and F1-score, and interpretability was analyzed with SHapley Additive exPlanations (SHAP). Applying SMOTEENN consistently improved the accuracy and sensitivity of all evaluated models. The largest improvement occurred in Random Forest, while XGBoost showed the most stable performance, with an accuracy of 91.82% and a recall of 91.74%. The hybrid Soft Voting Ensemble combining Random Forest and XGBoost performed best, with an accuracy of 91.95% and a sensitivity of 93.21% in detecting families at risk of stunting. SHAP analysis identified household size, education level, dietary diversity, occupation type, and drinking water source as the most influential predictors of stunting risk.
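
The accuracy-weighted soft voting step can be illustrated with a minimal sketch. The abstract does not state the exact weighting formula, so the sketch assumes each model's weight is its accuracy divided by the sum of all accuracies. The function names, the accuracies, and the probability vectors are hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence


def accuracy_weights(accuracies: Sequence[float]) -> List[float]:
    """Normalise per-model accuracies so the weights sum to 1 (assumed scheme)."""
    total = sum(accuracies)
    if total <= 0:
        raise ValueError("accuracies must sum to a positive value")
    return [a / total for a in accuracies]


def soft_vote(probas: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted average of each model's class-probability vector for one sample."""
    if len(probas) != len(weights):
        raise ValueError("need exactly one weight per model")
    n_classes = len(probas[0])
    return [
        sum(w * p[c] for p, w in zip(probas, weights))
        for c in range(n_classes)
    ]


def predict(probas: Sequence[Sequence[float]], weights: Sequence[float]) -> int:
    """Return the index of the class with the highest weighted probability."""
    averaged = soft_vote(probas, weights)
    return max(range(len(averaged)), key=averaged.__getitem__)


# Hypothetical validation accuracies for two base models (illustration only).
val_accuracies = [0.80, 0.90]
weights = accuracy_weights(val_accuracies)

# Hypothetical class probabilities [not at risk, at risk] for one family.
rf_proba = [0.60, 0.40]
xgb_proba = [0.30, 0.70]

combined = soft_vote([rf_proba, xgb_proba], weights)
label = predict([rf_proba, xgb_proba], weights)
print(weights, combined, label)
```

In this toy case the two models disagree, and the more accurate model carries slightly more weight, so the ensemble follows it. In scikit-learn, the same idea is available through `VotingClassifier` with `voting="soft"` and a `weights` list.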
Overall, this study demonstrates that integrating SMOTEENN, an accuracy-weighted Soft Voting Ensemble, and SHAP yields a stunting risk prediction model that is accurate, sensitive to the minority class, stable, and interpretable, making it relevant for supporting data-driven stunting intervention policies.

Keywords: Stunting, Machine Learning, SMOTEENN, Soft Voting Ensemble, SHAP, Model Interpretability
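
Because the abstract uses "recall" and "sensitivity" for the same quantity, a short sketch of how the four reported metrics follow from binary confusion-matrix counts may help. The counts are hypothetical illustration values, not results from the thesis, and `ConfusionCounts` is a helper introduced here for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts; the positive class is 'at risk of stunting'."""
    tp: int  # at-risk families correctly flagged
    fp: int  # not-at-risk families wrongly flagged
    fn: int  # at-risk families missed
    tn: int  # not-at-risk families correctly cleared


def accuracy(c: ConfusionCounts) -> float:
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def precision(c: ConfusionCounts) -> float:
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c: ConfusionCounts) -> float:
    """Recall is the same quantity as sensitivity: TP / (TP + FN)."""
    positives = c.tp + c.fn
    return c.tp / positives if positives else 0.0


def f1_score(c: ConfusionCounts) -> float:
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts for illustration only.
counts = ConfusionCounts(tp=90, fp=20, fn=10, tn=80)
print(accuracy(counts), precision(counts), recall(counts), f1_score(counts))
```

On an imbalanced dataset, accuracy alone can look high while many at-risk families are missed, which is why recall is reported alongside it.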
