Universitas Syiah Kuala | ELECTRONIC THESES AND DISSERTATION


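    UNDERGRADUATE THESIS
Iftahul Fadhlan, HEART DISEASE PREDICTION USING AN ENSEMBLE TECHNIQUE BASED ON RANDOM FOREST AND CATBOOST (PREDIKSI PENYAKIT JANTUNG DENGAN TEKNIK ENSEMBLE BERBASIS RANDOM FOREST DAN CATBOOST). Banda Aceh: Faculty of Mathematics and Natural Sciences (Fakultas MIPA), 2025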

Heart disease is one of the leading causes of death worldwide and an important issue in global health. Early detection is crucial for preventing more serious complications, so an accurate prediction system is needed to detect potential heart disease. This study aims to build a heart disease prediction model using an ensemble technique that combines the Random Forest and CatBoost algorithms. The research method covers data collection from open sources, data processing, exploratory data analysis (EDA), data splitting, and class balancing using the Synthetic Minority Over-sampling Technique (SMOTE). The models were trained individually and as an ensemble, then evaluated on accuracy and other performance metrics. The study evaluates the performance of several heart disease prediction models, including Random Forest, CatBoost, and the ensemble model, by comparing accuracy on the training and test data. The evaluation shows that the Random Forest model performed best after feature selection based on feature importance and training with the configuration: n_estimators = 500, max_depth = 25, min_samples_split = 2, min_samples_leaf = 2, max_features = log2, class_weight = balanced, random_state = 42. Although the CatBoost and ensemble models also produced competitive results, Random Forest remained superior in accuracy and performance stability. Keywords: heart disease, random forest, catboost, ensemble.
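The SMOTE balancing step named in this abstract can be illustrated with a minimal sketch. It shows only the core idea (a synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours). The function name `smote_oversample`, the parameter `k`, the Euclidean distance, and the toy data are all assumptions made for illustration; the abstract does not say which implementation or settings were used, and in practice a library such as imbalanced-learn would normally be preferred.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=42):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (core SMOTE idea).

    Hypothetical helper for illustration; not the thesis implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random position on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy, made-up data: three minority rows with two numeric features each.
minority_rows = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4]]
new_rows = smote_oversample(minority_rows, n_new=3)
balanced_minority = minority_rows + new_rows
```

Because every synthetic row is a convex combination of two real minority rows, it always stays inside the region spanned by the minority class. Oversampling of this kind is usually applied to the training split only, so that the test split still reflects the original class distribution.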



Abstract

Heart disease is one of the leading causes of death worldwide and remains a major global health issue. Early detection is crucial to prevent more serious complications and improve patient outcomes. This study aims to develop an accurate heart disease prediction model using an ensemble technique that combines the Random Forest and CatBoost algorithms. The research methodology includes data collection from open sources, data preprocessing, exploratory data analysis (EDA), data splitting, and class balancing using the Synthetic Minority Over-sampling Technique (SMOTE). The models were trained using both individual and ensemble approaches, then evaluated based on accuracy and other performance metrics. The results show that the Random Forest model achieved the best performance after feature selection using feature importance, and was trained with the configuration: n_estimators = 600, max_depth = 20, min_samples_split = 3, min_samples_leaf = 3, max_features = log2, class_weight = balanced, and random_state = 42. Although the CatBoost and ensemble models also delivered competitive results, the Random Forest model remained superior in terms of accuracy and performance stability. Keywords: heart disease, random forest, catboost, ensemble.
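As a hedged illustration of the configuration and the ensemble idea in this abstract, the sketch below writes the listed Random Forest settings as scikit-learn style keyword arguments and combines two models' outputs by soft voting. The abstract does not state how the Random Forest and CatBoost predictions were combined, so soft voting (averaging predicted probabilities) is an assumption, as are the names `RF_PARAMS` and `soft_vote`, the equal weighting, the 0.5 threshold, and the made-up probabilities.

```python
# Random Forest settings as listed in the English abstract, written as
# scikit-learn style keyword arguments (string-valued options quoted).
RF_PARAMS = {
    "n_estimators": 600,
    "max_depth": 20,
    "min_samples_split": 3,
    "min_samples_leaf": 3,
    "max_features": "log2",
    "class_weight": "balanced",
    "random_state": 42,
}


def soft_vote(prob_a, prob_b, weight_a=0.5, threshold=0.5):
    """Combine two models' positive-class probabilities by weighted averaging.

    Hypothetical helper: soft voting is assumed here, not confirmed by the source.
    """
    if len(prob_a) != len(prob_b):
        raise ValueError("probability lists must have the same length")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie between 0 and 1")
    averaged = [weight_a * a + (1.0 - weight_a) * b
                for a, b in zip(prob_a, prob_b)]
    # Label 1 (disease) when the averaged probability reaches the threshold.
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


# Made-up probabilities standing in for Random Forest and CatBoost outputs.
rf_probs = [0.90, 0.40, 0.20, 0.65]
cb_probs = [0.80, 0.70, 0.10, 0.30]
avg_probs, predictions = soft_vote(rf_probs, cb_probs)
```

With scikit-learn installed, the dictionary could be passed as `RandomForestClassifier(**RF_PARAMS)`, and the probability lists would come from each fitted model's `predict_proba` output rather than from hand-written values.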


