HEART DISEASE PREDICTION USING AN ENSEMBLE TECHNIQUE BASED ON RANDOM FOREST AND CATBOOST | ELECTRONIC THESES AND DISSERTATION

Electronic Theses and Dissertation

Universitas Syiah Kuala

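    UNDERGRADUATE THESIS (SKRIPSI)

HEART DISEASE PREDICTION USING AN ENSEMBLE TECHNIQUE BASED ON RANDOM FOREST AND CATBOOST (PREDIKSI PENYAKIT JANTUNG DENGAN TEKNIK ENSEMBLE BERBASIS RANDOM FOREST DAN CATBOOST)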


Author

Iftahul Fadhlan

Supervisors

Zahnur - 196905291994031002 - Supervisor I
Mahyus Ihsan - 197010051998021001 - Supervisor II



Student ID Number

2008107010024

Faculty & Study Program

Faculty of Mathematics and Natural Sciences (MIPA) / Informatics (S1, undergraduate) / PDDIKTI: 55201

Publisher

Banda Aceh: Fakultas MIPA, 2025

Language

Indonesian

Classification No.

005.1

Heart disease is one of the leading causes of death worldwide and an important
issue in global health. Early detection of the disease is crucial for preventing
more serious complications. An accurate prediction system is therefore needed to
detect potential heart disease. This study aims to build a heart disease prediction
model using an ensemble technique that combines the Random Forest and CatBoost
algorithms. The research method covers data collection from open sources, data
processing, exploratory data analysis (EDA), data splitting, and class balancing
with the Synthetic Minority Over-sampling Technique (SMOTE). Models were trained
both individually and as an ensemble, then evaluated on accuracy and other
performance metrics. The study evaluates the performance of several heart disease
prediction models, including Random Forest, CatBoost, and the ensemble model, by
comparing accuracy on the training and test data. The evaluation shows that the
Random Forest model performed best after feature selection using feature importance
and training with the configuration: n_estimators = 500, max_depth = 25,
min_samples_split = 2, min_samples_leaf = 2, max_features = log2, class_weight =
balanced, random_state = 42. Although the CatBoost and ensemble models also produced
competitive results, Random Forest remained superior in accuracy and performance
stability.
Keywords: heart disease, random forest, catboost, ensemble.
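
The abstract names SMOTE as the class-balancing step but does not describe how it was applied. As a rough illustration of the idea only, the sketch below generates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The function name `smote`, its parameters `k` and `seed`, and the toy data are assumptions made for this example; the thesis most likely relied on a library implementation rather than code like this.

```python
import math
import random


def smote(minority, n_new, k=5, seed=42):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base point,
        # excluding the base point itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy minority-class feature vectors (made up for the example).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority, n_new=6, k=2)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region the minority class already occupies. Oversampling of this kind is normally applied to the training split only, so that the test data remain untouched.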

Heart disease is one of the leading causes of death worldwide and remains a major global health issue. Early detection is crucial to prevent more serious complications and improve patient outcomes. This study aims to develop an accurate heart disease prediction model using an ensemble technique that combines the Random Forest and CatBoost algorithms. The research methodology includes data collection from open sources, data preprocessing, exploratory data analysis (EDA), data splitting, and class balancing using the Synthetic Minority Over-sampling Technique (SMOTE). The models were trained using both individual and ensemble approaches, then evaluated based on accuracy and other performance metrics. The results show that the Random Forest model achieved the best performance after feature selection using feature importance, and was trained with the configuration: n_estimators = 600, max_depth = 20, min_samples_split = 3, min_samples_leaf = 3, max_features = log2, class_weight = balanced, and random_state = 42. Although the CatBoost and ensemble models also delivered competitive results, the Random Forest model remained superior in terms of accuracy and performance stability.
Keywords: heart disease, random forest, catboost, ensemble.
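
The abstract does not say how the Random Forest and CatBoost predictions were combined. One common option is soft voting, where the class probabilities of the two models are averaged; the sketch below assumes that scheme purely for illustration. `RF_PARAMS` restates the hyperparameters quoted in the abstract directly above, while `soft_vote`, `to_labels`, the equal weighting, the 0.5 threshold, and the probability values are all hypothetical.

```python
# Hyperparameters as quoted in the abstract. The keys match keyword arguments
# of scikit-learn's RandomForestClassifier, so they could be passed as
# RandomForestClassifier(**RF_PARAMS); no model is trained in this sketch.
RF_PARAMS = {
    "n_estimators": 600,
    "max_depth": 20,
    "min_samples_split": 3,
    "min_samples_leaf": 3,
    "max_features": "log2",
    "class_weight": "balanced",
    "random_state": 42,
}


def soft_vote(prob_a, prob_b, weight_a=0.5):
    """Blend two lists of positive-class probabilities by weighted average."""
    if len(prob_a) != len(prob_b):
        raise ValueError("probability lists must have the same length")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must be in [0, 1]")
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(prob_a, prob_b)]


def to_labels(probs, threshold=0.5):
    """Map probabilities to 0/1 labels (1 = heart disease predicted)."""
    return [1 if p >= threshold else 0 for p in probs]


# Made-up probabilities standing in for each model's predicted probability
# of the positive class on four patients.
rf_probs = [0.90, 0.40, 0.20, 0.65]
cat_probs = [0.80, 0.70, 0.10, 0.30]

blended = soft_vote(rf_probs, cat_probs)
labels = to_labels(blended)
```

With equal weights the ensemble can overrule either model on borderline cases, which is one reason an ensemble may score slightly below its strongest member, as the abstract reports for Random Forest.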
