Electronic Theses and Dissertation

Universitas Syiah Kuala

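SKRIPSI (Undergraduate Thesis)

DETECTING SPAM COMMENTS ON YOUTUBE USING ENSEMBLE MACHINE LEARNING
(Original Indonesian title: DETEKSI KOMENTAR SPAM PADA YOUTUBE MENGGUNAKAN ENSEMBLE MACHINE LEARNING)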

Author

Ahmad Faqih Al Ghiffary

Supervisors

Alim Misbullah - 198806032019031011 - Supervisor I
Rasudin - 197410011999031001 - Supervisor II

Student ID Number (NPM)

2108107010072

Faculty & Study Program

Faculty of Mathematics and Natural Sciences (FMIPA) / Informatics (Bachelor's degree, S1) / PDDIKTI program code: 55201

Publisher

Banda Aceh: Faculty of Mathematics and Natural Sciences (FMIPA), 2025

Abstract (English)

YouTube is the world's largest video-sharing platform and lets users interact through comments; however, spam comments disrupt the user experience and undermine content creators' credibility. This research aims to detect spam comments on YouTube using an Ensemble Machine Learning approach. Comment data were collected via the YouTube API and preprocessed through case folding, stemming, stopword removal, and tokenization. The study employed five individual models (Naive Bayes, Logistic Regression, Support Vector Machine, Decision Tree, and Random Forest) and three ensemble models (Hard Ensemble, Soft Ensemble, and Weighted Ensemble). The models were evaluated using F1-score, precision, recall, and the Matthews Correlation Coefficient (MCC). The Hard Ensemble achieved an F1-score of 90.59% with an MCC of 0.892, the Soft Ensemble an F1-score of 90.30% with an MCC of 0.888, and the Weighted Ensemble an F1-score of 90.12% with an MCC of 0.886. All individual models scored lower: the Support Vector Machine achieved an F1-score of 88.48% with an MCC of 0.869, Random Forest 87.23% with an MCC of 0.853, Decision Tree 86.22% with an MCC of 0.842, Logistic Regression 84.72% with an MCC of 0.824, and Naive Bayes only 63.67% with an MCC of 0.599. These results suggest that ensemble methods improve spam comment detection on YouTube: the best ensemble (Hard Ensemble) outperformed the best individual model (Support Vector Machine) by 2.11 percentage points in F1-score and 0.023 in MCC. Such a detector can improve the quality of user interaction and help reduce the spread of spam.
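The abstract says only that comments were collected "via the YouTube API". Assuming this means the YouTube Data API v3 `commentThreads.list` endpoint (an assumption; the abstract does not name the endpoint), a minimal standard-library sketch of paging through one video's top-level comments could look like this. The helper names `build_url`, `extract_comments`, and `fetch_all_comments` are hypothetical, an API key is required, and replies to comments are not collected here.

```python
import json
import urllib.parse
import urllib.request

# Assumption: YouTube Data API v3, commentThreads.list endpoint.
API_URL = "https://www.googleapis.com/youtube/v3/commentThreads"


def build_url(video_id, api_key, page_token=None):
    """Build a commentThreads.list request for one page of top-level comments."""
    params = {
        "part": "snippet",
        "videoId": video_id,
        "key": api_key,
        "maxResults": 100,          # largest page size the endpoint accepts
        "textFormat": "plainText",  # plain text instead of HTML
    }
    if page_token:
        params["pageToken"] = page_token
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_comments(response):
    """Return the display text of each top-level comment in one response page."""
    return [
        item["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
        for item in response.get("items", [])
    ]


def fetch_all_comments(video_id, api_key):
    """Follow nextPageToken until every page has been read (needs network access)."""
    comments, token = [], None
    while True:
        with urllib.request.urlopen(build_url(video_id, api_key, token)) as resp:
            page = json.load(resp)
        comments.extend(extract_comments(page))
        token = page.get("nextPageToken")
        if not token:
            return comments
```

Parsing is kept separate from fetching so that `extract_comments` can be checked against a saved response without calling the API.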
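The abstract lists four preprocessing steps (case folding, stemming, stopword removal, and tokenization) without details. The sketch below illustrates each step on a single comment. The stopword list and the suffix-stripping stemmer are hypothetical toy versions, not the resources used in the thesis, which the abstract does not name.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "and", "is", "my", "on", "the", "this", "to"}

# Hypothetical suffixes for a toy suffix-stripping stemmer (not Porter or any
# published stemmer); longer suffixes are checked first.
SUFFIXES = ("ing", "ed", "es", "s")


def case_fold(text):
    """Case folding: map 'FREE', 'Free' and 'free' to the same form."""
    return text.casefold()


def tokenize(text):
    """Tokenization: keep runs of word characters, dropping punctuation and emoji."""
    return re.findall(r"\w+", text)


def remove_stopwords(tokens):
    """Stopword removal: drop very common words that carry little signal."""
    return [t for t in tokens if t not in STOPWORDS]


def stem(token):
    """Stemming: strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Case fold, tokenize, remove stopwords, then stem each remaining token."""
    tokens = tokenize(case_fold(text))
    return [stem(t) for t in remove_stopwords(tokens)]


print(preprocess("Subscribe to my channel and WIN free gifts!!!"))
# ['subscribe', 'channel', 'win', 'free', 'gift']
```

This sketch tokenizes first because stemming and stopword removal both operate on individual words, and it removes stopwords before stemming so that stemmed forms (this toy stemmer turns "this" into "thi") do not slip past the stopword list. The order actually used in the thesis may differ.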
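The abstract names a Hard, Soft, and Weighted Ensemble without defining them. One common reading, assumed here rather than confirmed by the thesis, is: hard = majority vote over the individual models' predicted labels, soft = average of their predicted spam probabilities, and weighted = a weighted average of those probabilities. The pure-Python sketch below implements those three rules. The model outputs and the weights `[1, 1, 3]` are made-up illustration values.

```python
def hard_vote(preds_per_model):
    """Hard voting: majority vote over 0/1 labels (1 = spam).

    preds_per_model holds one list of labels per model; a tie falls back to 0.
    """
    n_models = len(preds_per_model)
    return [1 if 2 * sum(column) > n_models else 0 for column in zip(*preds_per_model)]


def soft_vote(probas_per_model, weights=None):
    """Soft voting: average each model's P(spam) and predict spam if it exceeds 0.5.

    With weights=None every model counts equally; with a weights list the
    average is weighted, which is one way to build a weighted ensemble.
    """
    if weights is None:
        weights = [1.0] * len(probas_per_model)
    total = sum(weights)
    labels = []
    for i in range(len(probas_per_model[0])):
        avg = sum(w * probas[i] for w, probas in zip(weights, probas_per_model)) / total
        labels.append(1 if avg > 0.5 else 0)
    return labels


# Toy outputs from three hypothetical models on three comments.
hard_preds = [[1, 0, 1], [1, 1, 0], [0, 0, 1]]
spam_probas = [[0.9, 0.4, 0.2], [0.6, 0.45, 0.3], [0.3, 0.7, 0.1]]
print(hard_vote(hard_preds))                      # [1, 0, 1]
print(soft_vote(spam_probas))                     # [1, 1, 0]
print(soft_vote(spam_probas, weights=[1, 1, 3]))  # [0, 1, 0]
```

With the five individual models listed in the abstract, a binary hard vote can never tie. In scikit-learn the same three rules correspond to `VotingClassifier` with `voting="hard"`, `voting="soft"`, and `voting="soft"` plus a `weights` list (soft voting requires every estimator to provide `predict_proba`). The abstract does not state whether the thesis used that class.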
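The abstract reports F1-score, precision, recall, and MCC but not how they were computed. Assuming binary labels with spam as the positive class (the abstract does not say whether scores are for the spam class or averaged over classes), the standard definitions can be computed from confusion-matrix counts as follows. The function names are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN with `positive` (spam) as the positive class."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def spam_metrics(y_true, y_pred):
    """Precision, recall, F1 and MCC for the spam class; 0.0 when undefined."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
scores = spam_metrics(y_true, y_pred)
print({k: round(v, 4) for k, v in scores.items()})
# {'precision': 0.6667, 'recall': 0.6667, 'f1': 0.6667, 'mcc': 0.4667}
```

F1 is reported as a percentage in the abstract (for example, 90.59% corresponds to 0.9059), whereas MCC ranges from -1 to 1. Because MCC uses all four confusion-matrix cells, it stays informative when spam and non-spam comments are imbalanced.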