Negative content in online gaming communities on platforms such as YouTube has become a serious problem because of its effect on users' behavior and mental health. This study aims to build a classification model that detects and helps explain negative content from audio transcripts of Indonesian-language YouTube videos. Transcription was automated with the faster-whisper large-v3 model, followed by automatic labeling with the help of GPT and manual validation. The data cover five categories: hate, offensive, racism, sexism, and neutral. The imbalanced class distribution was addressed with back-translation augmentation. Two deep learning models, IndoBERT and TextCNN, were trained on the preprocessed data and evaluated on accuracy, precision, recall, and F1-score. IndoBERT performed best, with 89.99% accuracy, 79.50% precision, 56.82% recall, and a 63.74% F1-score. TextCNN, by contrast, recorded 87.93% accuracy, 74.93% precision, 52.72% recall, and a 58.58% F1-score. These findings show that adding data or applying augmentation does not always improve model performance. The best model was integrated into a Streamlit-based web interface for automated analysis of YouTube videos. SHAP-based visualization was also added to show how individual words contribute to the classification result. The study shows that an approach based on audio transcription and deep learning is a potential solution for identifying and understanding negative content in Indonesian online gaming communities.
Electronic Theses and Dissertation
Universitas Syiah Kuala
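Undergraduate thesis (skripsi)
DETEKSI DAN VISUALISASI KONTEN NEGATIF MELALUI PEMROSESAN AUDIO DI YOUTUBE: PERBANDINGAN INDOBERT DAN TEXTCNN DALAM KOMUNITAS GAME ONLINE INDONESIA [Detection and Visualization of Negative Content through Audio Processing on YouTube: A Comparison of IndoBERT and TextCNN in the Indonesian Online Gaming Community]. Banda Aceh: Faculty of Mathematics and Natural Sciences (undergraduate, S1), 2025.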
Abstract
Negative content within online gaming communities on platforms such as YouTube has become a serious issue due to its impact on user behavior and mental health. This study aims to build a classification model to detect and understand negative content based on audio transcripts from Indonesian-language YouTube videos. Transcription was performed automatically using the Faster-Whisper Large-v3 model, followed by automated labeling assisted by GPT and manual validation. The data consists of five categories: hate, offensive, racism, sexism, and neutral. Data imbalance was addressed using back-translation augmentation techniques. Two deep learning models, IndoBERT and TextCNN, were trained on the preprocessed data and evaluated based on accuracy, precision, recall, and F1-score. IndoBERT achieved the best performance with an accuracy of 89.99%, precision of 79.50%, recall of 56.82%, and F1-score of 63.74%. In comparison, TextCNN recorded an accuracy of 87.93%, precision of 74.93%, recall of 52.72%, and F1-score of 58.58%. These findings indicate that data augmentation does not always lead to improved model performance. The best-performing model was integrated into a Streamlit-based web interface for automated analysis of YouTube videos. SHAP-based visualization was also implemented to show the contribution of words to the classification results. This study demonstrates that an audio transcription and deep learning-based approach can serve as a potential solution for identifying and understanding negative content in Indonesian online gaming communities.
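The gap between the reported accuracy (about 88 to 90%) and recall (about 53 to 57%) is what one would expect when a dominant class such as neutral is predicted well while rarer classes are often missed. The abstract does not state how precision, recall, and F1-score were averaged across the five categories; the reported F1-scores are lower than the harmonic mean of the reported precision and recall (roughly 66.3% for IndoBERT), which is consistent with per-class (macro) averaging, but that remains an assumption. The sketch below computes macro-averaged metrics on a small made-up label set; the function name `macro_metrics` and the toy data are hypothetical and are not taken from the thesis.

```python
from collections import Counter

# The five categories named in the abstract.
LABELS = ["hate", "offensive", "racism", "sexism", "neutral"]


def macro_metrics(y_true, y_pred, labels=LABELS):
    """Return accuracy plus macro-averaged precision, recall and F1.

    Macro averaging is an assumption here: each class contributes equally,
    regardless of how many samples it has.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        p = tp[label] / p_den if p_den else 0.0  # 0.0 when class never predicted
        r = tp[label] / r_den if r_den else 0.0  # 0.0 when class never occurs
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical toy data: 16 neutral samples and one sample of each other class.
y_true = ["neutral"] * 16 + ["hate", "offensive", "racism", "sexism"]
# The toy "model" gets neutral and hate right but calls the rest neutral.
y_pred = ["neutral"] * 16 + ["hate", "neutral", "neutral", "neutral"]

scores = macro_metrics(y_true, y_pred)
print(scores)  # accuracy is 0.85 while macro recall is only 0.40
```

In this toy case 17 of 20 predictions are correct, yet three of the five classes are never recovered, so macro recall falls well below accuracy. Read this way, the abstract's figures suggest that minority-class recall, rather than overall accuracy, is the metric to watch when comparing the two models.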