Universitas Syiah Kuala | ELECTRONIC THESES AND DISSERTATION

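    Undergraduate Thesis (Skripsi)
Reza Fahrevi, ANALISIS SENTIMEN BERBASIS LEXICON BASED DENGAN ALGORITMA NAIVE BAYES TERHADAP KOMENTAR NETIZEN PADA VIDEO YOUTUBE DEBAT CAPRES/CAWAPRES DALAM PEMILU 2024 [Lexicon-Based Sentiment Analysis with the Naive Bayes Algorithm of Netizen Comments on YouTube Videos of the Presidential/Vice-Presidential Candidate Debates in the 2024 Election]. Banda Aceh,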

Abstract

This research analyzes the sentiment of Indonesian netizens toward the presidential and vice-presidential candidates, based on comments on the debate videos published on the KPU YouTube channel. The main motivation is uncertainty about how the wider public perceives the presidential candidates (capres) and vice-presidential candidates (cawapres). The large number of pro and con comments on the debate videos indicates strong public interest, which may influence voters' choices. Data were collected from the five debates held between December 12, 2023, and February 10, 2024, yielding 15,027 comments after preprocessing.

Sentiment was analyzed with a lexicon-based method and the Naive Bayes algorithm. Among the presidential candidates, Anies Baswedan had the highest average positive sentiment (50.9%), followed by Prabowo Subianto (31.8%) and Ganjar Pranowo (17.4%). Negative sentiment was also highest for Anies (45.1%), followed by Prabowo (42.6%) and Ganjar (12.3%). Neutral sentiment was highest for Prabowo (70.4%), compared with Anies (24.7%) and Ganjar (4.9%). Among the vice-presidential candidates, Gibran Rakabuming had the highest average positive sentiment (51.6%), followed by Muhaimin Iskandar (25.9%) and Mahfud MD (22.6%). Gibran also dominated negative sentiment (56.3%), while Muhaimin and Mahfud recorded 22.6% and 21.1%, respectively. Neutral sentiment was likewise highest for Gibran (81.2%), compared with Muhaimin (11.9%) and Mahfud (6.8%).

The Naive Bayes model with TF-IDF features performed consistently across data splits, with an accuracy of 70.96% on the validation data and 70.36% on the test data. Precision reached 75.60% on the validation data and 72.33% on the test data. The F1 score ranged from 67.50% to 67.73%, which the author interprets as a good balance between precision and recall.
Keywords: Sentiment Analysis, Debate, Crawling, Preprocessing, Lexicon-Based, Naive Bayes, TF-IDF, Model Evaluation, Data Visualization.
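
The pipeline named in the abstract (lexicon-based labeling followed by a Naive Bayes classifier on TF-IDF features) can be illustrated with a minimal, dependency-free Python sketch. Everything concrete here is an assumption for illustration only: the six-word lexicon, the toy comments, whitespace tokenization, the multinomial Naive Bayes variant, and the smoothed IDF formula are not taken from the thesis, which does not specify them in the abstract.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon (+1 positive, -1 negative); the thesis's actual
# lexicon is not given in the abstract.
LEXICON = {"bagus": 1, "hebat": 1, "jujur": 1,
           "buruk": -1, "bohong": -1, "kecewa": -1}

def lexicon_label(text):
    """Label a comment by summing the lexicon scores of its tokens."""
    score = sum(LEXICON.get(tok, 0) for tok in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def fit_idf(docs):
    """Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1 for tok, c in df.items()}

def tfidf(doc, idf):
    """TF-IDF weights for one tokenized document; unseen tokens are dropped."""
    tf = Counter(tok for tok in doc if tok in idf)
    return {tok: cnt * idf[tok] for tok, cnt in tf.items()}

def train_nb(docs, labels, idf, alpha=1.0):
    """Multinomial Naive Bayes on TF-IDF weights with Laplace smoothing."""
    class_counts = Counter(labels)
    weights = defaultdict(Counter)
    for doc, y in zip(docs, labels):
        for tok, w in tfidf(doc, idf).items():
            weights[y][tok] += w
    model = {}
    for y, n_docs in class_counts.items():
        total = sum(weights[y].values()) + alpha * len(idf)
        log_like = {tok: math.log((weights[y][tok] + alpha) / total)
                    for tok in idf}
        model[y] = (math.log(n_docs / len(labels)), log_like)
    return model

def predict(doc, model, idf):
    """Return the class with the highest log prior plus weighted log likelihood."""
    feats = tfidf(doc, idf)
    def score(y):
        log_prior, log_like = model[y]
        return log_prior + sum(w * log_like[tok] for tok, w in feats.items())
    return max(model, key=score)

# Invented example comments, not drawn from the thesis data set.
comments = [
    "pak anies hebat dan jujur",
    "jawaban bagus sekali",
    "debat ini buruk",
    "saya kecewa banyak bohong",
    "debat mulai jam tujuh",
    "nonton debat bersama keluarga",
]
docs = [c.lower().split() for c in comments]
labels = [lexicon_label(c) for c in comments]  # lexicon output as training labels
idf = fit_idf(docs)
model = train_nb(docs, labels, idf)

print(labels)
print(predict("jawaban hebat".split(), model, idf))
```

In this sketch the lexicon supplies the training labels and the classifier then generalizes from word statistics, so it can assign a class to comments that contain no lexicon words at all. A real study would also hold out validation and test splits before reporting accuracy, precision, and F1.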


