Electronic Theses and Dissertations

Universitas Syiah Kuala

DISSERTATION

MODEL PEMAHAMAN BAHASA INDONESIA BERBASIS TRANSFORMERS

Author

Supervisors

Taufik Fuadi Abidin - 197010081994031002 - Supervisor I
Hammam Riza - 196208081987111001 - Supervisor II
Kahlil Muchtar - Supervisor III

Student ID Number

2109300070005

Faculty & Study Program

Fakultas Pasca Sarjana / Doktor Matematika dan Aplikasi Sains (S3) / PDDIKTI: 44001

Subject

Keywords

Publisher

Banda Aceh: Fakultas Pasca Sarjana / Prodi Doktor Matematika dan Aplikasi Sains (S3), 2025

Language

No Classification

Various word embedding methods and pre-trained models have been widely explored to address a range of problems in natural language processing (NLP). In recent years, Transformer models have become the primary choice for pre-trained models, especially for languages with abundant data resources such as English, Mandarin, and Arabic. However, research on pre-trained models built from Indonesian data remains relatively limited. Several efforts have been made to develop pre-trained models using Indonesian datasets, one of which is Indonesian natural language understanding (IndoNLU). In IndoNLU, various NLP tasks such as emotion classification, sentiment analysis, textual entailment, and entity recognition still leave significant room for improving model performance. This research aims to improve performance on the Indonesian NLP tasks of emotion classification, sentiment analysis, textual entailment, and entity recognition by using a variant of the bidirectional encoder representations from Transformers (BERT) model developed specifically for Indonesian, known as IndoBERT. The modeling approach adopts a hybrid method that combines an IndoBERT model, modified with the sum of 4 last layers technique, with several neural network models. Model performance is measured using the F1-score. The first study examines emotion classification and sentiment analysis using a hybrid approach in which the IndoBERT output is modified by summing the last two to five layers. Each layer modification is combined with a bidirectional long short-term memory (BiLSTM) model and tested on emotion classification and sentiment analysis.
The experimental results show that the hybrid of the layer-modified IndoBERT model and the BiLSTM model achieved a performance score of 0.92 for sentiment analysis and 0.76 for emotion classification. The second study retests emotion classification and sentiment analysis using a hybrid of the IndoBERT model (sum of 4 last layers) with BiLSTM, bidirectional gated recurrent unit (BiGRU), and attention layers. The results show that the hybrid of the IndoBERT model (sum of 4 last layers) and the BiLSTM model achieved a performance score of 0.93 for sentiment analysis and 0.78 for emotion classification. The third study tests textual entailment using a hybrid of the IndoBERT model (sum of 4 last layers) with neural network models. The experimental results show a performance score of 0.87 for the IndoBERT model (sum of 4 last layers) combined with BiGRU and 0.84 for the IndoBERT model (sum of 4 last layers) combined with BiLSTM. The final study tests entity recognition using a hybrid of the IndoBERT model (sum of 4 last layers) with BiLSTM and BiGRU. The evaluation on the NERGRiT dataset shows a performance score of 0.79 for the IndoBERT model (sum of 4 last layers) combined with BiLSTM. The overall conclusion of this research is that hybrid models integrating the IndoBERT model (sum of 4 last layers) with BiLSTM and BiGRU are able to improve model performance on various NLP tasks in IndoNLU. The evaluation indicates that this approach has considerable potential for improving performance on specific NLP tasks, although the results may differ depending on the dataset used.
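
The layer modification described in the abstract (summing the last two to five encoder layers, with four as the main setting) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function name `sum_last_layers`, the nested-list representation of hidden states, and the toy values are assumptions made here for illustration only. In practice the per-layer hidden states would come from the IndoBERT encoder and the summed vectors would be passed on to a BiLSTM or BiGRU.

```python
from typing import List

Vector = List[float]
Layer = List[Vector]  # one vector per token


def sum_last_layers(hidden_states: List[Layer], k: int = 4) -> Layer:
    """Return the element-wise sum of the last k layers' token vectors."""
    if not 1 <= k <= len(hidden_states):
        raise ValueError("k must be between 1 and the number of layers")
    last_k = hidden_states[-k:]
    # zip(*last_k) groups the vectors of the same token across layers;
    # zip(*vectors) then groups the same dimension across those layers.
    return [
        [sum(values) for values in zip(*vectors)]
        for vectors in zip(*last_k)
    ]


# Toy encoder output (hypothetical): 5 layers, 2 tokens, 3 dimensions.
# Layer i holds the constant value i + 1 in every position.
toy_states = [[[float(i + 1)] * 3 for _ in range(2)] for i in range(5)]

pooled = sum_last_layers(toy_states, k=4)
print(pooled)  # every entry is 2 + 3 + 4 + 5 = 14.0
```

Changing `k` from 2 to 5 reproduces the range of layer modifications that the first study compares.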

English Abstract

Various word embedding methods and pre-trained models have been explored to address multiple challenges in natural language processing (NLP). In recent years, Transformer models have become the leading choice for pre-trained models, particularly for languages with comprehensive data sources, such as English, Chinese, and Arabic. However, research on pre-trained models using Indonesian data sources remains limited. Several efforts have been made to develop pre-trained models using Indonesian language datasets, one of which is the Indonesian natural language understanding (IndoNLU) benchmark. In the IndoNLU benchmark, various NLP tasks, such as emotion classification, sentiment analysis, textual entailment, and named entity recognition, still present significant opportunities for improving model performance. This research aims to improve performance on the Indonesian NLP tasks of emotion classification, sentiment analysis, textual entailment, and named entity recognition by utilizing a variant of the bidirectional encoder representations from Transformers (BERT) model developed specifically for Indonesian, known as IndoBERT. The approach in this research adopts a hybrid method that combines a modified IndoBERT model (using the sum of the last four layers technique) with several neural network models. The performance of the resulting models is assessed using the F1-score metric. The first phase of the research tested emotion classification and sentiment analysis with a hybrid approach in which the IndoBERT model was modified by summing the last two to five layers. Each layer modification was combined with a bidirectional long short-term memory (BiLSTM) model and tested on the emotion classification and sentiment analysis tasks. The experimental results show that the hybrid model, which combines the modified IndoBERT model with the BiLSTM model, achieved an F1-score of 0.92 for sentiment analysis and 0.76 for emotion classification.
The second phase of the research retests the emotion classification and sentiment analysis tasks using a hybrid approach that combines the IndoBERT model (sum of the last four layers) with the BiLSTM model, the bidirectional gated recurrent unit (BiGRU) model, and attention layers. The test results show that the hybrid model combining the IndoBERT model (sum of the last four layers) and the BiLSTM model achieves an F1-score of 0.93 for sentiment analysis and 0.78 for emotion classification. The third phase of the research tests textual entailment using a hybrid approach that combines the IndoBERT model (sum of the last four layers) with various neural network models. The experimental results show an F1-score of 0.87 for the hybrid model combining the IndoBERT model (sum of the last four layers) with the BiGRU model, and an F1-score of 0.84 for the hybrid model combining the IndoBERT model (sum of the last four layers) with the BiLSTM model. The final phase tests named entity recognition using a hybrid approach that combines the IndoBERT model (sum of the last four layers) with BiLSTM and BiGRU. The evaluation results on the NERGRiT dataset show an F1-score of 0.79 for the hybrid model combining the IndoBERT model (sum of the last four layers) with the BiLSTM model. This research concludes that the hybrid model, which integrates the IndoBERT model (sum of the last four layers) with BiLSTM and BiGRU, improves model performance across various NLP tasks in the IndoNLU benchmark. The evaluation indicates that this approach holds significant potential for improving performance on specific NLP tasks, although the results may vary depending on the dataset used.
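
Since every result in the abstract is reported as an F1-score, a small worked sketch of that metric may help in reading the numbers. The abstract does not state which averaging scheme was used, so the macro average below is an assumption, and the function names and toy labels are hypothetical rather than taken from the dissertation.

```python
from typing import Dict, Sequence


def f1_per_class(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Compute F1 = 2*TP / (2*TP + FP + FN) for each label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    scores: Dict[str, float] = {}
    for label in set(y_true) | set(y_pred):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A label with no true or predicted instances scores 0.0 by convention.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy sentiment labels (hypothetical, not from the dissertation's data).
gold = ["pos", "pos", "neg", "neg", "neu"]
pred = ["pos", "neg", "neg", "neg", "neu"]

print(f1_per_class(gold, pred))
print(round(macro_f1(gold, pred), 4))
```

In this toy case one positive example is misclassified as negative, which lowers the F1 of both the "pos" class (a missed instance) and the "neg" class (a false alarm).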

APA Citation Style

Ahmadian, Hendri. (2025). MODEL PEMAHAMAN BAHASA INDONESIA BERBASIS TRANSFORMERS. Banda Aceh: Fakultas Pasca Sarjana / Prodi Doktor Matematika dan Aplikasi Sains (S3).

Chicago/Turabian Citation Style

Ahmadian, Hendri. MODEL PEMAHAMAN BAHASA INDONESIA BERBASIS TRANSFORMERS. Banda Aceh: Fakultas Pasca Sarjana / Prodi Doktor Matematika dan Aplikasi Sains (S3), 2025.

MLA Citation Style

Ahmadian, Hendri. MODEL PEMAHAMAN BAHASA INDONESIA BERBASIS TRANSFORMERS. Banda Aceh: Fakultas Pasca Sarjana / Prodi Doktor Matematika dan Aplikasi Sains (S3), 2025. Print.