Integrated Use of N-grams and Graph Convolution in Text Classification


Şen T. Ü., Yakit M. C., Gümüş M. S., Abar O., Bakal M. G.

Applied Soft Computing, vol. 175, pp. 113092, 2025 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 175
  • Publication Date: 2025
  • DOI: 10.1016/j.asoc.2025.113092
  • Journal Name: Applied Soft Computing
  • Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Applied Science & Technology Source, Compendex, Computer & Applied Sciences, INSPEC
  • Pages: pp. 113092
  • Abdullah Gül University Affiliation: Yes

Abstract

Text classification, a cornerstone of natural language processing (NLP), finds applications in diverse areas, from sentiment analysis to topic categorization. While deep learning models have recently dominated the field, traditional n-gram-driven approaches often struggle to achieve comparable performance, particularly on large datasets. This gap largely stems from deep learning's superior ability to capture contextual information through word embeddings. This paper explores a novel approach that leverages the often-overlooked power of n-gram features to enrich word representations and boost text classification accuracy. We propose a method that transforms textual data into graph structures, using discriminative n-gram series to establish long-range relationships between words. By training a graph convolution network on these graphs, we derive contextually enhanced word embeddings that capture dependencies extending beyond local contexts. Our experiments demonstrate that integrating these enriched embeddings into a long short-term memory (LSTM) model for text classification improves classification performance by around 2% across diverse datasets. This result highlights the synergy of combining traditional n-gram features with graph-based deep learning techniques for building more powerful text classifiers.
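The abstract does not say how the n-gram graph is built or which graph convolution variant is trained, so the following is only a minimal illustrative sketch under stated assumptions. The helper names `build_word_graph`, `normalize`, and `gcn_layer` are hypothetical. It assumes (1) an edge joins any two distinct words that occur together in one of the given n-grams, and (2) propagation uses the symmetric normalization of Kipf and Welling's GCN, `ReLU(D^-1/2 (A + I) D^-1/2 H W)`. Neither assumption is confirmed as the paper's actual construction.

```python
import math
from itertools import combinations


def build_word_graph(ngrams):
    """Hypothetical construction: link every pair of distinct words
    that co-occur inside the same n-gram (assumption, not the paper's rule)."""
    vocab = sorted({w for g in ngrams for w in g})
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    adj = [[0.0] * n for _ in range(n)]
    for g in ngrams:
        for a, b in combinations(set(g), 2):
            i, j = index[a], index[b]
            adj[i][j] = adj[j][i] = 1.0  # undirected edge
    return vocab, adj


def normalize(adj):
    """Assumed GCN variant: symmetric normalization D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # degree including the self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Plain-Python dense matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(a_norm, h, w):
    """One propagation step: ReLU(A_norm @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_norm, h), w)]


if __name__ == "__main__":
    vocab, adj = build_word_graph([("not", "good"), ("very", "good", "movie")])
    a_norm = normalize(adj)
    h = [[1.0 if i == j else 0.0 for j in range(len(vocab))] for i in range(len(vocab))]  # one-hot features
    w = [[1.0] for _ in vocab]  # fixed illustrative weights; learned during training in practice
    for word, row in zip(vocab, gcn_layer(a_norm, h, w)):
        print(word, row)
```

In a pipeline like the one the abstract describes, each output row would act as a contextually enriched word embedding passed on to an LSTM classifier. Here the one-hot features and the fixed weight matrix only demonstrate how a word's representation mixes in its graph neighbours. A real model would learn `w` and typically stack several layers.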