Combining N-grams and graph convolution for text classification


Şen T. Ü., Yakit M. C., Gümüş M. S., Abar O., Bakal M. G.

Applied Soft Computing, vol.175, art. no. 113092, 2025 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 175
  • Publication Date: 2025
  • DOI Number: 10.1016/j.asoc.2025.113092
  • Journal Name: Applied Soft Computing
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Applied Science & Technology Source, Compendex, Computer & Applied Sciences, INSPEC
  • Article Number: 113092
  • Abdullah Gül University Affiliated: Yes

Abstract

Text classification, a cornerstone of natural language processing (NLP), finds applications in diverse areas, from sentiment analysis to topic categorization. While deep learning models have recently dominated the field, traditional n-gram-driven approaches often struggle to achieve comparable performance, particularly on large datasets. This gap largely stems from deep learning's superior ability to capture contextual information through word embeddings. This paper explores a novel approach that leverages the often-overlooked power of n-gram features to enrich word representations and boost text classification accuracy. We propose a method that transforms textual data into graph structures, utilizing discriminative n-gram series to establish long-range relationships between words. By training a graph convolutional network on these graphs, we derive contextually enhanced word embeddings that encapsulate dependencies extending beyond local contexts. Our experiments demonstrate that integrating these enriched embeddings into a long short-term memory (LSTM) model for text classification yields improvements of around 2% in classification performance across diverse datasets. This result highlights the synergy of combining traditional n-gram features with graph-based deep learning techniques to build more powerful text classifiers.
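The pipeline described above (n-gram graph construction, GCN-based embedding, LSTM classification) can be illustrated with a minimal sketch in plain PyTorch. All names here (build_ngram_graph, GCNEmbedder, LSTMClassifier) are hypothetical, and the graph construction below simply connects words that co-occur within an n-gram window; the paper's actual discriminative n-gram selection, graph weighting, and hyperparameters are not reproduced.

```python
# Hypothetical sketch of the n-gram-graph -> GCN -> LSTM pipeline;
# not the authors' implementation.
import itertools
import torch
import torch.nn as nn
import torch.nn.functional as F

def build_ngram_graph(docs, vocab, n=3):
    """Connect words co-occurring inside the same n-gram window and
    return the symmetrically normalized adjacency D^-1/2 (A+I) D^-1/2."""
    V = len(vocab)
    A = torch.zeros(V, V)
    for doc in docs:
        ids = [vocab[w] for w in doc if w in vocab]
        for i in range(len(ids) - n + 1):
            for u, v in itertools.combinations(ids[i:i + n], 2):
                A[u, v] = A[v, u] = 1.0
    A += torch.eye(V)                               # self-loops
    d = A.sum(1).rsqrt()                            # D^{-1/2} as a vector
    return d.unsqueeze(1) * A * d.unsqueeze(0)      # normalized adjacency

class GCNLayer(nn.Module):
    """One graph-convolution step: H' = ReLU(A_hat H W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, A_hat, H):
        return F.relu(self.lin(A_hat @ H))

class GCNEmbedder(nn.Module):
    """Two-layer GCN producing context-enriched word embeddings."""
    def __init__(self, vocab_size, dim=128):
        super().__init__()
        self.H0 = nn.Parameter(torch.randn(vocab_size, dim) * 0.01)
        self.gc1 = GCNLayer(dim, dim)
        self.gc2 = GCNLayer(dim, dim)

    def forward(self, A_hat):
        return self.gc2(A_hat, self.gc1(A_hat, self.H0))

class LSTMClassifier(nn.Module):
    """LSTM classifier initialized with the GCN-derived embeddings."""
    def __init__(self, embeddings, num_classes, hidden=128):
        super().__init__()
        self.emb = nn.Embedding.from_pretrained(embeddings, freeze=False)
        self.lstm = nn.LSTM(embeddings.size(1), hidden, batch_first=True)
        self.out = nn.Linear(hidden, num_classes)

    def forward(self, token_ids):
        x = self.emb(token_ids)          # look up enriched embeddings
        _, (h, _) = self.lstm(x)         # final hidden state
        return self.out(h[-1])

# Toy usage: build the graph, derive embeddings, seed the classifier.
docs = [["graphs", "enrich", "word", "embeddings"],
        ["ngrams", "enrich", "text", "classifiers"]]
vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
A_hat = build_ngram_graph(docs, vocab)
embeddings = GCNEmbedder(len(vocab))(A_hat)
clf = LSTMClassifier(embeddings.detach(), num_classes=2)
```

Because the adjacency spans the whole vocabulary, each GCN layer propagates information between words linked by shared n-grams even when they never appear adjacently in a single document, which is how the embeddings come to encode longer-range dependencies before the LSTM sees them.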