SEMANT - Feature Group Selection Utilizing FastText-Based Semantic Word Grouping, Scoring, and Modeling Approach for Text Classification


Voskergian D., Bakir-Gungor B., Yousef M.

35th International Conference on Database and Expert Systems Applications, DEXA 2024, Naples, Italy, 26 - 28 August 2024, vol.14911 LNCS, pp.69-75

  • Publication Type: Conference Paper / Full Text
  • Volume: 14911 LNCS
  • DOI Number: 10.1007/978-3-031-68312-1_5
  • City: Naples
  • Country: Italy
  • Page Numbers: pp.69-75
  • Keywords: Feature Group Selection, Feature Grouping, Hybrid Feature Selection, Machine Learning, Semantics, Text Classification, Word Embedding
  • Abdullah Gül University Affiliated: Yes

Abstract

Text classification is challenging because of its high-dimensional feature space, so an effective feature selection scheme is essential. In this study, we present SEMANT, a novel hybrid filter-wrapper feature selection method that combines the filter-based Chi-Square test with the wrapper-based G-S-M (Grouping, Scoring, Modeling) approach. SEMANT uses fastText neural word embedding similarities to bring more semantic information into the selection of features for text classification tasks. The performance of the proposed method was evaluated on the WOS-5736 and LitCovid datasets and compared with that of TextNetTopics, a topic modeling-based topic selection algorithm for text classification. Experimental results show that the proposed approach outperforms TextNetTopics.
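The abstract names the pipeline's ingredients but not their exact mechanics. What follows is a minimal Python sketch, under stated assumptions, of how a Chi-Square filter can be chained with embedding-based word grouping and group scoring. It is not the authors' SEMANT implementation. The toy corpus, the 2-D vectors standing in for fastText embeddings, the greedy leader-clustering rule, the 0.9 similarity threshold, and the summed-Chi-Square group score are all hypothetical choices made for illustration. The G-S-M Modeling stage is omitted.

```python
import math

def chi_square(docs, labels, term, positive):
    """2x2 chi-square statistic: term presence vs. membership in class `positive`."""
    n = len(docs)
    a = sum(1 for d, y in zip(docs, labels) if term in d and y == positive)      # present, in class
    b = sum(1 for d, y in zip(docs, labels) if term in d and y != positive)      # present, other class
    c = sum(1 for d, y in zip(docs, labels) if term not in d and y == positive)  # absent, in class
    d_ = n - a - b - c                                                           # absent, other class
    denom = (a + b) * (c + d_) * (a + c) * (b + d_)
    return 0.0 if denom == 0 else n * (a * d_ - b * c) ** 2 / denom

def filter_terms(docs, labels, k):
    """Filter step: keep the k terms with the highest chi-square over any class."""
    vocab = set().union(*docs)
    scores = {t: max(chi_square(docs, labels, t, c) for c in set(labels)) for t in vocab}
    top = sorted(vocab, key=lambda t: (-scores[t], t))[:k]  # ties broken alphabetically
    return top, scores

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def group_terms(terms, vectors, threshold):
    """Hypothetical grouping rule (greedy leader clustering): a word joins the
    first group whose seed word is at least `threshold` cosine-similar to it,
    otherwise it starts a new group."""
    groups = []
    for t in terms:
        for g in groups:
            if cosine(vectors[g[0]], vectors[t]) >= threshold:
                g.append(t)
                break
        else:
            groups.append([t])
    return groups

# Toy corpus: each document is a set of tokens.
docs = [
    {"virus", "vaccine", "infection"},
    {"vaccine", "immune", "virus"},
    {"neural", "network", "training"},
    {"network", "gradient", "training"},
]
labels = ["med", "med", "cs", "cs"]

# Hypothetical 2-D vectors standing in for fastText word embeddings.
vectors = {
    "virus": (1.0, 0.1), "vaccine": (0.9, 0.2), "infection": (0.95, 0.05),
    "immune": (0.85, 0.15), "neural": (0.1, 1.0), "network": (0.2, 0.9),
    "training": (0.15, 0.95), "gradient": (0.05, 0.9),
}

top, scores = filter_terms(docs, labels, k=5)
groups = group_terms(top, vectors, threshold=0.9)
# Stand-in group score (not the paper's): summed chi-square of the member words.
ranked = sorted(groups, key=lambda g: -sum(scores[t] for t in g))
print(ranked)  # → [['network', 'training', 'gradient'], ['vaccine', 'virus']]
```

Here the filter step shrinks the vocabulary before any embedding lookups, and the grouping step turns individual words into semantically coherent groups that can then be scored and selected as units rather than one feature at a time.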