BERT Tabanlı Bilgi Aktarımı Yoluyla Borsa Tweetlerinde Duygu Analizinin Geliştirilmesi


Cicekyurt E., Bakal M. G.

Computational Economics, cilt.65, ss.1-23, 2025 (SCI-Expanded)

  • Yayın Türü: Makale / Tam Makale
  • Cilt numarası: 65
  • Basım Tarihi: 2025
  • Doi Numarası: 10.1007/s10614-025-10901-8
  • Dergi Adı: Computational Economics
  • Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI)
  • Sayfa Sayıları: ss.1-23
  • Abdullah Gül Üniversitesi Adresli: Evet

Özet

One of the widely studied text classification efforts is sentiment analysis. It is a specific examination involving natural language processing and machine learning methods to understand semantic orientation from textual data. Working social media posts, such as tweets, for sentiment analysis, is quite common among researchers due to the speed of information dissemination. In this regard, forecasting stock market tweets is a widely studied research topic. Some studies have revealed a strong connection between sentiment and stock market performance, while others have not found any notable associations. The proposed work shows two distinct approaches to sentiment analysis over the stock market tweets. The first approach employs traditional machine learning algorithms, including logistic regression, random forest, and XGBoost. The second approach constructs deep learning (as a subfield of machine learning) models using LSTM and CNN algorithms to classify the test instances into positive, negative, or neutral classes through ten randomly shuffled data splits. In this study, the labeled data size is gradually increased utilizing a pre-trained model, FinBERT. It is exclusively employed to label unlabeled data instances to integrate them into the experiments. The goal is to monitor the effect of the additional newly-labeled examples on the sentiment analysis performance. The experiments showed that the average F1-score improved by 20% for the deep learning models and 17% for the machine learning models. In the end, the paper reveals a strong positive correlation between training data size and the classification performance of the experimental approaches.