Generating informative chest X-ray captions with the LSTM architecture


Güzel Ö. F., Tanrıverdi H., Bakal M. G.

Journal of Innovative Engineering and Natural Science, vol. 5, no. 2, pp. 477-489, 2025 (Peer-Reviewed Journal)

  • Publication Type: Article / Full Article
  • Volume: 5 Issue: 2
  • Publication Date: 2025
  • DOI: 10.61112/jiens.1529215
  • Journal Name: Journal of Innovative Engineering and Natural Science
  • Indexed In: TR DİZİN (ULAKBİM)
  • Pages: pp. 477-489
  • Abdullah Gül University Affiliated: Yes

Abstract

Biomedical imaging is among the most effective screening procedures available to medical specialists. X-ray images in particular are used intensively as a reference point for diagnostic purposes. However, interpreting the findings in X-ray images requires substantial radiological expertise. In this study, a deep learning model that employs the DenseNet121 neural network architecture as its encoder module and represents the textual captions through word embedding layers is trained to predict the caption corresponding to a given X-ray image. The resulting model follows the sequence-to-sequence design typically used for neural machine translation tasks. In the experiments, the Open-i database curated by Indiana University is used for the training and testing phases. The dataset consists of 7,470 X-ray images and 3,955 patient reports, stored in XML format and composed by a domain expert. Each report contains four caption fields: impression, findings, comparison, and indication. During model development, the text under the impression captions was used for training and testing. To measure the model's performance, the Bilingual Evaluation Understudy (BLEU) score was calculated and used as the primary evaluation metric. The best performance was obtained when four-word sequences (4-grams) were predicted, with a BLEU score of 0.38368, compared to the other n-gram settings (n = 1, 2, and 3). This work demonstrates the power of sequence-to-sequence models for text generation on medical image datasets for automated diagnostic purposes.
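
The abstract describes an image-captioning pipeline in which a DenseNet121 encoder extracts visual features, caption tokens pass through a word-embedding layer and an LSTM, and the two streams are merged to predict the next word of the caption. The sketch below illustrates that general design in Keras; the layer sizes, vocabulary size, maximum caption length, and training settings are illustrative assumptions, not the configuration reported in the paper.

```python
# Minimal sketch of a DenseNet121 + LSTM captioning model, assuming a
# "merge" architecture. All hyperparameters below are illustrative, not
# the authors' published configuration.
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import DenseNet121

VOCAB_SIZE = 5000      # assumed vocabulary size of the impression captions
MAX_CAPTION_LEN = 30   # assumed maximum caption length in tokens
EMBED_DIM = 256        # assumed word-embedding dimensionality
LSTM_UNITS = 256       # assumed LSTM hidden size

# Encoder: DenseNet121 pretrained on ImageNet, global-pooled to a feature vector.
cnn = DenseNet121(include_top=False, weights="imagenet", pooling="avg")
cnn.trainable = False  # use the CNN as a fixed feature extractor

image_input = layers.Input(shape=(224, 224, 3), name="xray_image")
image_features = cnn(image_input)                                  # (batch, 1024)
image_embedding = layers.Dense(EMBED_DIM, activation="relu")(image_features)

# Decoder: embed the partial caption and encode it with an LSTM.
caption_input = layers.Input(shape=(MAX_CAPTION_LEN,), name="caption_tokens")
word_embeddings = layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(caption_input)
sequence_features = layers.LSTM(LSTM_UNITS)(word_embeddings)

# Merge image and text features and predict the next caption word.
merged = layers.add([image_embedding, sequence_features])
hidden = layers.Dense(EMBED_DIM, activation="relu")(merged)
next_word = layers.Dense(VOCAB_SIZE, activation="softmax")(hidden)

model = Model(inputs=[image_input, caption_input], outputs=next_word)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```

The BLEU evaluation mentioned in the abstract can be computed with, for example, NLTK's `sentence_bleu`; the tokenized reference and candidate captions below are purely illustrative.

```python
# Hypothetical BLEU-4 computation with NLTK; the example captions are invented.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["no", "acute", "cardiopulmonary", "abnormality"]]  # gold impression
candidate = ["no", "acute", "cardiopulmonary", "disease"]        # model output
score = sentence_bleu(reference, candidate,
                      weights=(0.25, 0.25, 0.25, 0.25),          # 4-gram BLEU
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU-4: {score:.5f}")
```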