Prediction of Type 2 Diabetes using Metagenomic Data and Identification of Taxonomic Biomarkers / Metagenomik Veriler Kullanılarak Tip 2 Diyabetin Tahminlenmesi ve Taksonomik Biyobelirteçlerin Tanımlanması


Temiz M., Kuzudisli C., Yousef M., Bakir-Gungor B.

32nd IEEE Conference on Signal Processing and Communications Applications, SIU 2024, Mersin, Turkey, 15 - 18 May 2024

  • Publication Type: Conference Paper / Full Text
  • DOI Number: 10.1109/siu61531.2024.10600811
  • City: Mersin
  • Country: Turkey
  • Keywords: biomarker, disease prediction, machine learning, metagenomics, type 2 diabetes
  • Abdullah Gül University Affiliated: Yes

Abstract

Omics data on diseases are now generated at multiple molecular levels, and analyzing these data with machine learning methods is a popular research topic. Among these data types, metagenomic data are increasingly used to support the diagnosis, detection and treatment of diseases. Type 2 diabetes (T2D) is a chronic disease characterized by insulin resistance and progressive dysfunction of pancreatic beta cells. While the number of people with diabetes is increasing by around 8% annually, the cost of treating the disease is rising by 18% per year. Consequently, the number of studies on the diagnosis, development and progression of T2D continues to grow. The aim of this study is to achieve better classification performance with fewer metagenomic features, thereby reducing computational cost. We compare the performance of three methods on T2D-related metagenomic data. First, the MetaPhlAn tool is used to determine the taxonomic species present in each sample and their relative abundances. The three methods compared, SVM-RCE, RCE-IFE and microBiomeGSM, perform classification by grouping and scoring features and are known to work well on complex datasets. The best results were obtained with RCE-IFE, which achieved an AUC of 0.72 using 125 features on average. In addition, the key taxonomic species that these tools identify as associated with T2D are presented and compared with the literature.
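
The abstract reports results in terms of relative abundances and AUC. As a minimal illustration of those two quantities, the Python sketch below normalizes per-taxon counts for one sample and computes AUC by pairwise comparison of classifier scores. The function names, taxon names and all numbers are hypothetical and chosen only for illustration; they are not taken from the study, and this is not the API of MetaPhlAn, SVM-RCE, RCE-IFE or microBiomeGSM.

```python
def relative_abundance(counts):
    """Scale per-taxon counts of one sample so that they sum to 1.

    Hypothetical helper for illustration; not part of any tool named above.
    """
    total = sum(counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in counts}
    return {taxon: c / total for taxon, c in counts.items()}


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) sample pairs in which the
    positive sample receives the higher score; ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = T2D, 0 = control; scores are classifier outputs.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
print(relative_abundance({"taxon_a": 30, "taxon_b": 10}))
print(round(auc(labels, scores), 3))
```

Read this way, an AUC of 0.72 means that a randomly chosen T2D sample is scored above a randomly chosen control sample in about 72% of such pairs.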