2024 Innovations in Intelligent Systems and Applications Conference, ASYU 2024, Ankara, Turkey, 16 - 18 October 2024
Chronic Granulomatous Disease (CGD) is a rare, inherited immunodeficiency disorder in which white blood cells are unable to effectively kill certain bacteria and fungi. This defect leads to the formation of granulomas, clusters of immune cells that form at sites of infection or inflammation. Identifying disease-related biomarkers is a critical step in advancing precision medicine and improving diagnostic accuracy. In this study, we applied the G-S-M (Grouping-Scoring-Modeling) machine learning approach to metabolomics data to uncover CGD-associated biomarkers. We obtained a metabolomics dataset from the Gene Expression Omnibus under accession number GSE220260. The data comprise 85 samples (16 healthy controls and 69 CGD samples) with comprehensive metabolic profiles obtained by liquid chromatography-mass spectrometry analysis, and the dataset lists each metabolite name together with its ion type and formula. To identify CGD-related metabolites and their ion types, we grouped metabolites by ion type and used these groups as the grouping function of G-S-M. In the training stage, metabolites annotated with the selected ion types were used in a two-class classification task (CGD versus healthy control), which outputs a set of important ion types. We also compared the performance of the G-S-M model with traditional feature selection (TFS) methods, namely XGB, SKB, IG, FCBF, MRMR, and CMIM, each combined with a random forest classifier. All experiments used 100 iterations of Monte Carlo cross-validation. G-S-M, XGB, SKB, and FCBF achieved similar performance and were the best-performing methods. Beyond its performance, G-S-M, unlike the TFS methods, operates on groups defined by ion type, which allowed it to identify relevant CGD-associated metabolites.
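As a rough illustration of the grouping and scoring idea described above, the following Python sketch groups metabolite columns by ion type and scores each group with repeated random train/test splits (Monte Carlo cross-validation). It is not the authors' implementation and rests on several assumptions: the data are synthetic, the ion type labels `[M+H]+` and `[M-H]-` are hypothetical examples, a simple nearest-centroid classifier stands in for the random forest so that the sketch needs only the standard library, and all function names are invented for illustration.

```python
import random
from collections import defaultdict
from statistics import mean


def group_by_ion_type(ion_types):
    """Map each ion type to the column indices of metabolites annotated with it."""
    groups = defaultdict(list)
    for col, ion in enumerate(ion_types):
        groups[ion].append(col)
    return dict(groups)


def stratified_split(y, test_frac, rng):
    """One random train/test split that keeps both classes in each part."""
    by_label = defaultdict(list)
    for i, label in enumerate(y):
        by_label[label].append(i)
    train, test = [], []
    for idx in by_label.values():
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_frac))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test


def nearest_centroid_accuracy(X, y, train, test, cols):
    """Accuracy of a nearest-centroid classifier restricted to the columns in cols.

    Assumption: this simple classifier replaces the random forest named in the text.
    """
    centroids = {}
    for label in set(y[i] for i in train):
        rows = [X[i] for i in train if y[i] == label]
        centroids[label] = [mean(r[c] for r in rows) for c in cols]

    def predict(x):
        return min(
            centroids,
            key=lambda lab: sum((x[c] - m) ** 2 for c, m in zip(cols, centroids[lab])),
        )

    return mean(1.0 if predict(X[i]) == y[i] else 0.0 for i in test)


def score_groups(X, y, ion_types, n_repeats=100, test_frac=0.2, seed=0):
    """Rank ion-type groups by mean test accuracy over n_repeats random splits."""
    rng = random.Random(seed)
    groups = group_by_ion_type(ion_types)
    scores = {ion: [] for ion in groups}
    for _ in range(n_repeats):
        train, test = stratified_split(y, test_frac, rng)
        for ion, cols in groups.items():
            scores[ion].append(nearest_centroid_accuracy(X, y, train, test, cols))
    return sorted(
        ((ion, mean(v)) for ion, v in scores.items()),
        key=lambda t: t[1],
        reverse=True,
    )


# Synthetic toy data (not the GSE220260 dataset): 16 controls (0) and 69 cases (1).
# The two "[M+H]+" columns are shifted between classes; the "[M-H]-" columns are noise.
data_rng = random.Random(42)
ion_types = ["[M+H]+", "[M+H]+", "[M-H]-", "[M-H]-"]
y = [0] * 16 + [1] * 69
X = [
    [data_rng.gauss(5.0 * label, 1.0), data_rng.gauss(5.0 * label, 1.0),
     data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)]
    for label in y
]

ranking = score_groups(X, y, ion_types)
for ion, acc in ranking:
    print(f"{ion}: mean accuracy {acc:.3f}")
```

In this toy setting the group whose metabolites differ between classes should rank first. With heavily imbalanced classes such as 16 versus 69, plain accuracy can be misleading, so a real analysis would likely report additional metrics such as AUC or F1.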