A Data-Driven Framework for Predicting Flight Arrival Delays Using Integrated Aviation and Meteorological Data


Akyuz M. S., Bakal M. G.

2025 10th International Conference on Computer Science and Engineering (UBMK), İstanbul, Turkey, 17 - 21 September 2025, pp.1543-1547, (Full Text)

  • Publication Type: Conference Paper / Full Text
  • Doi Number: 10.1109/ubmk67458.2025.11206792
  • City: İstanbul
  • Country: Turkey
  • Page Numbers: pp.1543-1547
  • Abdullah Gül University Affiliated: Yes

Abstract

Accurate prediction of flight arrival delays is critical for enhancing operational efficiency and customer satisfaction in the aviation industry. This paper presents a data-driven framework for flight arrival delay prediction that integrates disparate data sources. Our methodology fuses U.S. Department of Transportation (DOT) onboard flight records with spatiotemporal meteorological data from the Climate Data Store (CDS). We perform extensive feature engineering, creating novel variables such as NetDelayImpact and DelayEfficiency and applying cyclical transformations to temporal features to capture periodic patterns. High-cardinality categorical features, such as city names, are handled with K-Fold target encoding. We then conduct a comparative analysis of several gradient boosting models, including LightGBM, CatBoost, and HistGradientBoostingRegressor. After hyperparameter optimization with Optuna, our final LightGBM model, which uses a set of 34 engineered features, achieves a Root Mean Squared Error (RMSE) of 2.172 minutes and a Mean Absolute Error (MAE) of 0.118 minutes on the test set. This result substantially outperforms the optimized HistGradientBoostingRegressor, which yields an RMSE of 7.904 minutes. Finally, model interpretability analysis using SHAP confirms that the most significant predictors are engineered features related to departure delay and net delay. The proposed framework demonstrates an effective and practical approach to minimizing prediction error in flight delay forecasting.
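
The abstract names two preprocessing steps without detailing them: cyclical transformations of temporal features and K-Fold target encoding of high-cardinality categories. The following is a minimal, dependency-free sketch of both ideas, not the authors' implementation. The function names, the choice of departure hour as the periodic feature, and the fallback to the mean of the fitting folds for unseen categories are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def cyclical_encode(value, period):
    """Map a periodic value (e.g. hour of day, period=24) to a (sin, cos) pair
    so that the end of one cycle sits next to the start of the next."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def kfold_target_encode(categories, targets, n_splits=5, seed=0):
    """Out-of-fold mean target encoding (hypothetical helper).

    Each row is encoded with the mean target of its category computed only
    from rows in the other folds, so a row never sees its own target value.
    """
    n = len(categories)
    if not 2 <= n_splits <= n:
        raise ValueError("n_splits must be between 2 and the number of rows")

    # Shuffle row indices once, then deal them into n_splits folds.
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[i::n_splits] for i in range(n_splits)]

    encoded = [0.0] * n
    for held_out in folds:
        held = set(held_out)
        sums, counts = defaultdict(float), defaultdict(int)
        total, seen = 0.0, 0
        # Accumulate per-category statistics from the fitting folds only.
        for i in range(n):
            if i in held:
                continue
            sums[categories[i]] += targets[i]
            counts[categories[i]] += 1
            total += targets[i]
            seen += 1
        # Assumption: categories absent from the fitting folds fall back to
        # the overall mean target of those folds.
        fallback = total / seen
        for i in held_out:
            c = categories[i]
            encoded[i] = sums[c] / counts[c] if counts[c] else fallback
    return encoded


# Toy usage with made-up values (not data from the paper).
hour_features = [cyclical_encode(h, 24) for h in (0, 6, 12, 23)]
city_encoding = kfold_target_encode(
    ["A", "A", "B", "B"], [1.0, 3.0, 10.0, 20.0], n_splits=4
)
```

With the sine and cosine pair, hour 23 lies close to hour 0 in feature space, which a raw integer hour would not express. Computing the category means out of fold is what limits target leakage compared with encoding each city by its mean over the whole training set.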