Abstract
Ground-level ozone (O3) concentrations have risen persistently in urban megacities over the past decade, despite substantial reductions in primary pollutant emissions achieved through regulatory and technological interventions. Unlike primary pollutants, O3 is formed through nonlinear photochemical reactions that are highly sensitive to meteorological and environmental conditions, which makes it difficult to control with traditional precursor-based strategies. Here, we present a comprehensive O3 prediction framework tailored to megacity environments that integrates meteorological variables, air pollutant concentrations, and traffic volume, using an eight-year hourly dataset from 37 administrative districts in Seoul, South Korea. We evaluated eight machine learning algorithms with rigorous hyperparameter tuning and identified CatBoost as the most accurate model (R² = 0.93). To address seasonal variability, we developed a separate model for each season, which reduced prediction errors and revealed distinct seasonal performance patterns. The winter model achieved the highest accuracy (R² = 0.96) and the lowest RMSE, while the summer model performed comparatively worse, owing to the complex and volatile photochemistry of the warm season. Feature importance analysis indicated that temperature and NO2 were the key predictors in spring and summer, whereas wind speed, CO, and SO2 were more influential in winter. These findings highlight the limitations of static, year-round approaches and support seasonally adaptive strategies that account for both meteorological and anthropogenic factors.
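The season-by-season evaluation summarized above can be illustrated with a minimal sketch of how R² and RMSE might be computed separately for each season from observed and predicted O3 values. This is not the authors' CatBoost pipeline and omits model training entirely. It assumes meteorological season boundaries (December to February as winter, March to May as spring, and so on), which may differ from the study's definition, and the names `SEASON_BY_MONTH`, `r_squared`, `rmse`, and `seasonal_scores` are hypothetical plain-Python helpers rather than calls from any library.

```python
import math
from collections import defaultdict

# Assumption: meteorological seasons; the study's exact season split may differ.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (NaN if y_true is constant)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        return float("nan")
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the O3 values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def seasonal_scores(months, y_true, y_pred):
    """Group (observed, predicted) pairs by season and return {season: (R², RMSE)}."""
    groups = defaultdict(lambda: ([], []))
    for month, obs, pred in zip(months, y_true, y_pred):
        obs_list, pred_list = groups[SEASON_BY_MONTH[month]]
        obs_list.append(obs)
        pred_list.append(pred)
    return {season: (r_squared(o, p), rmse(o, p)) for season, (o, p) in groups.items()}


if __name__ == "__main__":
    # Illustrative toy values only, not data from the Seoul dataset.
    months = [1, 1, 1, 7, 7, 7]
    observed = [10.0, 20.0, 30.0, 40.0, 60.0, 80.0]
    predicted = [10.0, 20.0, 30.0, 50.0, 60.0, 70.0]
    print(seasonal_scores(months, observed, predicted))
```

With these toy values the winter group is predicted perfectly (R² = 1.0, RMSE = 0.0), while the summer group gives R² = 0.75 and an RMSE of about 8.16. In practice, one would pass held-out predictions from each seasonal model instead of the made-up numbers used here.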