Skip Navigation

Publication Detail

Title: Comparison of Machine Learning and Land Use Regression for fine scale spatiotemporal estimation of ambient air pollution: Modeling ozone concentrations across the contiguous United States.

Authors: Ren, Xiang; Mi, Zhongyuan; Georgopoulos, Panos G

Published In Environ Int, (2020 09)

Abstract: BACKGROUND: Spatial linear Land-Use Regression (LUR) is commonly used for long-term modeling of air pollution in support of exposure and epidemiological assessments. Machine Learning (ML) methods in conjunction with spatiotemporal modeling can provide more flexible exposure-relevant metrics and have been studied using different model structures. There is however a lack of comparisons of methods available within these two modeling frameworks, that can guide model/algorithm selection in air quality epidemiology. OBJECTIVE: The present study compares thirteen algorithms for spatial/spatiotemporal modeling applied for daily maxima of 8-hour running averages of ambient ozone concentrations at spatial resolutions corresponding to census tracts, to support estimation of annual ozone design values across the contiguous US. These algorithms were selected from nine representative categories and trained using predictors that included chemistry-transport model predictions, meteorological factors, land use and land cover, and stationary and mobile emissions. METHODS: To obtain the best predictive performance, model structures were optimized through a repeated coarse/fine grid search with expert knowledge. Six target-oriented validation strategies were used to prevent overfitting and avoid over-optimistic model evaluation results. In order to take full advantage of the power of different algorithms, we introduced tuning sample weights in spatiotemporal modeling to ensure predictive accuracy of peak concentrations, that is crucial for exposure assessments. In spatial modeling, four interpretation and visualization tools were introduced to explain predictions from different algorithms. RESULTS: Nonlinear ML methods achieved higher prediction accuracy than linear LUR, and the improvements were more significant for spatiotemporal modeling (nearly 10%-40% decrease of predicted RMSE). By tuning the sample weights, spatiotemporal models can predict concentrations used to calculate ozone design values that are comparable or even better than spatial models (nearly 30% decrease of cross-validated RMSE). We visualized the underlying nonlinear relationships, heterogeneous associations and complex interactions from the two best performing ML algorithms, i.e., Random Forest and Extreme Gradient Boosting, and found that the complex patterns were relatively less significant with respect to model accuracy for spatial modeling. CONCLUSION: Machine Learning can provide estimates that are actually more interpretable and practical than linear regression to improve accuracy in modeling human exposures. A careful design of hyperparameter tuning and flexible data splitting and validations is crucial to obtain reliable and stable results. Desirable/successful nonlinear models are expected to capture similar nonlinear patterns and interactions using different ML algorithms.

PubMed ID: 32593834 Exiting the NIEHS site

MeSH Terms: Air Pollutants*/analysis; Air Pollution*/analysis; Environmental Monitoring; Humans; Machine Learning; Ozone*/analysis; Particulate Matter/analysis; United States

Back
to Top