
STATISTICAL METHODS FOR AIR-POLLUTION STUDIES USING LOW-COST MONITORS

Principal Investigator: Datta, Abhirup
Institute Receiving Award: Johns Hopkins University
Location: Baltimore, MD
Grant Number: R01ES033739
Funding Organization: National Institute of Environmental Health Sciences
Award Funding Period: 10 Feb 2022 to 30 Nov 2026
DESCRIPTION (provided by applicant): Air pollution research is increasingly adopting emergent cost-effective technologies to measure pollutant levels at spatial and temporal scales finer than those delivered by the geographically sparse network of regulatory monitors. Low-cost air-pollution monitors, while promising, introduce distinctive data features, such as the need for field co-location and calibration to eliminate noise, massive spatio-temporally correlated datasets, and repeated measures of exposures. Current statistical methodology for more traditional air-pollution data collection schemes is not optimized to properly exploit the noisy, high-throughput, and spatio-temporally dependent low-cost data. This proposal pursues multi-faceted statistical methods development motivated by the unique features of the low-cost monitoring data to improve the rigor and widen the breadth of scientific findings based on such data. Our first innovation is a spatial-filtering method for calibration of the noisy low-cost data. Regression calibration of low-cost networks using field co-location with regulatory monitors leads to underestimation of air-pollution peaks, a critical flaw from a health perspective. The current practice also fails to exploit the spatial correlation among exposure levels in the network. Our proposed filtering approach mitigates both issues and will be used to produce network-wide calibrated and smooth high-resolution spatio-temporal maps of pollutants. Our next set of innovations concerns proper utilization of the high-throughput data from low-cost networks. The large low-cost datasets have increased uptake of data-intensive machine-learning (ML) methods like random forests (RF) for exposure prediction modeling. However, exposure data are spatio-temporally correlated, and RF encounters numerous issues with dependent data, leading to loss of accuracy.
We proposed RF-GLS, a novel extension of RF that explicitly accounts for spatio-temporal correlation to improve predictions. We will develop extensions of RF-GLS for use in spatial filtering, for predicting categorical exposure data (like Air Quality Index category), and for estimating exposure effects after accounting for confounders. We will use RF-GLS to predict personal exposures using the low-cost ambient and wearable network data in Baltimore. We recognize that the rich repeated-measures data on exposures from low-cost monitors can be used directly in studies of the association between health and air pollution, without ad hoc and lossy data reduction such as using the mean exposure. We propose scalar-on-distribution analysis (SoDA), which uses the entire sample of exposures as a distribution-valued covariate in association studies. SoDA is tailored to repeated-measures covariates and will be more efficient than the general-purpose scalar-on-function regression (SoFR). SoDA will be used to directly assess which aspects of an individual's exposure distribution correlate most with their health, which in turn can help re-evaluate and update current air quality standards. The statistical methods proposed here will be applied to analyze low-cost ambient and personal exposure networks in Baltimore. We will also implement the proposed methods in publicly available, user-friendly software.
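The description states that regression calibration against co-located regulatory monitors underestimates pollution peaks. The following is a minimal simulation sketch of why that can happen, not the project's method or data. It assumes a hypothetical gamma-distributed "true" concentration, additive Gaussian sensor noise, and a simple linear regression of the reference values on the low-cost readings. The function name `simulate_calibration` and all numeric settings are illustrative assumptions.

```python
import numpy as np


def simulate_calibration(n=5000, noise_sd=8.0, seed=0):
    """Illustrate peak underestimation under regression calibration.

    All settings are hypothetical: a gamma-distributed 'true' pollutant
    level and a low-cost reading equal to the truth plus Gaussian noise.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical reference (regulatory-grade) concentrations.
    true_pm = rng.gamma(shape=4.0, scale=3.0, size=n)

    # Hypothetical co-located low-cost readings: truth plus sensor noise.
    low_cost = true_pm + rng.normal(0.0, noise_sd, size=n)

    # Regression calibration: regress the reference on the low-cost reading.
    # np.polyfit returns coefficients from highest degree to lowest.
    slope, intercept = np.polyfit(low_cost, true_pm, deg=1)
    calibrated = intercept + slope * low_cost

    # Compare true and calibrated averages over the top 5% of true levels.
    peak = true_pm >= np.quantile(true_pm, 0.95)
    return slope, true_pm[peak].mean(), calibrated[peak].mean()


if __name__ == "__main__":
    slope, true_peak, calibrated_peak = simulate_calibration()
    print(f"fitted slope: {slope:.2f}")
    print(f"mean true level during peaks: {true_peak:.1f}")
    print(f"mean calibrated level during peaks: {calibrated_peak:.1f}")
```

Because the readings are noisy, the fitted slope falls below one and the calibrated values shrink toward the overall mean, so the calibrated average during the highest true concentrations comes out lower than the true average. How large the gap is depends entirely on the assumed noise level and concentration distribution.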
Science Code(s)/Area of Science(s):
Primary: 15 - Exposure Assessment/Exposome
Secondary: 03 - Carcinogenesis/Cell Transformation
Program Officer: Yuxia Cui