Final Progress Reports: Harvard School of Public Health: Environmental Statistics Core

Superfund Research Program

Environmental Statistics Core

Project Leader: Xihong Lin
Co-Investigator: Brent A. Coull
Grant Number: P42ES016454
Funding Period: 2010-2015

Final Progress Reports

Year:   2013 

Studies and Results

In support of the Epidemiology of Developmental Windows, Metal Mixtures and Neurodevelopment project, Dr. Brent Coull, along with two Biostatistics postdoctoral research fellows, continues to work closely with Drs. Birgit Claus Henn and Maitreyi Mazumdar on preliminary analyses of the Tar Creek and Bangladesh neurodevelopment data. These cohort-specific analyses have focused on the associations between neurocognitive test phenotypes and exposure to manganese, lead, and arsenic, both individually and jointly as part of the metal mixture. Single-pollutant analyses were based on generalized additive models, which can flexibly estimate nonlinear forms of the exposure-response relationship, whereas the mixture analyses were based on a new method, Bayesian Gaussian process models, developed by this research group. This approach allows for nonlinear and nonadditive effects of the individual metal exposures while estimating the importance of each metal exposure within the overall mixture. A manuscript describing the results of applying this approach to data from the Bangladesh cohort (Bobb et al. 2013) has been submitted for publication.
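As a rough illustration of how a Gaussian process model can accommodate nonlinear, nonadditive mixture effects, the sketch below builds a Gaussian kernel over metal exposure profiles with one weight per metal. The kernel form, the function names, and the exposure values are assumptions made for illustration; the report does not specify the group's exact formulation. Setting a metal's weight to zero removes it from the kernel, which is one way a variable-selection scheme could switch a mixture component off.

```python
import math


def mixture_kernel(z1, z2, weights):
    """Gaussian kernel between two subjects' exposure profiles.

    z1, z2  : metal exposures for two subjects (same order, same scale)
    weights : one non-negative weight per metal; a weight of 0 drops that
              metal from the kernel (hypothetical selection device)
    """
    if not (len(z1) == len(z2) == len(weights)):
        raise ValueError("profiles and weights must have equal length")
    sq_dist = sum(w * (a - b) ** 2 for a, b, w in zip(z1, z2, weights))
    return math.exp(-sq_dist)


def kernel_matrix(profiles, weights):
    """Pairwise kernel matrix; this plays the role of the GP prior covariance."""
    return [[mixture_kernel(p, q, weights) for q in profiles] for p in profiles]


# Made-up standardized (Mn, Pb, As) exposures for three subjects.
profiles = [(0.0, 1.0, -0.5), (0.0, 1.0, 2.0), (1.0, 1.0, -0.5)]

K_all = kernel_matrix(profiles, weights=(1.0, 1.0, 1.0))    # all metals active
K_no_as = kernel_matrix(profiles, weights=(1.0, 1.0, 0.0))  # arsenic switched off
```

In this toy example, subjects 0 and 1 differ only in arsenic, so they become indistinguishable to the kernel once the arsenic weight is zero. In a Bayesian treatment, the posterior probability that a weight is nonzero would serve as the measure of that metal's importance in the mixture.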

Drs. Brent Coull and Joel Schwartz have continued their collaboration with investigators in the Epidemiology of Developmental Windows, Metal Mixtures and Neurodevelopment project to develop spatio-temporal prediction models for residence-level air pollution exposures in Mexico City, to be used as co-exposures with the metal exposures currently measured as part of the project. Preliminary results from models for spatio-temporal variation in PM2.5 levels based on remote-sensing satellite aerosol optical depth (AOD) data suggest there is sufficient variation in ambient levels for use in health effects models from the Mexico cohort. Drs. Brent Coull and Robert Wright have also developed interaction distributed lag models to identify windows of susceptibility to time-varying metal exposures, and time-varying interactions between multiple metals, and have applied these models to data from Mexico. Finally, Dr. Brent Coull co-authored a review article on chemical mixtures and children's health with the Epidemiology of Developmental Windows, Metal Mixtures and Neurodevelopment project investigators Drs. Robert Wright and Birgit Claus Henn.
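To make the idea of an interaction distributed lag model more concrete, here is a minimal sketch of its linear predictor: each exposure window has its own main-effect coefficient for each metal and its own coefficient for their product. This additive form, the function name, and all numbers are assumptions for illustration and are not taken from the project's models, which would also typically constrain the coefficients to vary smoothly across windows.

```python
def interaction_dlm_effect(metal_a, metal_b, beta_a, beta_b, gamma):
    """Sum window-specific main and interaction effects of two metals.

    metal_a, metal_b : exposure level in each time window
    beta_a, beta_b   : window-specific main-effect coefficients
    gamma            : window-specific interaction coefficients
    """
    series = (metal_a, metal_b, beta_a, beta_b, gamma)
    if len({len(s) for s in series}) != 1:
        raise ValueError("all inputs must cover the same number of windows")
    return sum(
        ba * a + bb * b + g * a * b
        for a, b, ba, bb, g in zip(metal_a, metal_b, beta_a, beta_b, gamma)
    )


# Hypothetical exposures in three windows; only the second window has
# nonzero coefficients, so it is the sole "window of susceptibility" here.
effect = interaction_dlm_effect(
    metal_a=[1.0, 2.0, 3.0],
    metal_b=[0.5, 0.5, 0.5],
    beta_a=[0.0, -0.4, 0.0],
    beta_b=[0.0, 0.0, 0.0],
    gamma=[0.0, -0.2, 0.0],
)
```

Windows whose estimated coefficients differ from zero are the candidate windows of susceptibility, and a nonzero interaction coefficient in a window indicates that the effect of one metal depends on the level of the other during that period.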

In support of the Genetic Epidemiology of Neurodevelopmental Metal Toxicity project, Dr. Xihong Lin, together with the postdoctoral fellows, has been working with Drs. David Christiani and Robert Wright on analyzing the GWAS data, using the Mexico cohort for the discovery phase and the Bangladesh cohort for the validation phase. Core researchers have conducted several analyses, including a GWAS analysis of the main SNP effects on several phenotypes: birth weight, head circumference, and MPI and PDI at six months and twelve months. The researchers have also performed a gene-environment GWAS analysis to study the interaction between SNPs and lead and mercury exposures, measured in the second and third trimesters and at birth, on the same phenotypes. They have identified several top SNPs located in genes with a plausible biological interpretation. A manuscript reporting the findings is nearly finished.

Bioinformatics Activities:

The HSPH Bioinformatics Core has focused on supporting data management activities and preparing for the arrival of data from both the GWAS and the in vitro cellular screens. In particular, core staff have generated a workflow that uses expression information (or generic gene lists) to reprioritize variants identified in GWAS studies, and have generated mappings of common SNPs to biological pathways and networks from standard resources. The Core has also helped coordinate data cleanup, answered data transfer questions, and set up a new research computing environment to ensure that genotype data can be stored securely and analyzed quickly as it arrives. This will provide Superfund researchers with an inexpensive storage solution closely linked to one of the largest computational resources in Massachusetts.

To support data analysis and integration, core staff optimized parameter settings for exploratory GWA algorithms (GRAIL, DAPPLE and Magenta) and deployed resources that combine the rankings from all three approaches into a unified network representation, using public protein-protein binding and gene co-expression information.

In collaboration with the group of Dr. Quan Lu, Core staff have set up workflows to automate the analysis of high-throughput knockdown experiments (shRNA and mRNA sequencing data). The HBC has provided support in creating RNA-seq pipelines, in particular adding support for paired-end read analysis and expanding from a standard analysis of differential expression counts (DESeq) to additional modules (DSS, edgeR) that are better suited to low-count read data. Additional support included running test samples through the pipeline, quality control and filtering of actual experimental data, and training in using the pipeline directly and in downstream analysis of the results in Galaxy.

Methodological Research:

Motivated by the Genetic Epidemiology of Neurodevelopmental Metal Toxicity project, Core researchers published a paper in Biostatistics on testing for interaction between a SNP set and an environmental exposure (Lin et al., 2013). Core researchers have been extending this work to test for interaction between a set of rare variants and an environmental exposure. This paper will be submitted soon.

In conjunction with the Optimizing Sampling and Statistical Analysis for Hazardous Waste Site Assessment project, which focuses on spatio-temporal methods for site assessment, remediation, and risk assessment, Dr. Brent Coull, postdoctoral fellow Dr. Jennifer Bobb, and colleagues have submitted a paper on Bayesian Gaussian process models with variable selection for assessing the health effects of metal mixtures. This class of models, closely related to the kernel machine regression methods pioneered by Dr. Xihong Lin for genomic analyses, allows for nonlinear and nonadditive effects of mixture components on a given outcome, and allows one to estimate the probability that a given component plays an active role in the mixture's overall effect on health. The investigators are exploring these methods as a supervised (by the outcome) dimension-reduction technique that yields an overall metal-mixture risk score, which can then be used in spatial design strategies for univariate exposures. Also in conjunction with the Optimizing Sampling and Statistical Analysis for Hazardous Waste Site Assessment project, Drs. Brent Coull, Nikolay Bliznyuk, and Chris Paciorek completed a second revision of a paper, for publication in the Annals of Applied Statistics, on computationally efficient algorithms for integrating spatio-temporal data collected at different temporal resolutions, such as daily and weekly. Dr. Brent Coull and trainee Dr. Stacey Ackerman-Alexeeff submitted for publication two other papers on methods for correcting for spatially correlated measurement error in spatial epidemiology.


It is important that the Superfund project studies be conducted with careful and rigorous statistical and bioinformatics consideration, and that analyses of the Superfund data use modern statistical and computational methodology. In addition, a number of other environmental epidemiological and genomic studies are in progress at the Harvard School of Public Health, and the presence of a coherent core of statisticians and bioinformaticians who are involved in these projects ensures good communication among the projects.
