Northeastern University

Superfund Research Program

Data Management and Analysis Core

Project Leader: David Kaeli
Co-Investigators: Akram N. Alshawabkeh, Jennifer Dy, Justin Manjourides, Bhramar Mukherjee (University of Michigan)
Grant Number: P42ES017198
Funding Period: 2010-2025

Project Summary (2020-2025)

The Data Management and Analysis Core (DMAC) plays a critical role in achieving the Center's objectives by serving as a central repository of Center data and providing cross-indexing and linkage of the diverse datasets produced by the environmental and biomedical projects and cores. The current PROTECT Database System holds nearly 7 million cleaned and secure data entities. The DMAC is responsible for the reliability of the data, including cleaning, replication, and backup, as well as the protection of the data, including de-identification of human subjects and secure and authenticated access. The DMAC allows data generated by the projects to be cross-indexed by all projects based on a global PROTECT Data Dictionary that includes common index fields (subject ID, GIS coordinates) to foster sharing and integration. DMAC also provides a rich set of modeling and statistical analysis toolsets and expertise to support project-level objectives. The combined collection of data and tools allows PROTECT to work seamlessly across project domains and to tie environmental factors effectively to human subject outcomes.
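As a rough illustration of what cross-indexing on a common index field can look like, the sketch below groups records from two datasets under a shared subject ID. It is a minimal sketch only, not the PROTECT Database System's implementation: the function name `cross_index`, the field names (`subject_id`, `analyte`, `value`, `outcome`), and all records are hypothetical and made up for the example.

```python
from collections import defaultdict


def cross_index(datasets, key="subject_id"):
    """Group records from several datasets under a shared index field.

    `datasets` maps a dataset name to a list of dict records.
    Returns {key value: {dataset name: [records]}}.
    """
    index = defaultdict(dict)
    for name, records in datasets.items():
        for record in records:
            if key not in record:
                continue  # skip records that lack the common index field
            index[record[key]].setdefault(name, []).append(record)
    return dict(index)


# Hypothetical, made-up records (not real study data).
exposure = [
    {"subject_id": "S001", "analyte": "phthalate", "value": 12.5},
    {"subject_id": "S002", "analyte": "phthalate", "value": 7.1},
]
outcomes = [
    {"subject_id": "S001", "outcome": "A"},
    {"subject_id": "S003", "outcome": "B"},
]

linked = cross_index({"exposure": exposure, "outcomes": outcomes})

# Subjects that appear in both datasets can be analyzed jointly.
in_both = sorted(
    sid for sid, parts in linked.items()
    if {"exposure", "outcomes"} <= set(parts)
)
print(in_both)
```

Keying every dataset on the same field is what lets records from an environmental project and a biomedical project be lined up for one subject; a shared data dictionary serves to keep that field's name and format consistent across projects.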

To support Center goals and ensure its long-term impact, DMAC continues to build upon the rich infrastructure developed in the first 8 years of the Center. It continues to partner with EarthSoft, a major provider of environmental data management software, to provide enhanced database capabilities appropriate for all Center projects and cores. It continues to support cleaning, indexing, documenting, and securing of all Center-based data through a secure, online, database system, as well as provide a common suite of advanced statistical/analysis tools integrated into the backend of the system. As part of the renewal, DMAC has expanded its analytics support by adding Jennifer Dy, Justin Manjourides, and Bhramar Mukherjee, supporting machine learning and statistical analysis of mixtures that include phthalates, chlorinated volatile organic compounds (CVOCs), polycyclic aromatic hydrocarbons (PAHs), metals, and pesticides across all projects. It is expanding its use of mapping with a Geographic Information System (GIS), integrating analytics and mapping into a common framework, making its data easily understood by a wide range of communities.

To achieve Center-level aims that tie environmental factors to health-related outcomes, the DMAC continues to develop a common suite of analysis and visualization tools based on GIS, SAS, R, and Python, providing analysis tailored for each project while also leveraging state-of-the-art software and frameworks. The specific statistical tools developed for mixtures analysis use RStudio's data cleaning, visualization, and archiving functions and are being disseminated through GitHub. The DMAC has already developed a suite of data mining tools that provide regression and clustering analysis in an integrated online visualization framework. Finally, it is working closely with the Community Engagement Core and the Training Core to provide education on data analysis and to support data reporting and communication of results.
