Superfund Research Program
Data Management and Modeling Core
Project Leader: David Kaeli
Grant Number: P42ES017198
Funding Period: 2010-2025
Project Summary (2014-2020)
The Data Management and Modeling Core plays a critical role in achieving the Center objectives by serving as a central repository of Center data and by cross-indexing the diverse data sets produced by the Center's environmental and biomedical projects and cores. The Core also ensures the reliability of the data, including cleaning, replication, and backup, and protects the data, including de-identification of human subjects and secure, authenticated access. Data generated by each project can be cross-indexed by all projects through a global PROTECT Data Dictionary that includes common index fields (subject ID, GIS coordinates) to foster sharing and integration. The Core also provides a rich set of modeling and statistical analysis tools to support project-level objectives. The combined collection of data and tools allows PROTECT to work seamlessly across project domains and to tie environmental factors effectively to human subject outcomes. Finally, the Core is leveraging state-of-the-art search technologies developed at Google to support search and data mining.
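As a rough illustration of cross-indexing on a common index field, the sketch below groups records from two project data sets by a shared subject ID. It is only a minimal sketch under stated assumptions, not the Core's actual implementation: the field name `subject_id`, the function `cross_index`, the data set names, and all sample values are hypothetical and do not come from the PROTECT Data Dictionary.

```python
from collections import defaultdict

# Hypothetical name for the common index field; the summary mentions
# "subject ID" and "GIS coordinates" but does not give field names.
INDEX_FIELD = "subject_id"


def cross_index(*named_datasets):
    """Group records from several data sets by the shared index field.

    Each argument is a (dataset_name, list_of_record_dicts) pair.
    Returns {index_value: {dataset_name: [records...]}}.
    """
    index = defaultdict(dict)
    for name, records in named_datasets:
        for record in records:
            key = record[INDEX_FIELD]
            index[key].setdefault(name, []).append(record)
    return dict(index)


# Made-up, already de-identified sample records for illustration only.
exposure = [
    {"subject_id": "S001", "analyte": "example_analyte", "value": 12.5},
    {"subject_id": "S002", "analyte": "example_analyte", "value": 7.1},
]
outcomes = [
    {"subject_id": "S001", "gestational_weeks": 36},
]

linked = cross_index(("exposure", exposure), ("outcomes", outcomes))

# Subjects present in both data sets can now be analyzed together.
shared = sorted(k for k, v in linked.items() if len(v) == 2)
print(shared)
```

A shared key of this kind is what lets an environmental measurement and a health outcome for the same subject be retrieved together, regardless of which project produced each record.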
To support Center goals and ensure its long-term impact, David Kaeli, Ph.D., and his research team continue to build upon the rich infrastructure developed in the first three years of this Center. They have partnered with EarthSoft, a major developer of environmental data management software, to provide enhanced database capabilities appropriate for all Center projects. The researchers continue to support the cleaning, indexing, documentation, and security of all Center-based data through a secure online database system. To allow data to be analyzed by each project, the researchers provide advanced statistical and analysis tools integrated into the backend of the database system. Specifically, they work with the Molecular Epidemiology Study of Phthalate Exposure and Preterm Birth in Puerto Rico project to incorporate appropriate biostatistics tools, with the Toxicant Activation of Pathways of Preterm Birth in Gestational Tissue project to integrate toxicological data, with the Discovery of Xenobiotics Associated with Preterm Birth project to assist in nontargeted detection, with the Dynamic Transport and Exposure Pathways of Contaminants in Karst Groundwater Systems project to utilize appropriate environmental assessment and Geographic Information System (GIS) tools, and with the Remediation of Contaminated Groundwater by Solar-Powered Electrolysis project to integrate and model remediation field data. To allow the data in the Center to be easily understood by a range of communities, Core researchers are integrating mapping and data visualization capabilities and enabling effective dissemination of data to a broad audience.
To achieve the Center-level objectives that tie environmental factors to health-related outcomes, this Core provides data mining, modeling, and analysis for all projects by leveraging state-of-the-art computing capabilities. Core researchers provide a suite of data mining tools that go well beyond standard statistical techniques. Finally, the researchers work closely with the Community Engagement, Training, and Research Translation Cores to generate the mapping and statistical information that these partner Cores need to foster awareness, research translation, education, and high-quality publications.