Superfund Research Program
Data Management and Analysis Core
The Data Management and Analysis Core (DMAC) plays a critical role in the efficient and secure transmission, storage, cleaning, harmonization, management, sharing, analysis, and dissemination of biomedical and environmental data collected and analyzed across the PROTECT center. The DMAC provides software-engineered, user-friendly analytic tools and automated pipelines that address the needs of the projects and other cores in PROTECT. It enables cross-project collaboration through data integration and harmonization, and it accommodates the growing volume and velocity of data collection. Major activities over the past year included work on missing data handling, development of new harmonization tools, and continuation of the DMAC’s data collection and cleaning campaign.
Over the past year, the research team continued exports from the Human Subjects and Sampling Core; the database presently holds data for 2,117 participants, with over 10.6 million data points. Some of the DMAC’s major research activities this year have included improving methods for handling missing data during analysis (Dong 2020), analyzing publicly available water quality datasets (Purandare 2020), and developing harmonization toolsets for coalescing diverse data types. The researchers are now focused on developing new analytical methods to address mixtures, enhancing the tracking and monitoring of environmental data collection, and developing an automated export protocol for incorporating toxicology data from the Toxicant-Stimulated Disruption of Gestational Tissues with Implications for Adverse Pregnancy Outcomes Project into the PROTECT database system.