Superfund Research Program
Data Management and Analysis Core
Project Leader: Arthur Goldsmith (Columbia University Mailman School of Public Health)
Co-Investigator: Brian J. Mailloux (Barnard College)
Grant Number: P42ES033719
Funding Period: 2022-2027
Project Summary (2022-2027)
The Data Management and Analysis Core (DMAC) ensures wide accessibility of the complex and integrated health and earth science data generated within the Columbia University Northern Plains Superfund Research Program (CUNP-SRP). These efforts are guided by an overarching mission to treat and share data according to the principles of tribal data sovereignty and the research code set in place by partnering communities in the Northern Plains. The DMAC dedicates significant resources to supporting application of existing analysis methods, developing innovative analysis techniques, and ensuring long-term reproducibility of results by leveraging statistical and data science expertise. The DMAC is centrally positioned in the CUNP-SRP and serves all projects and cores, including the Community Engagement Core (CEC) and Research Experience and Training Coordination Core (RETCC), through three aims.
Aim 1 integrates and enhances data management, sharing, and interoperability. The team uses established capabilities of the Data Management Unit at Columbia University to develop customized data management and quality assurance plans for each Project/Core, manage data collection and databases, coordinate and harmonize datasets, and provide for their efficient querying. The DMAC streamlines data communication across Projects and with external partners and data requestors, following appropriate procedures approved by the partnering tribal communities, by creating an integrated webpage that provides central access to the databases and offers advanced search capabilities. The webpage acts as a platform to locate, access, and mine data while meeting the data sharing requirements of each study. The team also shares data via this Database Directory and works with investigators, data owners, and governmental or policymaking agencies to locate additional available online data resources.
Aim 2 expands statistical resources, data analysis capability, and reproducibility tools. DMAC provides expertise in established methods for data analysis, including statistical and physical modeling. It also supports the development of innovative methods, particularly for the complex and high-dimensional data inherent to omics research and to complex environmental and geospatial research. Additionally, DMAC develops, tests, and applies robust implementations of new methods for complex data and ensures long-term reproducibility of findings through containerized analysis pipelines.
Aim 3 educates investigators, trainees, and citizen scientists in data sovereignty, sharing, management, and analysis. DMAC collaborates with the RETCC to organize workshops, seminars, and other educational opportunities. Methods, results, and educational resources are shared with all stakeholders via CUNP-SRP outreach through the CEC, Administrative Core, and the DMAC webpage. Procedures established by DMAC strive to meet the needs of all investigators and partnering communities, adding substantial value to collaborations within the CUNP-SRP, across other SRP centers, and to the wider community.