Superfund Research Program
Data Management and Analysis Core (DMAC)
Project Summary (2022-2027)
The center investigators use a systems approach to understand and remediate the potential health risks posed by complex exposure scenarios at hazardous waste sites. Several projects use cutting-edge analytical chemistry, sequencing, and other approaches to produce high-dimensional "omic" data, with thousands of parallel measurements on a specific endpoint. These data are analyzed to identify the biological processes that are perturbed in complex environmental exposure scenarios. The Data Management and Analysis Core (DMAC) supports every stage of the scientific data process: statistical design; data management and QA/QC; high-performance computing platforms with secure access to center data; data science consulting and biostatistical analysis; and the development and dissemination of new methodology in support of project goals. The DMAC thus supports the acquisition, storage, analysis, and sharing of large, complex datasets by developing tools, infrastructure, and expertise. It develops data-driven, machine-learning methods that find patterns in high-dimensional datasets, in order to understand the biological perturbations and potential health risks associated with exposures. These efforts include new statistical algorithms, and the resulting software, for discovering which patterns of chemical mixtures have the greatest potential impact on human health. Because these methods are computationally intensive, core leaders work with the Berkeley Research Computing group to provide a fast, scalable computing platform with more than 100 CPUs. The platform also provides direct access to files through its integration with the Box file management system. Finally, the DMAC manages access to, and releases of, center data, metadata, analysis plans, and other supporting materials using the Open Science Framework (OSF).
In conclusion, this core is an integral component of the overall program, supporting the entire life cycle of center data.