Superfund Research Program
Data Management and Analysis Core
Project Leader: Aikseng Ooi
Co-Investigator: Nirav Merchant
Grant Number: P42ES004940
Funding Period: 2020-2025
Project Summary (2020-2025)
The University of Arizona Superfund Research Program (UA SRP) generates data in volumes and varieties that traditional laboratory settings cannot manage. The Data Management and Analysis Core (DMAC) serves as the primary UA SRP resource for handling large biological, geophysical, and chemical datasets, including but not limited to RNA sequencing, chromatin immunoprecipitation sequencing, exome sequencing, metabolomics, metagenomics, microbiome amplicon sequencing, geospatial positioning, analytical chemistry, and imaging. DMAC supports investigators through three core functions:
- DMAC leads the housing of all data in an easy-to-access data repository system: CyVerse. CyVerse is a computational infrastructure, maintained at the University of Arizona, whose hardware, software, and personnel are designed to handle very large datasets and complex analyses. DMAC uses a reference implementation (RI) that divides data into five levels for easy sharing, processing, and analysis. The lowest level (level 1) holds raw data, while the highest level (level 5) holds file formats that can be used in graphics and visualizations. DMAC supports these processes with on-staff statisticians and bioinformaticians who can devise analysis strategies for individual investigators. In addition to data storage, DMAC orchestrates sample management using Fulcrum software. Fulcrum allows barcoding, global positioning, and annotation of biological samples in an easy-to-use application available on both traditional workstations and mobile platforms. Because it runs on mobile devices, Fulcrum is critical for tracking samples at the point of generation.
- Beyond data and sample management, DMAC performs both standard and custom computational analyses of the data. This includes DMAC-led investigations into "feature signatures," which assess how well one data type predicts another across UA SRP projects. For example, can the gene expression changes associated with a particular arsenic treatment predict metagenomic changes in a similarly treated sample? In conjunction with UA SRP investigators, DMAC applies traditional algorithms, or develops novel ones as needed, to identify signatures for the different data types collected.
- The storage and analytical capabilities of DMAC are integrated into a user-friendly Web application that allows individual investigators to retrieve, manipulate, and visualize UA SRP data. The Web application is being implemented on a server maintained in-house, in conjunction with the R statistical environment. DMAC is thus an integral component of the UA SRP, using state-of-the-art technologies to enable the discovery of novel insights into arsenic exposure and its role in health and disease.