Superfund Research Program
Quantitative Biology: Biostatistics, Bioinformatics, and Computation
Project Summary (2011-2017)
The purpose of the Quantitative Biology Core is to provide investigators with consultative support in biostatistics, computational biology, and bioinformatics, and to support web-based dissemination of bioinformatic solutions and database access. Most specific aims within the projects produce high-dimensional biological and exposure data, and often involve complicated questions about the possible interaction of environmental exposures with high-dimensional measures of the genome, proteome, and other outputs of high-throughput technologies. These high-dimensional data sets are characterized by many thousands of measurements made on each unit (e.g., person, yeast culture, soil community). The Quantitative Biology Core reflects an evolution in the fields of biostatistics and bioinformatics towards methodologies that can both find patterns in high-dimensional data sets and provide proper statistical inference for these patterns. A consensus among the project researchers and the methodological experts has formed around a set of core principles regarding optimal estimation and inference in the context of complicated questions and high-dimensional data. Specifically, the consensus favors using, when possible, semi-parametric locally efficient estimation with robust inference, together with the development of optimal methods for integrating the statistical results with existing metadata to suggest relevant biological pathways and networks. Applying this approach enables analyses to incorporate diverse data to query similar patterns and pathways for both related toxins and possibly related diseases, thereby substantially leveraging the data generated by the Program. To implement this methodology, the Quantitative Biology Core provides access to a computational environment suited to the computationally intensive methods developed for data mining and re-sampling based inference.
Because of the scale of the data collection and the desirability of converging on a general methodology, the Program requires a more centralized system that can archive data for this Core, share data with it, and provide guidance on accessing metadata/annotation and on routines that leverage such data to overlay our results on existing hypothesized regulatory networks. The Core is also developing tools to find and compare pathways, and is creating and maintaining a web-based system that will allow efficient sharing of its methodological expertise with the project researchers and ultimately serve as a tool for outreach to the general scientific community.