Superfund Research Program
Project Summary (2006-2011)
The support provided under this Core reflects a growing shift in studies of environmental exposure from more traditional epidemiological studies and simple experimental designs to high-dimensional biology, with its emphasis on 'omic' technologies and complicated questions addressing the possible interaction of environmental exposures and high-dimensional measures of the genome, proteome, etc. These high-dimensional data sets are characterized by many (thousands of) measurements made on only a few independent units (e.g., people). Thus, the Core reflects a parallel evolution in the field of biostatistics towards developing methodologies that can both find patterns in high-dimensional data sets and provide proper statistical inference for these patterns. Besides offering consulting on traditional epidemiological experimental design and analysis questions, the Core focuses its efforts on providing the most relevant and rigorous statistical techniques to the Program. With new 'omic' technologies, biology has entered a new, more empirical phase in which the goals of the research are ambitious (e.g., discovery of regulatory gene networks affected by particular environmental toxicants), but the sample sizes are relatively small (biological replicates numbering in the tens). With these technologies has also come a proliferation of proposed methods for finding biologically meaningful patterns, typically with little theory provided to guide assessment of their relative worth. The goal of this Core is to provide the project researchers with the best techniques available, software to help implement them, a computational environment that can handle computer-intensive methods on large data sets and, most importantly, rigorous statistical inference for the parameters estimated by these procedures.
The developments related to the proliferation of high-dimensional biological/epidemiological data that are particularly relevant to this Core are 1) multiple testing, 2) machine learning and loss-based estimation, 3) grouping algorithms, 4) causal inference and 5) biological metadata and systems biology. In addition, the Core provides access to a computational environment that lends itself to the computationally intensive methods developed for data mining and resampling-based inference.