Final Progress Reports: University of California-Davis: Statistical Analysis of Toxics Measurement Data

Superfund Research Program

Statistical Analysis of Toxics Measurement Data

Project Leader: David M. Rocke
Grant Number: P42ES004699
Funding Period: 1995-2010

Final Progress Reports

Year: 2004

During the last year, Rocke’s team made progress on several fronts. Much of their current effort applies statistical methods, including classification and clustering, to high-dimensional biological data such as those from DNA arrays, proteomics, and metabolomics. First, the team extended its previous work on the two-component model to DNA microarrays. The team reports that this gives the most precise methods to date for determining whether a gene is differentially expressed in an exposed versus a control population. The researchers developed a specialized transformation for gene expression data, the generalized logarithm (glog), that makes the data more compatible with standard statistical methodology such as regression and the analysis of variance. They showed that this transformation is applicable to cDNA arrays and to oligonucleotide arrays, including those from Affymetrix. The investigators extended these techniques to proteomics by LC/MS and to metabolomics by NMR spectroscopy, and developed new methods for classification with high-throughput assay data and for clustering biological samples based on these many measurements. These statistical methods should have considerable application in the analysis of biomarkers of exposure to toxic substances, because they make it easier to find complex patterns that serve as biomarkers of exposure.
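The variance-stabilizing effect of the glog can be illustrated with a small simulation. The sketch below is not taken from this report. It assumes the two-component model in the form y = alpha + mu * exp(eta) + eps, with eta and eps normally distributed, and the glog in the form ln(z + sqrt(z^2 + c)) with z = y - alpha. The function names, the parameter values, and the choice c = sigma_eps^2 / (exp(sigma_eta^2) - 1) are illustrative assumptions.

```python
import numpy as np

def glog(y, alpha, c):
    """Generalized logarithm (assumed form): ln(z + sqrt(z^2 + c)), z = y - alpha."""
    z = np.asarray(y, dtype=float) - alpha
    return np.log(z + np.sqrt(z * z + c))

def simulate_two_component(mu, alpha, sigma_eta, sigma_eps, n, rng):
    """Draw n measurements from y = alpha + mu * exp(eta) + eps (assumed model form)."""
    eta = rng.normal(0.0, sigma_eta, size=n)  # multiplicative error, dominates at high levels
    eps = rng.normal(0.0, sigma_eps, size=n)  # additive error, dominates near zero
    return alpha + mu * np.exp(eta) + eps

rng = np.random.default_rng(0)

# Hypothetical parameter values, chosen only for illustration.
alpha, sigma_eta, sigma_eps = 50.0, 0.2, 20.0
# Assumed glog constant from a delta-method argument; close to
# sigma_eps**2 / sigma_eta**2 when sigma_eta is small.
c = sigma_eps**2 / np.expm1(sigma_eta**2)

true_levels = [0.0, 100.0, 1000.0, 10000.0]
raw_sd, glog_sd = [], []
for mu in true_levels:
    y = simulate_two_component(mu, alpha, sigma_eta, sigma_eps, 20000, rng)
    raw_sd.append(float(y.std()))
    glog_sd.append(float(glog(y, alpha, c).std()))
    print(f"mu={mu:>8.0f}  SD(raw)={raw_sd[-1]:9.2f}  SD(glog)={glog_sd[-1]:.3f}")
```

Under these assumptions, the standard deviation of the raw measurements grows with the true level, while the standard deviation on the glog scale stays roughly constant. Roughly constant variance is what regression and the analysis of variance assume.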

Innovative Spin-offs

Work that this team originally did under the SBRP in the early 1990s on the statistical properties of analytical instrument measurements has led to advances in identifying biomarkers with newer methodologies, including gene expression microarrays and various proteomics and metabolomics methods.

Notable Advances

One of the great difficulties in interpreting gene expression array, proteomics, and metabolomics data is that data from genes with low expression levels appear to be poorly behaved compared with data from genes with high expression levels. Yet genes, proteins, or metabolites present at low concentrations in absolute terms may be very important in metabolism. The research team’s newly developed glog data transformation places all genes, proteins, and metabolites on the same scale, so that low-concentration biologically active compounds can be analyzed together with high-concentration ones. This has allowed much more sensitive and accurate discovery of biomarkers of exposure to toxic substances.
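To make the "same scale" point concrete, the short sketch below compares an ordinary logarithm with a glog on background-corrected intensities that range from slightly negative to very large. The glog form ln(z + sqrt(z^2 + c)) and the value of c are assumptions made for illustration, not values from this report.

```python
import numpy as np

def glog(z, c):
    """Generalized logarithm (assumed form) of background-corrected values z."""
    z = np.asarray(z, dtype=float)
    return np.log(z + np.sqrt(z * z + c))

c = 1.0e4  # hypothetical constant; in practice it would be estimated from the data
z = np.array([-50.0, 0.0, 50.0, 1.0e3, 1.0e5])  # illustrative intensities

# The ordinary log is undefined for negative values and infinite at zero.
with np.errstate(invalid="ignore", divide="ignore"):
    plain = np.log(z)

g = glog(z, c)

# At high intensity the glog differs from ln(2z) by a negligible amount,
# so it behaves like an ordinary log transformation there.
gap_at_high = float(g[-1] - np.log(2.0 * z[-1]))

for zi, pi, gi in zip(z, plain, g):
    print(f"z={zi:>10.1f}  log={pi:>8.3f}  glog={gi:.3f}")
```

In this sketch the glog stays finite and increasing across the whole range, so low and high values can enter the same analysis without discarding or truncating the low ones.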