
Final Progress Reports: University of California-Berkeley: Statistics and Computing Core

Superfund Research Program

Statistics and Computing Core

Project Leader: Robert C. Spear
Grant Number: P42ES004705
Funding Period: 1995 - 2006

Project-Specific Links

Connect with the Grant Recipients


Final Progress Reports

Year: 2005

The purpose of the Core is to provide project investigators with consultative support in biostatistics and to support computer-based communication and database needs. Justin Girard maintains the EHS division server, which acts as a file, print, and backup server for data operations relating to Superfund research and Administrative Core activities, and he also manages the local Superfund web server. His duties include maintaining the server hardware and software, supervising upgrades of relevant software, and ensuring worldwide access to the HTML content. He is also responsible for security on these servers and for maintaining failsafe backups of the data. Dr. Hubbard continues to provide statistical guidance in both the design of experiments and the analysis of data.

The statistical support has included collaboration on projects using recently developed data-mining and multiple-testing procedures, which are well suited to the analysis of large, complex data sets (e.g., genomic and proteomic data). Specifically, the Core has continued its study of the effects of benzene exposure on peripheral blood leukocyte gene expression in a population of shoe-factory workers with well-characterized low-level occupational exposures to benzene. Lymphocyte RNA was analyzed using standardized human arrays: the U133A/B Affymetrix GeneChip. Core researchers have developed a new empirical Bayesian technique that provides more power to detect genes that might be affected by benzene exposure (van der Laan, Birkner and Hubbard, 2005). This technique is particularly well suited to targeting those genes, among thousands, with the most evidence of an association with benzene exposure.
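The van der Laan, Birkner and Hubbard (2005) method itself is not reproduced here, but the general workflow it improves on can be illustrated with a standard multiple-testing analysis: test each gene for a difference between exposed and unexposed groups, then control the false discovery rate across thousands of genes. The sketch below uses simulated expression data, a gene-by-gene t-test, and the Benjamini-Hochberg procedure; all names and parameter values are illustrative assumptions, not the Core's actual pipeline.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_genes, n_exposed, n_control = 1000, 20, 20

# Simulated log-expression: 50 of 1000 genes truly shifted by exposure
# (the 1.5-SD effect size is an arbitrary illustrative choice).
exposed = rng.normal(0.0, 1.0, size=(n_genes, n_exposed))
control = rng.normal(0.0, 1.0, size=(n_genes, n_control))
exposed[:50] += 1.5

# One two-sample t-test per gene, vectorized over the gene axis.
t_stat, p_vals = stats.ttest_ind(exposed, control, axis=1)

def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean mask of rejections under Benjamini-Hochberg FDR control."""
    m = len(pvals)
    order = np.argsort(pvals)
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = pvals[order] <= thresholds
    k = passed.nonzero()[0].max() + 1 if passed.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

hits = benjamini_hochberg(p_vals)
print(f"{hits.sum()} genes flagged at FDR 0.05")
```

The point of FDR control (and of the more powerful empirical Bayesian refinements) is that among thousands of genes, a per-gene cutoff of 0.05 would produce dozens of false leads, whereas the adjusted procedure concentrates the flagged set on genes with genuine evidence of an exposure effect.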

Recently, Core researchers have begun analyzing mass spectrometry proteomic data comparing subjects with the ALL and AML forms of leukemia, looking for a signature of the disease. The first challenge is to process the intensity-versus-mass signal to remove noise and drift; here the researchers use the replicate spectra made on each subject to optimize the processing. The data have characteristics that make replicate samples particularly helpful: 1) error in the estimate of the mass, 2) variability in the corresponding intensity measurements, and 3) drift in the background intensity. The researchers have developed new techniques that take advantage of these replicates to target the proteins with the most consistent differences between the two types of leukemia. The result is a set of proteins for which there is reasonable confidence of systematic differences in abundance between the two forms of leukemia. These serve as just two of many examples of how Core C has helped to develop new computational methodology (as well as monitoring methodologies being developed elsewhere) well suited to analyzing the high-dimensional ('omic) biological data generated by the researchers with whom the Core collaborates.
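The preprocessing steps named above (removing background drift and exploiting replicates to suppress noise) can be sketched on simulated spectra. The approach below, a rolling-minimum baseline subtraction followed by averaging replicate spectra, is a common generic strategy and an assumption on our part, not the Core's published technique; the peak locations, drift model, and window size are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
mz = np.linspace(1000, 10000, 2000)  # mass-to-charge axis (illustrative range)

# "True" protein signal: two Gaussian peaks at m/z 4000 and 7000.
true_peaks = np.exp(-((mz - 4000) / 30) ** 2) + 0.6 * np.exp(-((mz - 7000) / 40) ** 2)

def spectrum():
    """One replicate: true peaks plus a slowly rising baseline drift and noise."""
    drift = 0.5 + 1e-4 * (mz - mz[0]) * rng.uniform(0.5, 1.5)
    noise = rng.normal(0.0, 0.05, mz.size)
    return true_peaks + drift + noise

replicates = np.stack([spectrum() for _ in range(4)])

def remove_baseline(y, window=101):
    """Crude drift removal: subtract a rolling minimum of the signal."""
    half = window // 2
    padded = np.pad(y, half, mode="edge")
    rolling_min = np.array([padded[i:i + window].min() for i in range(y.size)])
    return y - rolling_min

# Clean each replicate, then average: replicate averaging shrinks the
# independent noise by roughly 1/sqrt(n) while the peaks reinforce.
cleaned = np.stack([remove_baseline(r) for r in replicates])
consensus = cleaned.mean(axis=0)

peak_mz = mz[consensus.argmax()]
print(f"strongest peak near m/z {peak_mz:.0f}")
```

In the Core's setting the consensus spectra for ALL and AML subjects would then be compared peak by peak, with replicate-to-replicate consistency used to screen out spurious differences.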
