
REPRODUCIBILITY AND ROBUSTNESS OF DIMENSIONALITY REDUCTION

Principal Investigator: Herring, Amy H
Institute Receiving Award: Duke University
Location: Durham, NC
Grant Number: R01ES027498
Funding Organization: National Institute of Environmental Health Sciences
Award Funding Period: 15 Sep 2017 to 31 Jul 2022
DESCRIPTION (provided by applicant):
PROJECT SUMMARY
In modern biomedical studies it has become commonplace to collect high-dimensional data, and hence dimensionality reduction tools are of critical importance and are routinely used. Some of the most common include clustering and factor analysis. The basic tenet behind dimensionality reduction is that we can replace a high-dimensional set of variables by some low-dimensional summary. This is certainly necessary to make sense of complex data and also overcome problems with high-dimensional, low sample size data. However, a critical issue that has not been adequately studied is reproducibility. Standard approaches for dimension reduction can be very sensitive to choice of tuning parameters and arbitrary choices (e.g., choice of kernel or distance measure). This leads to a lack of robustness, with potentially very different results being produced when data are slightly perturbed. This lack of robustness tends to be compounded as the size of the data increases - both in terms of the sample size and number of variables collected. Also, a critical issue is lack of generalizability. In particular, dimensionality reduction for a particular group of individuals may fundamentally lack generalizability to other groups of individuals. This creates major problems in interpretation of results. Motivated in particular by environmental epidemiology studies collecting exposome data and by nutritional epidemiology, this project proposes to develop fundamentally new methods for improving robustness and reproducibility of dimensionality reduction through the following specific aims.
(1) Develop robust methods of factor analysis designed to limit sensitivity to arbitrary assumptions and size of the data.
(2) Develop robust methods of model-based clustering designed to limit sensitivity to arbitrary assumptions and size of the data.
(3) Develop novel methods for robust clustering from multivariate and grouped data designed to avoid typical pitfalls of mixture models with increasing p.
(4) Develop robust consensus methods that estimate low-dimensional summaries that best reflect structure across subpopulations.
(5) Apply the proposed methods to data from key epidemiologic cohorts that have measured a wide variety of environmental, behavioral, and biological exposures and provide a general use software package for implementation. This package is designed to be easily used and accommodate a broad variety of data types, further aiding reproducibility and transparency.
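The sensitivity to arbitrary choices described in the summary can be illustrated with a small sketch. This is not the project's method: it uses plain one-dimensional k-means as a stand-in for a "standard approach", and the function names (`rand_index`, `kmeans_1d`), the toy values, and the starting centers are all assumptions made up for illustration. It runs the same algorithm on the same data from two different starting points and scores how well the two resulting partitions agree using the Rand index (the fraction of item pairs that both partitions treat the same way).

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two partitions agree.

    A pair "agrees" if both partitions put the two items together,
    or both put them apart. 1.0 means identical partitions.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must label the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def kmeans_1d(values, start_centers, n_iter=50):
    """Lloyd's k-means in one dimension from the given starting centers."""
    centers = [float(c) for c in start_centers]
    for _ in range(n_iter):
        # Assign each value to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            for v in values
        ]
        # Move each center to the mean of its members (keep it if empty).
        new_centers = []
        for k in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_centers.append(sum(members) / len(members) if members else centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


# Hypothetical toy data: two visible groups, but we ask for three clusters.
values = [0, 1, 2, 10, 11, 12]

# Same data, same algorithm, two arbitrary starting points.
labels_a = kmeans_1d(values, start_centers=[0, 1, 11])
labels_b = kmeans_1d(values, start_centers=[1, 10, 11.5])

agreement = rand_index(labels_a, labels_b)
print(labels_a, labels_b, round(agreement, 3))
```

In this toy case the two runs split different groups to make the third cluster, so the partitions disagree on some pairs even though nothing about the data changed. A reproducibility check of this general kind (repeat the analysis under perturbed settings, then compare the outputs) is one simple way to quantify the lack of robustness the summary describes.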
Science Code(s)/Area of Science(s): Primary: 81 - Statistics/Statistical Methods/Development
Publications: See publications associated with this Grant.
Program Officer: Bonnie Joubert