
DATA SCIENCE TOOLS TO IDENTIFY ROBUST EXPOSURE-PHENOTYPE ASSOCIATIONS FOR PRECISION MEDICINE

Principal Investigator: Patel, Chirag J.
Institute Receiving Award Harvard Medical School
Location Boston, MA
Grant Number R01ES032470
Funding Organization National Institute of Environmental Health Sciences
Award Funding Period 10 Sep 2021 to 30 Jun 2026
DESCRIPTION (provided by applicant): Project Summary/Abstract
Phenotypic variability across demographically diverse populations is driven by environmental factors. The overall goal of this proposal is to deploy data science approaches to drive discovery of associations between exposures (E) and phenotypes (P) in demographically diverse populations. We lack data science methods to associate, replicate, and prioritize exposure variables of the exposome (E) with phenotypes (P) and disease incidence (D), which are required for the delivery of precision medicine. Observational studies are fraught with four unsolved data science challenges. First, (1) E-based studies are limited to associating a few hypothesized exposure-phenotype (E-P) pairs at a time, leading to a fragmented literature of environmental associations. Second, although machine learning (ML) approaches for feature selection and prediction hold promise, (2) most extant E-based cohorts contain missing data, which challenges the use of ML to detect complex E-P associations. Third, (3) biases, such as confounding and study design, influence associations and hinder translation. Fourth, (4) there are few well-powered data resources that systematically document longitudinal E-P and E-D associations across massive precision medicine cohorts.
It is a challenge to systematically associate many exposures with multiple phenotypes and to replicate these associations across cohorts (Aim 1). The "vibration of effects", or the degree to which associations change as a function of study design (e.g., analytic method, sample size) and model choice, is a hidden bias in observational studies (Aim 2). Finally, an outstanding question is the degree to which environmental differences lead to health disparities. To address these challenges and gaps, we propose the following aims.
Aim 1: Develop and test machine learning methods to associate multiple environmental exposure indicators with multiple phenotypes (EP-WAS). We hypothesize that exposures will explain a significant amount of phenotypic variation in populations, and we will deposit all data and models in a novel EP-WAS Catalog.
Aim 2: Quantify how study design influences associations between exposure biomarkers and phenotypes. We will scale up, extend, and test a method called "vibration of effects" (VoE) to measure how study criteria influence the stability of associations (that is, how reproducible associations are as a function of analytic choice).
Aim 3: Leverage EP-WAS and VoE to disentangle biological, demographic, and environmental influences on phenotypic disparities in hypercholesterolemia. We will deploy EP-WAS and VoE as packaged libraries in the largest cohort study to partition phenotypic variation in hypercholesterolemia factors across demographic groups.
We will equip the biomedical community with data science approaches for robust, data-driven discovery and interpretation of exposure-phenotype factors in observational datasets. These approaches are required to identify environmental health disparities. For the first time, investigators will be able to ascertain the collective role of the environment in heart disease at scale, just in time for the All of Us program.
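The two core ideas in the aims can be illustrated with a simple model. The first is an exposure-phenotype-wide scan (EP-WAS). The second is vibration of effects (VoE), which measures how much one association moves as the model specification changes. The following is a minimal pure-Python sketch, not the project's EP-WAS or VoE software, and it rests on several assumptions:
- Each EP-WAS pair is tested with ordinary least squares adjusted for a fixed covariate set.
- Multiplicity across pairs is handled with Benjamini-Hochberg false discovery rate control.
- VoE is summarized as the spread of the exposure coefficient and of -log10(p) across every subset of candidate adjusters.
- P-values use a normal approximation to the t test.

All function names, variable names (age, sex, bmi, ldl, hdl, exposure_a, exposure_b), and the simulated data are hypothetical.

```python
import itertools
import math
import random


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [rv - f * cv for rv, cv in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols_first_slope(y, cols):
    """Fit y ~ 1 + cols by least squares; return (beta, two-sided p) for cols[0]."""
    X = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(bj * xj for bj, xj in zip(beta, r)) for r, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (len(y) - k)
    z = beta[1] / math.sqrt(sigma2 * inv[1][1])
    # Normal approximation to the t test (assumption; adequate for large n).
    return beta[1], math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted q-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        q[i] = running
    return q


def ep_was(exposures, phenotypes, adjusters):
    """Test every exposure-phenotype pair with the same adjusters, then control the FDR."""
    rows = []
    for e_name, e in exposures.items():
        for p_name, y in phenotypes.items():
            beta, p = ols_first_slope(y, [e] + list(adjusters.values()))
            rows.append((e_name, p_name, beta, p))
    qvals = benjamini_hochberg([r[3] for r in rows])
    return [r + (q,) for r, q in zip(rows, qvals)]


def vibration_of_effects(y, exposure, adjusters):
    """Refit y ~ exposure under every subset of candidate adjusters and summarize the spread."""
    names = sorted(adjusters)
    results = []
    for k in range(len(names) + 1):
        for subset in itertools.combinations(names, k):
            cols = [exposure] + [adjusters[name] for name in subset]
            results.append(ols_first_slope(y, cols))
    betas = [b for b, _ in results]
    logp = [-math.log10(max(p, 1e-300)) for _, p in results]
    return {
        "n_models": len(results),
        "beta_range": (min(betas), max(betas)),
        "sign_flip": min(betas) < 0 < max(betas),
        "logp_range": max(logp) - min(logp),
    }


# Simulated (hypothetical) cohort: age confounds the exposure_a -> ldl association.
random.seed(1)
n = 500
age = [random.gauss(0, 1) for _ in range(n)]
sex = [float(random.random() < 0.5) for _ in range(n)]
bmi = [random.gauss(0, 1) for _ in range(n)]
exposure_a = [0.8 * a + random.gauss(0, 1) for a in age]
exposure_b = [random.gauss(0, 1) for _ in range(n)]
ldl = [0.3 * e + a + 0.3 * s + random.gauss(0, 1) for e, a, s in zip(exposure_a, age, sex)]
hdl = [-0.5 * a + random.gauss(0, 1) for a in age]

table = ep_was({"exposure_a": exposure_a, "exposure_b": exposure_b},
               {"ldl": ldl, "hdl": hdl},
               {"age": age, "sex": sex})
for row in table:
    print(row)
voe = vibration_of_effects(ldl, exposure_a, {"age": age, "sex": sex, "bmi": bmi})
print(voe)
```

In this simulation, age drives both exposure_a and ldl. As a result, the exposure coefficient is noticeably larger in models that omit age than in models that include it. This is the kind of model dependence VoE is meant to surface, although here the sign does not flip across the eight models. With many more adjusters, a percentile-based summary of the coefficient and p-value distributions would be a natural replacement for the simple min/max used here.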
Science Code(s)/Area of Science(s) Primary: 75 - Computational Biology/Computational Methods for Exposure Assessment
Secondary: 03 - Carcinogenesis/Cell Transformation
Program Officer Christopher Duncan