Title: Sample size requirements to detect gene-environment interactions in genome-wide association studies.
Authors: Murcray, Cassandra E; Lewinger, Juan Pablo; Conti, David V; Thomas, Duncan C; Gauderman, W James
Published In: Genet Epidemiol (2011 Apr)
Abstract: Many complex diseases are likely to be a result of the interplay of genes and environmental exposures. The standard analysis in a genome-wide association study (GWAS) scans for main effects and ignores the potentially useful information in the available exposure data. Two recently proposed methods that exploit environmental exposure information involve a two-step analysis aimed at prioritizing the large number of SNPs tested to highlight those most likely to be involved in a gene-environment (G×E) interaction. For example, Murcray et al. (Am J Epidemiol 169:219–226) proposed screening on a test that models the G-E association induced by an interaction in the combined case-control sample. Alternatively, Kooperberg and LeBlanc (Genet Epidemiol 32:255–263) suggested screening on genetic marginal effects. In both methods, SNPs that pass the respective screening step at a pre-specified significance threshold are followed up with a formal test of interaction in the second step. We propose a hybrid method that combines these two screening approaches by allocating a proportion of the overall genome-wide significance level to each test. We show that the Murcray et al. approach is often the most efficient method, but that the hybrid approach is a powerful and robust method for nearly any underlying model. As an example, for a GWAS of 1 million markers including a single true disease SNP with minor allele frequency of 0.15, and a binary exposure with prevalence 0.3, the Murcray, Kooperberg and hybrid methods are 1.90, 1.27, and 1.87 times as efficient, respectively, as the traditional case-control analysis to detect an interaction effect size of 2.0.
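Illustrative sketch (not part of the published record): the hybrid two-step procedure described in the abstract might be outlined in Python as follows. The function name `hybrid_two_step`, the parameter names, the default thresholds, and the Bonferroni-style correction within each branch are all assumptions made for illustration; the abstract states only that a proportion of the overall significance level is allocated to each screening test. The sketch takes three precomputed p-values per SNP: one from the G-E association screen, one from the marginal-effect screen, and one from the formal interaction test.

```python
from typing import List, Sequence


def hybrid_two_step(
    p_ge: Sequence[float],      # step-1 p-values, G-E association screen
    p_marg: Sequence[float],    # step-1 p-values, marginal genetic effect screen
    p_int: Sequence[float],     # step-2 p-values, formal interaction test
    alpha: float = 0.05,        # overall significance level
    rho: float = 0.5,           # assumed share of alpha given to the G-E branch
    alpha_ge: float = 1e-4,     # assumed screening threshold, G-E branch
    alpha_marg: float = 1e-4,   # assumed screening threshold, marginal branch
) -> List[int]:
    """Return sorted indices of SNPs declared significant for interaction."""
    if not (len(p_ge) == len(p_marg) == len(p_int)):
        raise ValueError("all p-value sequences must have the same length")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")

    # Step 1: screen SNPs separately on each test.
    pass_ge = [i for i, p in enumerate(p_ge) if p < alpha_ge]
    pass_marg = [i for i, p in enumerate(p_marg) if p < alpha_marg]

    hits = set()

    # Step 2, G-E branch: spend rho * alpha, corrected only for the
    # number of SNPs that passed this screen.
    if pass_ge:
        threshold = rho * alpha / len(pass_ge)
        hits.update(i for i in pass_ge if p_int[i] < threshold)

    # Step 2, marginal branch: spend the remaining (1 - rho) * alpha.
    if pass_marg:
        threshold = (1.0 - rho) * alpha / len(pass_marg)
        hits.update(i for i in pass_marg if p_int[i] < threshold)

    return sorted(hits)
```

In this sketch, setting `rho=1.0` reduces to screening on the G-E association alone and `rho=0.0` to screening on marginal effects alone. A SNP that passes neither screen is never tested in step 2, however small its interaction p-value.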
PubMed ID: 21308767
MeSH Terms: Case-Control Studies; Disease/genetics; Environment; Genome-Wide Association Study/statistics & numerical data*; Humans; Logistic Models; Models, Genetic; Molecular Epidemiology/statistics & numerical data; Polymorphism, Single Nucleotide; Sample Size; Software