Title: Modeling exposures for DNA methylation profiles.
Authors: Siegmund, Kimberly D; Levine, A Joan; Chang, Jing; Laird, Peter W
Published In: Cancer Epidemiol Biomarkers Prev (2006 Mar)
Abstract: We extend the finite mixture model to estimate the association between exposure and latent disease subtype measured by DNA methylation profiles. Estimates from this model are compared with those obtained from the simpler two-phase approach of first clustering the DNA methylation data followed by associating exposure with disease subtype using logistic regression. The two models are fit to data from a study of colorectal adenomas and are compared in a simulation study. Depending on the analytic approach, we obtain different estimates of the odds ratio (OR) and its 95% confidence interval (95% CI) for the association of RBC folate and DNA methylation subtype in colorectal adenomas (OR, 0.31; 95% CI, 0.08-1.26 from the extended finite mixture model; OR, 0.44; 95% CI, 0.15-1.28 from the two-phase approach; n = 58 case subjects). Although our results could be a chance occurrence due to fluctuations from small sample size, we did a simulation study using larger samples and found that differences between the two approaches emerge when there is noise in the cluster analysis. In the naive two-phase approach, the estimate of the OR is biased towards the null, and its SE is underestimated when there is error in the cluster assignment. Estimates from the extended mixture model are unbiased and have the correct SE estimate but may require larger sample sizes for convergence. Thus, when the clusters are not identified with certainty, the extended mixture model is preferred for valid estimation of the OR and CI.
PubMed ID: 16537717
MeSH Terms: Adenoma/epidemiology; Adenoma/genetics*; Adenoma/pathology; Biomarkers, Tumor/analysis*; Cluster Analysis; Colorectal Neoplasms/epidemiology; Colorectal Neoplasms/genetics*; Colorectal Neoplasms/pathology; DNA Methylation*; DNA, Neoplasm/analysis; DNA, Neoplasm/genetics; Female; Humans; Male; Models, Theoretical; Reproducibility of Results; Sensitivity and Specificity
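
Illustrative note: the abstract reports that the two-phase estimate of the OR is biased towards the null when cluster assignment contains error. The minimal sketch below shows that attenuation under simplifying assumptions that are not taken from the paper: a binary exposure, two subtypes, and a cluster label that is flipped with the same probability regardless of exposure. The function names, the baseline subtype probability of 0.4, and the error rates are hypothetical; the OR of 0.31 is borrowed from the abstract's point estimate only as a convenient input. The sketch does not implement the extended finite mixture model and does not address the underestimated SE.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def prob_from_odds(o):
    """Convert odds back to a probability."""
    return o / (1.0 + o)


def observed_or(p0, true_or, err):
    """Exposure-cluster OR when cluster labels contain error.

    p0      : P(subtype = 1 | unexposed)   (hypothetical input)
    true_or : OR linking exposure to the true latent subtype
    err     : probability that a subject's assigned cluster differs from
              the true subtype, assumed independent of exposure
    """
    # P(subtype = 1 | exposed) implied by the true OR
    p1 = prob_from_odds(odds(p0) * true_or)
    # Probability of being *assigned* to cluster 1 in each exposure group
    q0 = p0 * (1.0 - err) + (1.0 - p0) * err
    q1 = p1 * (1.0 - err) + (1.0 - p1) * err
    return odds(q1) / odds(q0)


if __name__ == "__main__":
    for err in (0.0, 0.05, 0.10, 0.20):
        value = observed_or(p0=0.4, true_or=0.31, err=err)
        print(f"assignment error {err:.2f}: expected two-phase OR = {value:.2f}")
```

With no assignment error the function returns the true OR; as the error rate grows, the expected OR computed from the assigned clusters moves towards 1, which is the direction of bias the abstract describes for the naive two-phase approach.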