New environmental exposure technologies for chemical biomonitoring and person-specific sampling, such as in homes or breathing space, are crucial advances for finding environmental causes of disease and guiding prevention. These methods yield valuable exposure data for major NIH cohort studies and for surveillance, notably NHANES biomonitoring for more than 200 chemicals in the US population and the new EPA ExpoCast. Personal exposure measurements are expensive and sometimes non-repeatable, motivating open data-sharing, including in online repositories. At the same time, personal measurements raise new ethical concerns about the possibility that the identity of study participants might be revealed even in data considered anonymized, a process called re-identification. Proliferating public data and computing power will continue to increase privacy risks. Prompted by visible instances of re-identification, healthcare and genetics researchers, among others, are debating and investigating new practices for redacting shared data, warning participants of privacy risks, and sometimes requesting "open consent" to share data without protecting privacy. However, computational privacy risks have not yet been investigated for environmental chemical exposure data. These data may pose risks through novel linkage strategies that draw on data about real estate, environmental compliance, permits, weather, and consumer purchases, among other sources. Re-identification could result in stigma for "contaminated" individuals or communities, reveal behavior a person considers private (e.g., smoking or use of overseas products banned in the US), trigger legal obligations if a regulated chemical is measured, or affect property values, insurance, or employability. This project will empirically evaluate privacy
risks and develop solutions for environmental health studies. It will engage an Advisory Council of environmental health scientists, computer scientists, policymakers, community leaders, and bioethicists to provide input and seek consensus on complex ethical and values-based considerations. Building on the investigators' established computational model for health data, this study will develop a model for predicting re-identification risk in environmental health studies. By applying the model to 10 important environmental studies, the project will quantify privacy risks in this field and identify specific data fields that contribute to risk. The model will be validated by testing the actual number of re-identifications in a household exposure study. Based on results indicating risky data fields, the study will test and seek to optimize procedures to redact or mask data to improve privacy while retaining scientific utility for data-sharing. Because data-sharing decisions ultimately rest on participants' informed consent, the project complements computational analyses by asking participants in two large, innovative online studies about their understandings and values related to privacy and data-sharing. Results from this project will provide researchers with ethically and technically sound methods for sharing environmental data, contributing to more rapid discovery of preventable causes of disease.