Data Sharing Resources

Superfund Research Program

This page provides information and resources on data sharing, data repositories, citing data, data integration, and data science training.

Data Sharing

  • FAIR Data Principles
    The FAIR Data Principles are a set of guiding principles published by FORCE11. They provide a framework for sharing data in a way that maximizes use and reuse by making data findable, accessible, interoperable, and reusable. Wilkinson et al. (2016) introduce the FAIR principles and their rationale.
  • FAIR Self-Assessment Tool
    Created by the Australian Research Data Commons, this online tool allows one to assess the "FAIRness" of a dataset and provides suggestions for improvement and links to more information.
  • FAIRShake
    A software toolkit developed for the NIH Data Commons Pilot Phase Consortium (DCPPC) to enable the assessment of compliance of biomedical digital research objects with the FAIR guiding principles. FAIRShake was developed to facilitate the establishment of community-driven FAIR metrics and rubrics paired with manual and automated FAIR assessments.
  • FAIRsharing
    FAIRsharing provides a curated, informative, and educational resource on data and metadata standards and data policies, including those linked to specific funders and journals. Use it to identify and cite the standards, databases, or repositories that exist for your data and discipline.
  • How To Make Your Data FAIR
    Research guide produced by OpenAIRE, a European organization that seeks to shift scholarly communications toward openness and transparency. The guide includes a checklist to determine the "FAIRness" of one’s data.
  • National Library of Medicine Strategic Plan (2017 – 2027)
    The NLM strategic plan sets a course for data-driven discovery and health.
  • NIH Strategic Plan for Data Science
    NIH released its first Strategic Plan for Data Science in June 2018. It provides a roadmap for modernizing the NIH-funded biomedical data science ecosystem.
  • SPARC Data Sharing Requirements by Federal Agency
    SPARC is a global coalition committed to making "open" the default status for research and education. Its website contains a resource for tracking, comparing, and understanding both current and future U.S. federal funder requirements for sharing research data.

Data Repositories

  • Data Observation Network for Earth (DataONE)
    DataONE is a community-driven project providing access to data across multiple member repositories, supporting enhanced search and discovery of Earth and environmental data. DataONE provides data usage and citation metrics for datasets.
  • DataCite
    Search the DataCite registry for datasets, software, images, and other research material.
  • DataMed
    DataMed is a prototype biomedical data search engine being developed for the NIH BD2K Data Discovery Index (DDI) by the bioCaddie project team. It allows users to search and find data across different repositories.
  • General Code Repositories
    Use general code repositories such as Bitbucket, GitHub, and SourceForge to share code. CRAN (the Comprehensive R Archive Network) is the package archive for R. Code Ocean is a research collaboration platform that lets users share computational environments and code in a Web browser. The Jupyter Notebook is an open-source web application for creating and sharing documents that contain live code, equations, visualizations, and text.
  • NIH Data Sharing Repositories
    This table lists NIH-supported data repositories that make data accessible for reuse.
  • Omics Discovery Index (OmicsDI)
    OmicsDI provides dataset discovery across a heterogeneous, distributed group of transcriptomics, genomics, proteomics, and metabolomics data resources. These span 16 repositories on three continents and six organizations, and include both open- and controlled-access data resources.
  • Registry of Research Data Repositories (re3data.org)
    re3data.org is a global registry of research data repositories. It provides detailed information on more than 2,000 repositories to help researchers find the right one for their data. re3data.org is a service of DataCite, a global non-profit organization that provides DOIs for research data.
  • SciCrunch
    SciCrunch is a data sharing and display platform designed to help communities create their own portals to provide access to research resources, data, literature, and tools. Hosted at the University of California, San Diego, SciCrunch is home to the Drug Design Data Resource, the NIDDK Information Network (National Institute of Diabetes and Digestive and Kidney Diseases), the Neuroscience Information Network, and the Research Resource Identifiers (RRID) Portal, which provides shared identifiers for citing research resources such as cell lines or antibodies in the literature. Users can search within and across these community portals.
  • United States Geological Survey (USGS) Data Repository Webpage
    This webpage provides useful information about preserving environmental data in data repositories.

Data Citations

  • Australian National Data Service (ANDS) Guide to Data Citation
    The ANDS Guide to Data Citation provides an overview of how to cite data, data citation styles and formats, the use of persistent identifiers (e.g., DOIs), and the tracking of data citations.
  • Ball A, Duke M. 2015. How to Cite Datasets and Link to Publications. Edinburgh: Digital Curation Centre.
  • DataCite – Cite Your Data
    DataCite is a leading global non-profit organization that provides persistent identifiers (DOIs) for research data. Their goal is to help the research community locate, identify, and cite research data with confidence. This page contains best practices for citing data. Properly citing data gives scholarly credit to data producers and facilitates discovery and reuse of the dataset.
  • Joint Declaration of Data Citation Principles – Final
    First released in November 2013 and finalized in February 2014, the declaration describes eight principles that emphasize the importance of data as evidence, the need to give credit to data contributors, the idea that cited data requires unique and persistent identifiers, and the belief that data citation should allow for human and machine access to the data and support verification and interoperability.
  • Making Your Code Citable
    This GitHub tutorial shows researchers how to make work shared on GitHub citable by archiving the repository and assigning a DOI with the data archiving tool Zenodo.
    In addition, GitHub supports software citation based on the Citation File Format so researchers can easily be acknowledged for their contributions to software. When a repository includes a CITATION.cff file, GitHub adds a software citation widget to the repository sidebar.
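
As a rough illustration of the CITATION.cff approach mentioned in the last item, the Python sketch below assembles the text of a minimal file. The field names (cff-version, message, title, authors, version, doi, date-released) follow the Citation File Format 1.2.0 schema. The function name build_citation_cff and every value shown (title, author, version, DOI, date) are hypothetical placeholders chosen for illustration, not details of any real repository.

```python
import json
from datetime import date


def yaml_quote(value):
    """Return value as a double-quoted scalar (JSON strings are valid YAML)."""
    return json.dumps(str(value))


def build_citation_cff(title, authors, version, doi, date_released):
    """Build the text of a minimal CITATION.cff file.

    authors is a list of (family_name, given_name) tuples.
    date_released must be an ISO date string (YYYY-MM-DD).
    """
    # Fail early on a malformed date instead of writing an invalid file.
    date.fromisoformat(date_released)

    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f"title: {yaml_quote(title)}",
        "authors:",
    ]
    for family, given in authors:
        lines.append(f"  - family-names: {yaml_quote(family)}")
        lines.append(f"    given-names: {yaml_quote(given)}")
    lines.append(f"version: {yaml_quote(version)}")
    lines.append(f"doi: {yaml_quote(doi)}")
    lines.append(f"date-released: {yaml_quote(date_released)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values only; replace with the details of your own project.
    print(build_citation_cff(
        title="Example Analysis Toolkit",
        authors=[("Doe", "Jane")],
        version="1.0.0",
        doi="10.5281/zenodo.0000000",
        date_released="2024-01-15",
    ))
```

Saving the returned text as CITATION.cff in the root of the repository is what places the file where GitHub looks for it. Quoting every value is a conservative choice that avoids YAML interpreting a version such as 1.0 as a number.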

Research Data Management

Relevant Publications

Metadata Standards/Ontologies/Data Integration

  • ANDS Guide to Metadata
    This comprehensive guide provides a working-level view of the needs, issues, and processes around metadata collection and creation for research data.
  • Disciplinary Metadata
    The Digital Curation Centre allows one to search for metadata standards, extensions, tools, and use cases by discipline (biology, earth science, general research data, physical science, and social science and humanities).
  • FAIRsharing Standards Database
    The standards in FAIRsharing are manually curated from a variety of sources, including BioPortal, MIBBI, and the Equator Network.
  • NCBO BioPortal
    BioPortal is a comprehensive repository of biomedical ontologies developed by the National Center for Biomedical Ontology, an international consortium providing ontological resources for the biomedical research community. BioPortal allows the user to browse or search ontologies, get ontology recommendations, explore mappings between ontology terms, and annotate textual biomedical data with ontology terms. The NIEHS Children’s Health Exposure Analysis Resource Ontology can be found in BioPortal.
  • NIH Common Data Elements (CDE) Repository
    The NIH CDE Repository provides access to structured human- and machine-readable definitions of data elements that have been recommended or required by NIH Institutes and Centers and other organizations for use in research and other purposes.
  • Open Biological and Biomedical Ontology (OBO) Foundry
    The OBO Foundry is a collective of ontology developers with the mission to develop a family of interoperable science-based ontologies for shared use across different biological and medical domains. They have published a set of normative principles for OBO Foundry ontologies.
  • PhenX Toolkit (consensus measures for Phenotypes and eXposures)
    The PhenX Toolkit is a Web-based catalog of recommended, standard measures for phenotypes and exposures for use in biomedical research. Using protocols from the PhenX Toolkit allows investigators who are studying different diseases and conditions to collect data using the same methodologies, thus facilitating cross-study analysis.

Data Science Training

  • Big Data to Knowledge (BD2K)
    The BD2K Centers produced training and educational resources, including workshops, courses, webinars, lecture series, summer internships, and training programs.
  • ERUDITE (Educational Resource Discovery Index)
    Use ERUDITE to find educational resources (e.g., free modules, MOOCs, curricula, and webinars) and in-person or minimal-cost training opportunities (e.g., short courses) for data science.
  • Edison Data Science Framework
    The Edison Data Science Framework is a collection of documents that define the data science profession and competencies.
  • FOSTER
    The FOSTER portal is an e-learning platform providing training resources for those who need to know more about Open Science, or need to develop strategies and skills for implementing Open Science practices in their daily workflows. Available courses on Open Science include managing and sharing research data, best practices in open research, Open Science software and workflows, data protection and ethics, and open licensing.
  • The Carpentries
    The Carpentries teach foundational coding, data science, and computational skills to researchers worldwide. The Carpentries develop and teach in-person, interactive, two-day workshops using open-source lessons available on GitHub.

