Data Sharing Resources

Superfund Research Program

This page provides information and resources on data sharing, data repositories, citing data, data integration, and data science training.

Data Sharing

  • FAIR Data Principles
    The FAIR Data Principles are a set of guiding principles published by FORCE11. They provide a framework for sharing data in a way that maximizes use and reuse by making data findable, accessible, interoperable, and reusable. Wilkinson et al. (2016) introduce the FAIR principles and their rationale.
  • FAIR Self-Assessment Tool
    Created by the Australian Research Data Commons, this online tool allows one to assess the "FAIRness" of a dataset and provides suggestions for improvement and links to more information.
  • FAIRShake
    A software toolkit developed for the NIH Data Commons Pilot Phase Consortium (DCPPC) to enable the assessment of compliance of biomedical digital research objects with the FAIR guiding principles. FAIRShake was developed to facilitate the establishment of community-driven FAIR metrics and rubrics paired with manual and automated FAIR assessments.
  • FAIRsharing
    FAIRsharing.org provides a curated, informative, and educational resource on data and metadata standards and data policies, including those linked to specific funders and journals. Use it to identify and cite the standards, databases, or repositories that exist for your data and discipline.
  • How To Make Your Data FAIR
    Research guide produced by OpenAIRE, a European organization that seeks to shift scholarly communications toward openness and transparency. The guide includes a checklist to determine the "FAIRness" of one’s data.
  • National Library of Medicine Strategic Plan (2017–2027)
    The NLM strategic plan sets a course for data-driven discovery and health.
  • NIH Strategic Plan for Data Science
    NIH released its first Strategic Plan for Data Science in June 2018. It provides a roadmap for modernizing the NIH-funded biomedical data science ecosystem.
  • SPARC Data Sharing Requirements by Federal Agency
    SPARC is a global coalition committed to making "open" the default status for research and education. Its website contains a resource for tracking, comparing, and understanding both current and future U.S. federal funder requirements for sharing research data.

Relevant Publications

2020

  • Boronow KE, Perovich LJ, Sweeney L, Yoo JS, Rudel RA, Brown P, Brody JG. 2020. Privacy risks of sharing data from environmental health studies. Environ Health Perspect 128(1):17008. [Abstract]
  • Boulware LE, Harris GB, Harewood P, Johnson FF, Maxson P, Bhavsar N, Blackwelder SS, Poley SS, Arnold K, Akindele B, Ferranti J, Lyn M. 2020. Democratizing health system data to impact social and environmental health contexts: a novel collaborative community data-sharing model. J Public Health (Oxf); doi:10.1093/pubmed/fdz171 [Online 9 January 2020]. [Abstract]
  • Heacock ML, Amolegbe SM, Skalla LA, Trottier BA, Carlin DJ, Henry HF, Lopez AR, Duncan CG, Lawler CP, Balshaw DM, Suk WA. 2020. Sharing SRP data to reduce environmentally associated disease and promote transdisciplinary research. Rev Environ Health; doi:10.1515/reveh-2019-0089 [Online 3 Mar 2020]. [Full Text]
  • Helzlsouer K, Meerzaman D, Taplin S, Dunn BK. 2020. Humanizing big data: recognizing the human aspect of big data. Front Oncol 10:186. [Full Text]
  • Hodgson S, Fecht D, Gulliver J, Iyathooray Daby H, Piel FB, Yip F, Strosnider H, Hansell A, Elliott P. 2020. Availability, access, analysis and dissemination of small-area data. Int J Epidemiol 49(Supplement_1):i4–i14. [Full Text]
  • Kaewkungwal J, Adams P, Sattabongkot J, Lie RK, Wendler D. 2020. Issues and challenges associated with data-sharing in low- and middle-income countries: perspectives of researchers in Thailand. Am J Trop Med Hyg; doi:10.4269/ajtmh.19-0651 [Online 11 May 2020]. [Full Text]
  • Lacey JV, Chung NT, Hughes P, Benbow JL, Duffy C, Savage KE, Spielfogel ES, Wang SS, Martinez ME, Chandra S. 2020. Insights from adopting a data commons approach for large-scale observational cohort studies: the California Teachers Study. Cancer Epidemiol Biomarkers Prev; doi:10.1158/1055-9965.EPI-19-0842 [Online 12 February 2020]. [Abstract]
  • Lin D, Crabtree J, Dillo I, Downs RR, Edmunds R, Giaretta D, De Giust M, L'Hours H, Hugo W, Jenkyns R, Khodiyar V, Martone ME, Mokrane M, Navale V, Petters J, Sierman B, Sokolova DV, Stockhause M, Westbrook J. 2020. The TRUST Principles for Digital Repositories. Sci Data 7:144. [Full Text]
  • Lobe M, Matthies F, Staubert S, Meineke FA, Winter A. 2020. Problems in FAIRifying medical datasets. Stud Health Technol Inform 270:392-396. [Full Text]
  • Mons B. 2020. Invest 5% of research funds in ensuring data are reusable. Nature 578(7796):491. [Full Text]
  • National Academies of Sciences, Engineering, and Medicine. 2020. Neuroscience Data in the Cloud: Opportunities and Challenges: Proceedings of a Workshop. Washington, DC: The National Academies Press. [Full Text]
  • Perrier L, Blondal E, MacDonald H. 2020. The views, perspectives, and experiences of academic researchers with data sharing and reuse: a meta-synthesis. PLoS One 15(2):e0229182. [Full Text]
  • Phillips M, Molnar-Gabor F, Korbel JO, Thorogood A, Joly Y, Chalmers D, Townend D, Knoppers BM. 2020. Genomics: data sharing needs an international code of conduct. Nature 578(7793):31-33. [Full Text]
  • Promoting best practice in nucleotide sequence data sharing. 2020. Sci Data 7(1):152. [Full Text]
  • Rios RS, Zheng KI, Zheng MH. 2020. Data sharing during COVID-19 pandemic: what to take away. Expert Rev Gastroenterol Hepatol; doi:10.1080/17474124.2020.1815533 [Online 26 August 2020]. [Full Text]
  • Sinaci AA, Núñez-Benjumea FJ, Gencturk M, Jauer ML, Deserno T, Chronaki C, Cangioli G, Cavero-Barca C, Rodríguez-Pérez JM, Pérez-Pérez MM, Laleci Erturkmen GB, Hernández-Pérez T, Méndez-Rodríguez E, Parra-Calderón CL. 2020. From Raw Data to FAIR Data: The FAIRification Workflow for Health Research. Methods Inf Med 59(S 01):e21-e32. [Full Text]
  • Smith CD, Mennis J. 2020. Incorporating geographic information science and technology in response to the COVID-19 pandemic. Prev Chronic Dis 17:E58. [Full Text]
  • Solle D. 2020. Be FAIR to your data. Anal Bioanal Chem 412(17):3961-3965. [Full Text]
  • Tang L. 2020. FAIR your data. Nat Methods 17(2):127. [Full Text]
  • Tenopir C, Rice NM, Allard S, Baird L, Borycz J, Christian L, Grant B, Olendorf R, Sandusky RJ. 2020. Data sharing, management, use, and reuse: practices and perceptions of scientists worldwide. PLoS One 15(3):e0229003. [Full Text]
  • Thelwall M, Munafò M, Mas-Bleda A, Stuart E, Makita M, Weigert V, Keene C, Khan N, Drax K, Kousha K. 2020. Is useful research data usually shared? An investigation of genome-wide association study summary statistics. PLoS One 15(2):e0229578. [Full Text]
  • Udesky JO, Boronow KE, Brown P, Perovich LJ, Brody JG. 2020. Perceived risks, benefits, and interest in participating in environmental health studies that share personal exposure data: a U.S. survey of prospective participants. J Empir Res Hum Res Ethics; doi:10.1177/1556264620903595 [Online 15 February 2020]. [Abstract]
  • White T, Blok E, Calhoun VD. 2020. Data sharing and privacy issues in neuroimaging research: opportunities, obstacles, challenges, and monsters under the bed. Hum Brain Mapp; doi:10.1002/hbm.25120 [Online 4 July 2020]. [Full Text]

2019

  • Barba A, Dominguez S, Cobas C, Martinsen DP, Romain C, Rzepa HS, Seoane F. 2019. Workflows allowing creation of journal article supporting information and Findable, Accessible, Interoperable, and Reusable (FAIR)-enabled publication of spectroscopic data. ACS Omega 4(2):3280-3286. [Abstract]
  • Biomedical Data Translator Consortium. 2019. The biomedical data translator program: conception, culture, and community. Clin Transl Sci 12(2):91-94. [Full Text]
  • Carbon S, Champieux R, McMurry JA, Winfree L, Wyatt LR, Haendel MA. 2019. An analysis and metric of reusable data licensing practices for biomedical resources. PLoS One 14(3):e0213090. [Abstract]
  • Celi LA, Citi L, Ghassemi M, Pollard TJ. 2019. The PLOS ONE collection on machine learning in health and biomedicine: towards open code and open data. PLoS One 14(1):e0210232. [Abstract]
  • Christensen G, Dafoe A, Miguel E, Moore DA, Rose AK. 2019. A study of the impact of data sharing on article citations using journal policies as a natural experiment. PLoS One 14(12):e0225883. [Abstract]
  • Clarke DJB, Wang L, Jones A, Wojciechowicz ML, Torre D, Jagodnik KM, Jenkins SL, McQuilton P, Flamholz Z, Silverstein MC, Schilder BM, Robasky K, Castillo C, Idaszak R, Ahalt SC, Williams J, Schurer S, Cooper DJ, de Miranda Azevedo R, Klenk JA, Haendel MA, Nedzel J, Avillach P, Shimoyama ME, Harris RM, Gamble M, Poten R, Charbonneau AL, Larkin J, Brown CT, Bonazzi VR, Dumontier MJ, Sansone SA, Ma'ayan A. 2019. FAIRshake: Toolkit to Evaluate the FAIRness of Research Digital Resources. Cell Syst 9(5):417-421. [Abstract]
  • Fecho K, Ahalt SC, Arunachalam S, Champion J, Chute CG, Davis S, Gersing K, Glusman G, Hadlock J, Lee J, Pfaff E, Robinson M, Sid E, Ta C, Xu H, Zhu R, Zhu Q, Peden DB, Biomedical Data Translator Consortium. 2019. Sex, obesity, diabetes, and exposure to particulate matter among patients with severe asthma: scientific insights from a comparative analysis of open clinical data sources during a five-day hackathon. J Biomed Inform 100:103325. [Abstract]
  • Fothergill BT, Knight W, Stahl BC, Ulnicane I. 2019. Responsible data governance of neuroscience big data. Front Neuroinform 13:28. [Abstract]
  • Gaffney SG, Ad O, Smaga S, Schepartz A, Townsend JP. 2019. GEM-NET: lessons in multi-institution teamwork using collaboration software. ACS Cent Sci 5(7):1159-1169. [Abstract]
  • Jansen P, van den Berg L, van Overveld P, Boiten JW. 2019. Research data stewardship for healthcare professionals. In: Fundamentals of Clinical Data Science (Kubben P, Dumontier M, Dekker A, eds.). Cham, Switzerland: Springer. [Full Text]
  • Learned K, Durbin A, Currie R, Kephart ET, Beale HC, Sanders LM, Pfeil J, Goldstein TC, Salama SR, Haussler D, Vaske OM, Bjork IM. 2019. Barriers to accessing public cancer genomic data. Sci Data 6(1):98. [Full Text]
  • Li X, Fireman BH, Curtis JR, Arterburn DE, Fisher DP, Moyneur E, Gallagher M, Raebel MA, Nowell WB, Lagreid L, Toh S. 2019. Validity of privacy-protecting analytical methods that use only aggregate-level information to conduct multivariable-adjusted analysis in distributed data networks. Am J Epidemiol 188(4):709-723. [Abstract]
  • Madduri R, Chard K, D'Arcy M, Jung SC, Rodriguez A, Sulakhe D, Deutsch E, Funk C, Heavner B, Richards M, Shannon P, Glusman G, Price N, Kesselman C, Foster I. 2019. Reproducible big data science: a case study in continuous FAIRness. PLoS One 14(4):e0213013. [Abstract]
  • Oliveira JL, Trifan A, Bastiao Silva LA. 2019. EMIF Catalogue: a collaborative platform for sharing and reusing biomedical data. Int J Med Inform 126:35-45. [Abstract]
  • Perez-Riverol Y, Zorin A, Dass G, Vu MT, Xu P, Glont M, Vizcaino JA, Jarnuczak AF, Petryszak R, Ping P, Hermjakob H. 2019. Quantifying the impact of public omics data. Nat Commun 10(1):3512. [Abstract]
  • Polanin JR, Terzian M. 2019. A data-sharing agreement helps to increase researchers' willingness to share primary data: results from a randomized controlled trial. J Clin Epidemiol 106:60-69. [Abstract]
  • Popkin G. 2019. Data sharing and how it can benefit your scientific career. Nature 569(7756):445-447. [Abstract]
  • Psaty BM, Rich SS, Boerwinkle E. 2019. Innovation in genomic data sharing at the NIH. N Engl J Med 380(23):2192-2195. [Abstract]
  • Resnik DB, Morales M, Landrum R, Shi M, Minnier J, Vasilevsky NA, Champieux RE. 2019. Effect of impact factor and discipline on journal data sharing policies. Account Res 26(3):139-156. [Abstract]
  • Ruhamyankaka E, Brunk BP, Dorsey G, Harb OS, Helb DA, Judkins J, Kissinger JC, Lindsay B, Roos DS, San EJ, Stoeckert CJ, Zheng J, Tomko SS. 2019. ClinEpiDB: an open-access clinical epidemiology database resource encouraging online exploration of complex studies. Gates Open Res 3:1661. [Abstract]
  • Sansone SA, McQuilton P, Rocca-Serra P, Gonzalez-Beltran A, Izzo M, Lister AL, Thurston M. 2019. FAIRsharing as a community approach to standards, repositories and policies. Nat Biotechnol 37(4):358-367. [Full Text]
  • Spoor S, Cheng CH, Sanderson LA, Condon B, Almsaeed A, Chen M, Bretaudeau A, Rasche H, Jung S, Main D, Bett K, Staton M, Wegrzyn JL, Feltus FA, Ficklin SP. 2019. Tripal v3: an ontology-based toolkit for construction of FAIR biological community databases. Database (Oxford) 2019. [Abstract]
  • Staley J, Mazloom R, Lowe P, Newsum CT, Jaberi-Douraki M, Riviere J, Wyckoff GJ. 2019. Novel data sharing agreement to accelerate big data translational research projects in the one health sphere. Top Companion Anim Med 37:100367. [Abstract]
  • Vesteghem C, Brondum RF, Sonderkaer M, Sommer M, Schmitz A, Bodker JS, Dybkaer K, El-Galaly TC, Bogsted M. 2019. Implementing the FAIR Data Principles in precision oncology: review of supporting initiatives. Brief Bioinform; doi:10.1093/bib/bbz044 [Online 29 June 2019]. [Abstract]
  • Villanueva AG, Cook-Deegan R, Koenig BA, Deverka PA, Versalovic E, McGuire AL, Majumder MA. 2019. Characterizing the biomedical data-sharing landscape. J Law Med Ethics 47(1):21-30. [Abstract]
  • Xu H, Zhang N. 2019. Privacy in health disparity research. Med Care 57(Suppl 6 Suppl 2):S172–S175. [Full Text]

2018

  • Boeckhout M, Zielhuis GA, Bredenoord AL. 2018. The FAIR guiding principles for data stewardship: fair enough? Eur J Hum Genet 26(7):931-936. [Abstract]
  • Brown AV, Campbell JD, Assefa T, Grant D, Nelson RT, Weeks NT, Cannon SB. 2018. Ten quick tips for sharing open genomic data. PLoS Comput Biol 14(12):e1006472. [Abstract]
  • Escribano N, Galicia D, Arino AH. 2018. The tragedy of the biodiversity data commons: a data impediment creeping nigher? Database (Oxford) 2018. [Abstract]
  • Gruning B, Chilton J, Koster J, Dale R, Soranzo N, van den Beek M, Goecks J, Backofen R, Nekrutenko A, Taylor J. 2018. Practical computational reproducibility in the life sciences. Cell Syst 6(6):631-635. [Abstract]
  • Holub P, Kohlmayer F, Prasser F, Mayrhofer MT, Schlunder I, Martin GM, Casati S, Koumakis L, Wutte A, Kozera L, Strapagiel D, Anton G, Zanetti G, Sezerman OU, Mendy M, Valik D, Lavitrano M, Dagher G, Zatloukal K, van Ommen GB, Litton JE. 2018. Enhancing reuse of data and biological material in medical research: from FAIR to FAIR-Health. Biopreserv Biobank 16(2):97-105. [Abstract]
  • Kitzes J, Turek D, Deniz F, eds. 2018. The Practice of Reproducible Research: Case Studies and Lessons from the Data-Intensive Sciences. Oakland, CA: University of California Press. [Full Text]
  • National Academies of Sciences, Engineering, and Medicine. 2018. Open Science by Design: Realizing a Vision for 21st Century Research. Washington, DC: The National Academies Press. [Full Text]
  • Navale V, McAuliffe M. 2018. Long-term preservation of biomedical research data. F1000Res 7:1353. [Abstract]
  • Perkel JM. 2018. A toolkit for data transparency takes shape. Nature 560(7719):513-515. [Abstract]
  • Yaniv Z, Lowekamp BC, Johnson HJ, Beare R. 2018. SimpleITK image-analysis notebooks: a collaborative environment for education and reproducible research. J Digit Imaging 31(3):290-303. [Abstract]

2017

  • Ascoli GA, Maraver P, Nanda S, Polavaram S, Armananzas R. 2017. Win-win data sharing in neuroscience. Nat Methods 14(2):112-116. [Abstract]
  • Boland MR, Karczewski KJ, Tatonetti NP. 2017. Ten simple rules to enable multi-site collaborations through data sharing. PLoS Comput Biol 13(1):e1005278. [Abstract]
  • Fleming L, Tempini N, Gordon-Brown H, Nichols GL, Sarran C, Vineis P, Leonardi G, Golding B, Haines A, Kessel A, Murray V, Depledge M, Leonelli S. 2017. Big data in environment and human health. Oxford Research Encyclopedia of Environmental Science. [Full Text]
  • Hudson KL, Collins FS. 2017. The 21st Century Cures Act - a view from the NIH. N Engl J Med 376(2):111-113. [Abstract]
  • Jimenez RC, Kuzak M, Alhamdoosh M, Barker M, Batut B, Borg M, Capella-Gutierrez S, Chue Hong N, Cook M, Corpas M, Flannery M, Garcia L, Gelpi JL, Gladman S, Goble C, Gonzalez Ferreiro M, Gonzalez-Beltran A, Griffin PC, Gruning B, Hagberg J, Holub P, Hooft R, Ison J, Katz DS, Leskosek B, Lopez Gomez F, Oliveira LJ, Mellor D, Mosbergen R, Mulder N, Perez-Riverol Y, Pergl R, Pichler H, Pope B, Sanz F, Schneider MV, Stodden V, Suchecki R, Svobodova Varekova R, Talvik HA, Todorov I, Treloar A, Tyagi S, van Gompel M, Vaughan D, Via A, Wang X, Watson-Haigh NS, Crouch S. 2017. Four simple recommendations to encourage best practices in research software. F1000Res 6. [Abstract]
  • Majumder MA, Guerrini CJ, Bollinger JM, Cook-Deegan R, McGuire AL. 2017. Sharing data under the 21st Century Cures Act. Genet Med 19(12):1289-1294. [Abstract]
  • McIntosh LD, Juehne A, Vitale CRH, Liu X, Alcoser R, Lukas JC, Evanoff B. 2017. Repeat: a framework to assess empirical reproducibility in biomedical research. BMC Med Res Methodol 17(1):143. [Abstract]
  • Ohmann C, Banzi R, Canham S, Battaglia S, Matei M, Ariyo C, Becnel L, Bierer B, Bowers S, Clivio L, Dias M, Druml C, Faure H, Fenner M, Galvez J, Ghersi D, Gluud C, Groves T, Houston P, Karam G, Kalra D, Knowles RL, Krleza-Jeric K, Kubiak C, Kuchinke W, Kush R, Lukkarinen A, Marques PS, Newbigging A, O'Callaghan J, Ravaud P, Schlunder I, Shanahan D, Sitter H, Spalding D, Tudur-Smith C, van Reusel P, van Veen EB, Visser GR, Wilson J, Demotes-Mainard J. 2017. Sharing and reuse of individual participant data from clinical trials: principles and recommendations. BMJ Open 7(12):e018647. [Abstract]
  • Olfson M, Wall MM, Blanco C. 2017. Incentivizing data sharing and collaboration in medical research–the S-Index. JAMA Psychiatry 74(1):5-6. [Abstract]
  • Thelwall M, Kousha K. 2017. Do journal data sharing mandates work? Life sciences evidence from Dryad. ASLIB J Inform Manag 69(1):36-45. [Abstract]

2016

  • McKiernan EC, Bourne PE, Brown CT, Buck S, Kenall A, Lin J, McDougall D, Nosek BA, Ram K, Soderberg CK, Spies JR, Thaney K, Updegrove A, Woo KH, Yarkoni T. 2016. How open science helps researchers succeed. Elife 5:e16800. [Abstract]
  • Piccolo SR, Frampton MB. 2016. Tools and techniques for computational reproducibility. Gigascience 5(1):30. [Abstract]
  • Stodden V, McNutt M, Bailey DH, Deelman E, Gil Y, Hanson B, Heroux MA, Ioannidis JP, Taufer M. 2016. Enhancing reproducibility for computational methods. Science 354(6317):1240-1241. [Abstract]
  • Warren E. 2016. Strengthening research through data sharing. N Engl J Med 375(5):401-403. [Abstract]
  • Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, Blomberg N, Boiten JW, da Silva Santos LB, Bourne PE, Bouwman J, Brookes AJ, Clark T, Crosas M, Dillo I, Dumon O, Edmunds S, Evelo CT, Finkers R, Gonzalez-Beltran A, Gray AJ, Groth P, Goble C, Grethe JS, Heringa J, 't Hoen PA, Hooft R, Kuhn T, Kok R, Kok J, Lusher SJ, Martone ME, Mons A, Packer AL, Persson B, Rocca-Serra P, Roos M, van Schaik R, Sansone SA, Schultes E, Sengstag T, Slater T, Strawn G, Swertz MA, Thompson M, van der Lei J, van Mulligen E, Velterop J, Waagmeester A, Wittenburg P, Wolstencroft K, Zhao J, Mons B. 2016. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3:160018. [Abstract]

Data Repositories

  • Data Observation Network for Earth (DataONE)
    DataONE is a community-driven project providing access to data across multiple member repositories, supporting enhanced search and discovery of Earth and environmental data. DataONE provides data usage and citation metrics for datasets.
  • DataCite
    Search the DataCite registry for datasets, software, images, and other research material.
  • DataMed
    DataMed is a prototype biomedical data search engine being developed for the NIH BD2K Data Discovery Index (DDI) by the bioCADDIE project team. It allows users to search and find data across different repositories.
  • General Code Repositories
    Use general code repositories such as Bitbucket, GitHub, and SourceForge to share code. CRAN is an R package archive network. Code Ocean is a research collaboration platform that lets users share computational environments and code in a Web browser. The Jupyter Notebook is an open-source web application for creating and sharing documents that contain live code, equations, visualizations, and text.
  • NIH Data Sharing Repositories
    This table lists NIH-supported data repositories that make data accessible for reuse.
  • Omics Discovery Index (OmicsDI)
    OmicsDI provides dataset discovery across a heterogeneous, distributed group of transcriptomics, genomics, proteomics, and metabolomics data resources spanning 16 repositories on three continents and six organizations, including both open- and controlled-access data resources.
  • Registry of Research Data Repositories (re3data.org)
    re3data.org is a global registry of research data repositories. It provides detailed information on more than 2,000 repositories to help researchers find the right one for their data. re3data.org is a service of DataCite, a global non-profit organization that provides DOIs for research data.
  • SciCrunch
    SciCrunch is a data sharing and display platform designed to help communities create their own portals to provide access to research resources, data, literature, and tools. Hosted at the University of California, San Diego, SciCrunch is home to the Drug Design Data Resource, NIDDK Information Network (National Institute of Diabetes and Digestive and Kidney Diseases), the Neuroscience Information Network, and the Research Resource Identifiers (RRID) Portal, which provides shared identifiers for citing research resources such as cell lines or antibodies in the literature. Users can search within and across these community portals.
  • United States Geological Survey (USGS) Data Repository Webpage
    This webpage provides useful information about preserving environmental data in data repositories.

Relevant Publications

2020

  • Bogue MA, Philip VM, Walton DO, Grubb SC, Dunn MH, Kolishovski G, Emerson J, Mukherjee G, Stearns T, He H, Sinha V, Kadakkuzha B, Kunde-Ramamoorthy G, Chesler EJ. 2020. Mouse phenome database: a data repository and analysis suite for curated primary mouse phenotype data. Nucleic Acids Res 48(D1):D716-D723. [Full Text]
  • Jurburg SD, Konzack M, Eisenhauer N, Heintz-Buschart A. 2020. The archives are half-empty: an assessment of the availability of microbial community sequencing data. Commun Biol 3(1):474. [Full Text]

2019

  • Amid C, Pakseresht N, Silvester N, Jayathilaka S, Lund O, Dynovski LD, Pataki BA, Visontai D, Xavier BB, Alako BTF, Belka A, Cisneros JLB, Cotten M, Haringhuizen GB, Harrison PW, Hoper D, Holt S, Hundahl C, Hussein A, Kaas RS, Liu X, Leinonen R, Malhotra-Kumar S, Nieuwenhuijse DF, Rahman N, Dos S Ribeiro C, Skiby JE, Schmitz D, Steger J, Szalai-Gindl JM, Thomsen MCF, Caccio SM, Csabai I, Kroneman A, Koopmans M, Aarestrup F, Cochrane G. 2019. The COMPARE data hubs. Database (Oxford) 2019(2019):baz136. [Full Text]
  • Banzi R, Canham S, Kuchinke W, Krleza-Jeric K, Demotes-Mainard J, Ohmann C. 2019. Evaluation of repositories for sharing individual-participant data from clinical studies. Trials 20(1):169. [Abstract]
  • Burley SK, Berman HM, Bhikadiya C, Bi C, Chen L, Di Costanzo L, Christie C, Dalenberg K, Duarte JM, Dutta S, Feng Z, Ghosh S, Goodsell DS, Green RK, Guranovic V, Guzenko D, Hudson BP, Kalro T, Liang Y, Lowe R, Namkoong H, Peisach E, Periskova I, Prlic A, Randle C, Rose A, Rose P, Sala R, Sekharan M, Shao C, Tan L, Tao YP, Valasatava Y, Voigt M, Westbrook J, Woo J, Yang H, Young J, Zhuravleva M, Zardecki C. 2019. RCSB protein data bank: biological macromolecular structures enabling research and education in fundamental biology, biomedicine, biotechnology and energy. Nucleic Acids Res 47(D1):D464-D474. [Full Text]
  • Grossman RL. 2019. Data lakes, clouds, and commons: a review of platforms for analyzing and sharing genomic data. Trends Genet 35(3):223-234. [Abstract]
  • Laulederkind SJF, Hayman GT, Wang SJ, Hoffman MJ, Smith JR, Bolton ER, De Pons J, Tutaj MA, Tutaj M, Thota J, Dwinell MR, Shimoyama M. 2019. Rat genome databases, repositories, and tools. Methods Mol Biol 2018:71-96. [Abstract]
  • Ma J, Chen T, Wu S, Yang C, Bai M, Shu K, Li K, Zhang G, Jin Z, He F, Hermjakob H, Zhu Y. 2019. iProX: an integrated proteome resource. Nucleic Acids Res 47(D1):D1211-D1217. [Full Text]
  • UniProt Consortium. 2019. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res 47(D1):D506-D515. [Full Text]
  • Saldanha IJ, Smith BT, Ntzani E, Jap J, Balk EM, Lau J. 2019. The Systematic Review Data Repository (SRDR): descriptive characteristics of publicly available data and opportunities for research. Syst Rev 8(1):334. [Abstract]

2018

  • Chen X, Gururaj AE, Ozyurt B, Liu R, Soysal E, Cohen T, Tiryaki F, Li Y, Zong N, Jiang M, Rogith D, Salimi M, Kim HE, Rocca-Serra P, Gonzalez-Beltran A, Farcas C, Johnson T, Margolis R, Alter G, Sansone SA, Fore IM, Ohno-Machado L, Grethe JS, Xu H. 2018. DataMed - an open source discovery index for finding biomedical datasets. J Am Med Inform Assoc 25(3):300-308. [Abstract]
  • Kleywegt GJ, Velankar S, Patwardhan A. 2018. Structural biology data archiving - where we are and what lies ahead. FEBS Lett 592(12):2153-2167. [Abstract]

2017

  • Ohno-Machado L, Sansone SA, Alter G, Fore I, Grethe J, Xu H, Gonzalez-Beltran A, Rocca-Serra P, Gururaj AE, Bell E, Soysal E, Zong N, Kim HE. 2017. Finding useful data across multiple biomedical data repositories using DataMed. Nat Genet 49(6):816-819. [Abstract]
  • Perez-Riverol Y, Bai M, da Veiga Leprevost F, Squizzato S, Park YM, Haug K, Carroll AJ, Spalding D, Paschall J, Wang M, Del-Toro N, Ternent T, Zhang P, Buso N, Bandeira N, Deutsch EW, Campbell DS, Beavis RC, Salek RM, Sarkans U, Petryszak R, Keays M, Fahy E, Sud M, Subramaniam S, Barbera A, Jimenez RC, Nesvizhskii AI, Sansone SA, Steinbeck C, Lopez R, Vizcaino JA, Ping P, Hermjakob H. 2017. Discovering and linking public omics data sets using the Omics Discovery Index. Nat Biotechnol 35(5):406-409. [Abstract]

Data Citations

  • Australian National Data Service (ANDS) Guide to Data Citation
    The ANDS Guide to Data Citation provides an overview on how to cite data, data citation styles and formats, using persistent identifiers (e.g., DOIs), and tracking data citations.
  • Ball A, Duke M. 2015. How to Cite Datasets and Link to Publications. Edinburgh: Digital Curation Centre.
  • DataCite – Cite Your Data
    DataCite is a leading global non-profit organization that provides persistent identifiers (DOIs) for research data. Its goal is to help the research community locate, identify, and cite research data with confidence. This page contains best practices for citing data. Properly citing data gives scholarly credit to data producers and facilitates discovery and reuse of the dataset.
  • Joint Declaration of Data Citation Principles – Final
    First released in November 2013 and finalized in February 2014, the declaration describes eight principles that emphasize the importance of data as evidence, the need to give credit to data contributors, the idea that cited data requires unique and persistent identifiers, and the belief that data citation should allow for human and machine access to the data and support verification and interoperability.
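
As a rough illustration of the citation elements the resources above describe (credit to data contributors plus a unique, persistent identifier), the sketch below assembles a human-readable dataset citation in Python. The field order loosely follows the commonly recommended creator, year, title, version, publisher, identifier pattern. The class name, function name, and the example record (including its DOI) are hypothetical placeholders, not taken from any of the listed guides.

```python
from dataclasses import dataclass
from typing import List, Optional

# Public DOI resolver prefix; appending a DOI gives a persistent link.
DOI_RESOLVER = "https://doi.org/"


@dataclass
class DatasetRecord:
    """Minimal, hypothetical set of fields needed to cite a dataset."""
    creators: List[str]
    year: int
    title: str
    publisher: str
    doi: str
    version: Optional[str] = None


def format_data_citation(record: DatasetRecord) -> str:
    """Build a one-line citation; the version is included only if present."""
    authors = "; ".join(record.creators)
    parts = [f"{authors} ({record.year}). {record.title}."]
    if record.version:
        parts.append(f"Version {record.version}.")
    parts.append(f"{record.publisher}.")
    parts.append(f"{DOI_RESOLVER}{record.doi}")
    return " ".join(parts)


# Placeholder record for illustration only; the DOI is not a real identifier.
example = DatasetRecord(
    creators=["Doe J", "Roe R"],
    year=2020,
    title="Example soil metals dataset",
    publisher="Example Repository",
    doi="10.1234/example",
    version="1.0",
)
print(format_data_citation(example))
```

Writing the identifier as a resolvable link, rather than a bare DOI string, is one way to support both human and machine access to the cited data; a repository's own citation guidance should take precedence over this sketch.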

Relevant Publications

2020

  • Buneman P, Christie G, Davies JA, Dimitrellou R, Harding SD, Pawson AJ, Sharman JL, Wu Y. 2020. Why data citation isn't working, and what to do about it. Database (Oxford) 2020(2020):baaa022. [Full Text]

2019

  • Fenner M, Crosas M, Grethe JS, Kennedy D, Hermjakob H, Rocca-Serra P, Durand G, Berjon R, Karcher S, Martone M, Clark T. 2019. A data citation roadmap for scholarly data repositories. Sci Data 6(1):28. [Abstract]

2018

  • Cousijn H, Kenall A, Ganley E, Harrison M, Kernohan D, Lemberger T, Murphy F, Polischuk P, Taylor S, Martone M, Clark T. 2018. A data citation roadmap for scientific publishers. Sci Data 5:180259. [Abstract]
  • Lemberger T. 2018. Data citation: what, when, why? Mol Syst Biol 14(12):e8783. [Abstract]

2016

  • Honor LB, Haselgrove C, Frazier JA, Kennedy DN. 2016. Data citation in neuroimaging: proposed best practices for data identification and attribution. Front Neuroinform 10:34. [Abstract]

Research Data Management

Relevant Publications

2020

  • Hay J, Troup E, Clark I, Pietsch J, Zielinski T, Millar A. 2020. PyOmeroUpload: a Python toolkit for uploading images and metadata to OMERO. Wellcome Open Res 5:96. [Full Text]
  • Lepperod ME, Dragly SA, Buccino AP, Mobarhan MH, Malthe-Sorenssen A, Hafting T, Fyhn M. 2020. Experimental Pipeline (Expipe): a lightweight data management platform to simplify the steps from experiment to data analysis. Front Neuroinform 14:30. [Full Text]
  • Vuorre M, Crump MJC. 2020. Sharing and organizing research products as R packages. Behav Res Methods; doi:10.3758/s13428-020-01436-x [Online 1 September 2020]. [Full Text]

2019

  • Emam I, Elyasigomari V, Matthews A, Pavlidis S, Rocca-Serra P, Guitton F, Verbeeck D, Grainger L, Borgogni E, Del Giudice G, Saqi M, Houston P, Guo Y. 2019. PlatformTM, a standards-based data custodianship platform for translational medicine research. Sci Data 6(1):149. [Full Text]
  • Miksa T, Simms S, Mietchen D, Jones S. 2019. Ten principles for machine-actionable data management plans. PLoS Comput Biol 15(3):e1006750. [Abstract]

2018

  • Schiermeier Q. 2018. Data management made simple. Nature 555(7696):403-405. [Abstract]

2017

  • Griffin PC, Khadake J, LeMay KS, Lewis SE, Orchard S, Pask A, Pope B, Roessner U, Russell K, Seemann T, Treloar A, Tyagi S, Christiansen JH, Dayalan S, Gladman S, Hangartner SB, Hayden HL, Ho WWH, Keeble-Gagnere G, Korhonen PK, Neish P, Prestes PR, Richardson MF, Watson-Haigh NS, Wyres KL, Young ND, Schneider MV. 2017. Best practice data life cycle approaches for the life sciences. F1000Res 6:1618. [Full Text]

2015

  • Michener WK. 2015. Ten simple rules for creating a good data management plan. PLoS Comput Biol 11(10):e1004525. [Abstract]

Metadata Standards/Ontologies/Data Integration

  • ANDS Guide to Metadata
    This comprehensive guide provides a working-level view of the needs, issues, and processes around metadata collection and creation for research data.
  • Disciplinary Metadata
    The Digital Curation Centre allows one to search for metadata standards, extensions, tools, and use cases by discipline (biology, earth science, general research data, physical science, and social science and humanities).
  • FAIRsharing Standards Database
    The standards in FAIRsharing are manually curated from a variety of sources, including BioPortal, MIBBI, and the Equator Network.
  • NCBO BioPortal
    BioPortal is a comprehensive repository of biomedical ontologies developed by the National Center for Biomedical Ontology, an international consortium providing ontological resources for the biomedical research community. BioPortal allows the user to browse or search ontologies, get ontology recommendations, explore mappings between ontology terms, and annotate textual biomedical data with ontology terms. The NIEHS Children’s Health Exposure Analysis Resource Ontology can be found in BioPortal.
  • NIH Common Data Elements (CDE) Repository
    The NIH CDE Repository provides access to structured human- and machine-readable definitions of data elements that have been recommended or required by NIH Institutes and Centers and other organizations for use in research and other purposes.
  • Open Biological and Biomedical Ontology (OBO) Foundry
    The OBO Foundry is a collective of ontology developers with the mission to develop a family of interoperable science-based ontologies for shared use across different biological and medical domains. They have published a set of normative principles for OBO Foundry ontologies.
  • PhenX Toolkit (consensus measures for Phenotypes and eXposures)
    The PhenX Toolkit is a Web-based catalog of recommended, standard measures for phenotypes and exposures for use in biomedical research. Using protocols from the PhenX Toolkit allows investigators who are studying different diseases and conditions to collect data using the same methodologies, thus facilitating cross-study analysis.

Relevant Publications

2020

  • Alter G, Gonzalez-Beltran A, Ohno-Machado L, Rocca-Serra P. 2020. The Data Tags Suite (DATS) model for discovering data access and use requirements. Gigascience 9(2). [Abstract]
  • Bernasconi A, Canakoglu A, Masseroli M, Ceri S. 2020. The road towards data integration in human genomics: players, steps and interactions. Brief Bioinform; doi:10.1093/bib/bbaa080 [Online 4 June 2020]. [Abstract]
  • Canzler S, Schor J, Busch W, Schubert K, Rolle-Kampczyk UE, Seitz H, Kamp H, von Bergen M, Buesen R, Hackermuller J. 2020. Prospects and challenges of multi-omics data integration in toxicology. Arch Toxicol; doi:10.1007/s00204-020-02656-y [Online 8 February 2020]. [Abstract]
  • Campbell DL, Thessen AE, Ries L. 2020. A novel curation system to facilitate data integration across regional citizen science survey programs. PeerJ 8:e9219. [Full Text]
  • Elghafari A, Finkelstein J. 2020. Introducing an ontology-driven pipeline for the identification of common data elements. Stud Health Technol Inform 272:379-382. [Full Text]
  • Fan Y, Chen M, Zhu Q, Wang W. 2020. Inferring disease-associated microbes based on multi-data integration and network consistency projection. Front Bioeng Biotechnol 8:831. [Full Text]
  • Jeliazkova N, Jeliazkov V. 2020. Making big data available: integrating technologies for toxicology applications. In: Big Data in Predictive Toxicology (Neagu D, Richarz A-N, eds.). Cambridge: Royal Society of Chemistry, 166-184. [Abstract]
  • Konopka T, Smedley D. 2020. Incremental data integration for tracking genotype-disease associations. PLoS Comput Biol 16(1):e1007586. [Abstract]
  • Li J, Yin Y, Zhang M, Cui J, Zhang Z, Zhang Z, Sun D. 2020. GsmPlot: a web server to visualize epigenome data in NCBI. BMC Bioinformatics 21(1):55. [Abstract]
  • Meyer DE, Bailin SC, Vallero D, Egeghy PP, Liu SV, Cohen Hubal EA. 2020. Enhancing life cycle chemical exposure assessment through ontology modeling. Sci Total Environ 712:136263. [Abstract]
  • Reid RW, Ferrier JW, Jay JJ. 2020. Automated gene data integration with Databio. BMC Res Notes 13(1):195. [Full Text]
  • Schriml LM, Chuvochina M, Davies N, Eloe-Fadrosh EA, Finn RD, Hugenholtz P, Hunter CI, Hurwitz BL, Kyrpides NC, Meyer F, Mizrachi IK, Sansone SA, Sutton G, Tighe S, Walls R. 2020. COVID-19 pandemic reveals the peril of ignoring metadata standards. Sci Data 7(1):188. [Full Text]
  • Subramanian I, Verma S, Kumar S, Jere A, Anamika K. 2020. Multi-omics data integration, interpretation, and its application. Bioinform Biol Insights 14:1177932219899051. [Full Text]
  • Thapa I, Ali H. 2020. A multiomics graph database system for biological data integration and cancer informatics. J Comput Biol; doi:10.1089/cmb.2020.0231 [Online 12 August 2020]. [Abstract]
  • Thomas S, Lichtenberg T, Dang K, Fitzsimons M, Grossman RL, Kundra R, Lavery JA, Lenoue-Newton ML, Panageas KS, Sawyers C, Schultz ND, Sirintrapun SJ, Topaloglu U, Welch A, Yu T, Zehir A, Gardos S. 2020. Linked Entity Attribute Pair (LEAP): a harmonization framework for data pooling. JCO Clin Cancer Inform 4:691-699. [Full Text]
  • Zhao Y, Li L, Caffo BS. 2020. Multimodal neuroimaging data integration and pathway analysis. Biometrics; doi:10.1111/biom.13351 [Online 13 August 2020]. [Abstract]

2019

  • Brown J, Phillips AR, Lewis DA, Mans MA, Chang Y, Tanguay RL, Peterson ES, Waters KM, Tilton SC. 2019. Bioinformatics Resource Manager: a systems biology web tool for microRNA and omics data integration. BMC Bioinformatics 20(1):255. [Abstract]
  • Bucher E, Claunch CJ, Hee D, Smith RL, Devlin K, Thompson W, Korkola JE, Heiser LM. 2019. Annot: a Django-based sample, reagent, and experiment metadata tracking system. BMC Bioinformatics 20(1):542. [Abstract]
  • Buendia P, Bradley RM, Taylor TJ, Schymanski EL, Patti GJ, Kabuka MR. 2019. Ontology-based metabolomics data integration with quality control. Bioanalysis 11(12):1139-1155. [Abstract]
  • Cooper DJ, Schurer S. 2019. Improving the utility of the Tox21 dataset by deep metadata annotations and constructing reusable benchmarked chemical reference signatures. Molecules 24(8):1604. [Abstract]
  • Dorea FC, Vial F, Hammar K, Lindberg A, Lambrix P, Blomqvist E, Revie CW. 2019. Drivers for the development of an Animal Health Surveillance Ontology (AHSO). Prev Vet Med 166:39-48. [Abstract]
  • Falster DS, FitzJohn RG, Pennell MW, Cornwell WK. 2019. Datastorr: a workflow and package for delivering successive versions of 'evolving data' directly into R. Gigascience 8(5). [Abstract]
  • Fillinger S, de la Garza L, Peltzer A, Kohlbacher O, Nahnsen S. 2019. Challenges of big data integration in the life sciences. Anal Bioanal Chem 411(26):6791-6800. [Abstract]
  • Kourou KD, Pezoulas VC, Georga EI, Exarchos TP, Tsanakas P, Tsiknakis M, Varvarigou T, De Vita S, Tzioufas A, Fotiadis DI. 2019. Cohort harmonization and integrative analysis from a biomedical engineering perspective. IEEE Rev Biomed Eng 12:303-318. [Abstract]
  • Macklin P. 2019. Key challenges facing data-driven multicellular systems biology. Gigascience 8(10). [Abstract]
  • Mirza B, Wang W, Wang J, Choi H, Chung NC, Ping P. 2019. Machine learning and integrative analysis of biomedical big data. Genes (Basel) 10(2):E87. [Abstract]
  • Navale V, Ji M, Vovk O, Misquitta L, Gebremichael T, Garcia A, Fann Y, McAuliffe M. 2019. Development of an informatics system for accelerating biomedical research. F1000Res 8:1430. [Full Text]
  • Pala D, Pagan J, Parimbelli E, Rocca MT, Bellazzi R, Casella V. 2019. Spatial enablement to support environmental, demographic, socioeconomics, and health data integration and analysis for big cities: a case study with asthma hospitalizations in New York City. Front Med 6:84. [Abstract]
  • Peng C, Goswami P. 2019. Meaningful integration of data from heterogeneous health services and home environment based on ontology. Sensors (Basel) 19(8):E1747. [Abstract]
  • Schymanski EL, Baker NC, Williams AJ, Singh RR, Trezzi JP, Wilmes P, Kolber PL, Kruger R, Paczia N, Linster CL, Balling R. 2019. Connecting environmental exposure and neurodegeneration using cheminformatics and high resolution mass spectrometry: potential and challenges. Environ Sci Process Impacts 21(9):1426-1445. [Abstract]
  • Siegele DA, LaBonte SA, Wu PI, Chibucos MC, Nandendla S, Giglio MG, Hu JC. 2019. Phenotype annotation with the Ontology of Microbial Phenotypes (OMP). J Biomed Semantics 10(1):13. [Abstract]
  • Sima AC, Stockinger K, de Farias TM, Gil M. 2019. Semantic integration and enrichment of heterogeneous biological databases. In: Evolutionary Genomics (Anisimova M, ed.). New York, NY: Humana. [Full Text]
  • Tang YA, Pichler K, Fullgrabe A, Lomax J, Malone J, Munoz-Torres MC, Vasant DV, Williams E, Haendel M. 2019. Ten quick tips for biocuration. PLoS Comput Biol 15(5):e1006906. [Abstract]
  • T'Joen V, Vaneeckhaute L, Priem S, Van Woensel S, Bekaert S, Berneel E, Van Der Straeten C. 2019. Rationalized development of a campus-wide cell line dataset for implementation in the Biobank LIMS system at Bioresource Center Ghent. Front Med (Lausanne) 6:137. [Abstract]
  • Wang RL, Edwards S, Ives C. 2019. Ontology-based semantic mapping of chemical toxicities. Toxicology 412:89-100. [Abstract]

2018

  • Abburu S. 2018. Ontology driven cross-linked domain data integration and spatial semantic multi criteria query system for geospatial public health. Int J Semant Web Inf Syst 14(3):1-30. [Abstract]
  • Baker N, Boobis A, Burgoon L, Carney E, Currie R, Fritsche E, Knudsen T, Laffont M, Piersma AH, Poole A, Schneider S, Daston G. 2018. Building a developmental toxicity ontology. Birth Defects Res 110(6):502-518. [Abstract] [ECETOC Open Access Report]
  • Cooper L, Meier A, Laporte MA, Elser JL, Mungall C, Sinn BT, Cavaliere D, Carbon S, Dunn NA, Smith B, Qu B, Preece J, Zhang E, Todorovic S, Gkoutos G, Doonan JH, Stevenson DW, Arnaud E, Jaiswal P. 2018. The Planteome database: an integrated resource for reference ontologies, plant genomics and phenomics. Nucleic Acids Res 46(D1):D1168-D1180. [Abstract]
  • Fairchild G, Tasseff B, Khalsa H, Generous N, Daughton AR, Velappan N, Priedhorsky R, Deshpande A. 2018. Epidemiological data challenges: planning for a more robust future through data standards. Front Public Health 6:336. [Abstract]
  • Haynes D, Jokela A, Manson S. 2018. IPUMS-Terra: integrated big heterogeneous spatio-temporal data analysis system. J Geogr Syst 20(4):343-361. [Abstract]
  • He Y, Xiang Z, Zheng J, Lin Y, Overton JA, Ong E. 2018. The eXtensible Ontology Development (XOD) principles and tool implementation to support ontology interoperability. J Biomed Semantics 9(1):3. [Abstract]
  • Huser V, Amos L. 2018. Analyzing real-world use of research common data elements. AMIA Annu Symp Proc 2018:602-608. [Abstract]
  • National Academies of Sciences, Engineering, and Medicine. 2018. Informing Environmental Health Decisions Through Data Integration: Proceedings of a Workshop—in Brief. Washington, DC: The National Academies Press. [Full Text]

2017

  • Alshahrani M, Khan MA, Maddouri O, Kinjo AR, Queralt-Rosinach N, Hoehndorf R. 2017. Neuro-symbolic representation learning on biological knowledge graphs. Bioinformatics 33(17):2723-2730. [Abstract]
  • Davis AP, Grondin CJ, Johnson RJ, Sciaky D, King BL, McMorran R, Wiegers J, Wiegers TC, Mattingly CJ. 2017. The Comparative Toxicogenomics Database: update 2017. Nucleic Acids Res 45(D1):D972-D978. [Abstract]
  • Malinowski AK, Ananth CV, Catalano P, Hines EP, Kirby RS, Klebanoff MA, Mulvihill JJ, Simhan H, Hamilton CM, Hendershot TP, Phillips MJ, Kilpatrick LA, Maiese DR, Ramos EM, Wright RJ, Dolan SM; PhenX Pregnancy Working Group. 2017. Research standardization tools: pregnancy measures in the PhenX Toolkit. Am J Obstet Gynecol 217(3):249-262. [Abstract]

2016

  • Mattingly CJ, Boyles R, Lawler CP, Haugen AC, Dearry A, Haendel M. 2016. Laying a community-based foundation for data-driven semantic standards in environmental health sciences. Environ Health Perspect 124(8):1136-1140. [Abstract]
  • Rocca-Serra P, Salek RM, Arita M, Correa E, Dayalan S, Gonzalez-Beltran A, Ebbels T, Goodacre R, Hastings J, Haug K, Koulman A, Nikolski M, Oresic M, Sansone SA, Schober D, Smith J, Steinbeck C, Viant MR, Neumann S. 2016. Data standards can boost metabolomics research, and if there is a will, there is a way. Metabolomics 12:14. [Abstract]

Data Science Training

  • Big Data to Knowledge (BD2K)
    The BD2K Centers produced training and educational resources, including workshops, courses, webinars, lecture series, summer internships, and training programs.
  • ERUDITE (Educational Resource Discovery Index)
    Use ERUDITE to find educational resources (e.g., free modules, MOOCs, curricula, and webinars) and in-person or minimal-cost training opportunities (e.g., short courses) for data science.
  • Edison Data Science Framework
    The Edison Data Science Framework is a collection of documents that define the data science profession and competencies.
  • FOSTER
    The FOSTER portal is an e-learning platform providing training resources for those who need to know more about Open Science, or need to develop strategies and skills for implementing Open Science practices in their daily workflows. Available courses on Open Science include managing and sharing research data, best practices in open research, Open Science Software and workflows, data protection and ethics, and open licensing.
  • The Carpentries
    The Carpentries teach foundational coding, data science, and computational skills to researchers worldwide. The Carpentries develop and teach in-person, interactive, two-day workshops using open-source lessons available on GitHub.

Relevant Publications

2019

  • Attwood TK, Blackford S, Brazas MD, Davies A, Schneider MV. 2019. A global perspective on evolving bioinformatics and data science training needs. Brief Bioinform 20(2):398-404. [Abstract]
  • Grabowski P, Rappsilber J. 2019. A primer on data analytics in functional genomics: how to move from data to insight? Trends Biochem Sci 44(1):21-32. [Abstract]
  • Mendez KM, Pritchard L, Reinke SN, Broadhurst DI. 2019. Toward collaborative open data science in metabolomics using Jupyter Notebooks and cloud computing. Metabolomics 15(10):125. [Full Text]

2018

  • Carey MA, Papin JA. 2018. Ten simple rules for biologists learning to program. PLoS Comput Biol 14(1):e1005871. [Abstract]
  • National Academies of Sciences, Engineering, and Medicine. 2018. Data Science for Undergraduates: Opportunities and Options. Washington, DC: The National Academies Press. [Full Text]
  • Toelch U, Ostwald D. 2018. Digital open science–teaching digital tools for reproducible and transparent research. PLoS Biol 16(7):e2006022. [Abstract]
  • Van Horn JD, Fierro L, Kamdar J, Gordon J, Stewart C, Bhattrai A, Abe S, Lei X, O'Driscoll C, Sinha A, Jain P, Burns G, Lerman K, Ambite JL. 2018. Democratizing data science through data science training. Pac Symp Biocomput 23:292-303. [Abstract]

2017

  • Dunn MC, Bourne PE. 2017. Building the biomedical data science workforce. PLoS Biol 15(7):e2003082. [Abstract]