

Data Sharing Resources

Superfund Research Program

This page provides information and resources on data sharing, data repositories, citing data, data integration, and data science training.

Data Sharing

  • FAIR Data Principles
    The FAIR Data Principles are a set of guiding principles published by FORCE11 that provide a framework for sharing data in a way that maximizes use and reuse by making data findable, accessible, interoperable, and reusable. Wilkinson et al. (2016) introduce the FAIR principles and their rationale.
  • FAIR Self-Assessment Tool
    Created by the Australian Research Data Commons, this online tool allows one to assess the "FAIRness" of a dataset and provides suggestions for improvement and links to more information.
  • FAIRshake
    A software toolkit developed for the NIH Data Commons Pilot Phase Consortium (DCPPC) to assess how well biomedical digital research objects comply with the FAIR guiding principles. FAIRshake was developed to help establish community-driven FAIR metrics and rubrics, paired with manual and automated FAIR assessments.
  • FAIRsharing
    FAIRsharing.org provides a curated, informative, and educational resource on data and metadata standards and data policies, including those linked to specific funders and journals. Use it to identify and cite the standards, databases, or repositories that exist for your data and discipline.
  • How To Make Your Data FAIR
    Research guide produced by OpenAIRE, a European organization that seeks to shift scholarly communications toward openness and transparency. The guide includes a checklist to determine the "FAIRness" of one’s data.
  • National Library of Medicine Strategic Plan (2017–2027)
    The NLM strategic plan sets a course for data-driven discovery and health.
  • NIH Strategic Plan for Data Science
    NIH released its first Strategic Plan for Data Science in June 2018. It provides a roadmap for modernizing the NIH-funded biomedical data science ecosystem.
  • SPARC Data Sharing Requirements by Federal Agency
    SPARC is a global coalition committed to making "open" the default status for research and education. Its website contains a resource for tracking, comparing, and understanding both current and future U.S. federal funder requirements for sharing research data.
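The FAIR principles above are qualitative, and each assessment tool listed applies its own criteria. As a rough illustration of how a checklist-style self-assessment can work, the Python sketch below scores a dataset's metadata against five simplified checks, one or two per FAIR facet. The `DatasetRecord` fields, the wording of the checks, and the `OPEN_FORMATS` set are hypothetical assumptions made for this example; they are not the criteria used by the ARDC tool, FAIRshake, or the OpenAIRE checklist.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata record for a shared dataset."""
    identifier: str = ""      # persistent identifier, e.g. a DOI like "10.1234/abc"
    title: str = ""
    keywords: list = field(default_factory=list)
    access_url: str = ""      # where the data can be retrieved
    file_format: str = ""     # MIME type of the data files
    license: str = ""         # e.g. an SPDX identifier such as "CC-BY-4.0"


# Assumed set of open, machine-readable formats (illustrative, not exhaustive).
OPEN_FORMATS = {"text/csv", "application/json", "text/plain", "application/x-netcdf"}


def fair_checklist(rec: DatasetRecord) -> dict:
    """Return simplified, hypothetical pass/fail checks labeled by FAIR facet."""
    return {
        "F: has a DOI-style persistent identifier": rec.identifier.startswith("10."),
        "F: has a title and keywords": bool(rec.title and rec.keywords),
        "A: retrievable over HTTPS": rec.access_url.startswith("https://"),
        "I: uses an open, machine-readable format": rec.file_format in OPEN_FORMATS,
        "R: states an explicit license": bool(rec.license),
    }


def fair_score(rec: DatasetRecord) -> float:
    """Fraction of the checks above that pass (0.0 to 1.0)."""
    checks = fair_checklist(rec)
    return sum(checks.values()) / len(checks)


if __name__ == "__main__":
    # Example record with placeholder values; the license is deliberately missing.
    rec = DatasetRecord(
        identifier="10.1234/example-dataset",
        title="Arsenic concentrations in groundwater samples",
        keywords=["arsenic", "groundwater"],
        access_url="https://example.org/data/arsenic.csv",
        file_format="text/csv",
    )
    for check, passed in fair_checklist(rec).items():
        print(("PASS " if passed else "FAIL ") + check)
    print(f"score = {fair_score(rec):.1f}")
```

In this example the record passes four of the five checks and scores 0.8; the dictionary returned by `fair_checklist` shows that the missing license is the gap to fix. Real assessment tools weigh many more criteria, such as whether the metadata follow community standards of the kind cataloged by FAIRsharing.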

Relevant Publications

2021

  • Bhattacharya S, Hu Z, Butte AJ. 2021. Opportunities and challenges in democratizing immunology datasets. Front Immunol 12:647536. [Full Text]
  • Bittremieux W, Adams C, Laukens K, Dorrestein PC, Bandeira N. 2021. Open science resources for the mass spectrometry-based analysis of SARS-CoV-2. J Proteome Res 20(3):1464-1475. [Full Text]
  • Butte AJ. 2021. Trials and tribulations-11 reasons why we need to promote clinical trials data sharing. JAMA Netw Open 4(1):e2035043. [Full Text]
  • Caufield JH, Fu J, Wang D, Guevara-Gonzalez V, Wang W, Ping P. 2021. A second look at FAIR in proteomic investigations. J Proteome Res; doi:10.1021/acs.jproteome.1c00177 [Online 13 March 2021]. [Abstract]
  • Danchev V, Min Y, Borghi J, Baiocchi M, Ioannidis JPA. 2021. Evaluation of data sharing after implementation of the international committee of medical journal editors’ data sharing statement requirement. JAMA Netw Open 4(1):e2033972. [Full Text]
  • de Macena Sobreira NL, Hamosh A. 2021. Next-generation sequencing and the evolution of data sharing. Am J Med Genet A; doi:10.1002/ajmg.a.62239 [Online 7 May 2021]. [Abstract]
  • Devriendt T, Shabani M, Borry P. 2021. Data sharing in biomedical sciences: a systematic review of incentives. Biopreserv Biobank; doi:10.1089/bio.2020.0037 [Online 11 February 2021]. [Abstract]
  • Kinkade D, Shepherd A. 2021. Geoscience data publication: practices and perspectives on enabling the FAIR guiding principles. Geosci Data J. [Full Text]
  • Lochman JE. 2021. Open science and intervention research: a program developer's and researcher's perspective on issues and concerns. Prev Sci; doi:10.1007/s11121-021-01219-6 [Online 2 March 2021]. [Abstract]
  • Mahmud M, Kaiser MS, McGinnity TM, Hussain A. 2021. Deep learning in mining biological data. Cognit Comput; doi:10.1007/s12559-020-09773-x [Online 5 January 2021]. [Full Text]
  • McGuinness LA, Sheppard AL. 2021. A descriptive analysis of the data availability statements accompanying medRxiv preprints and a comparison with their published counterparts. PLoS One 16(5):e0250887. [Full Text]
  • Misra BB. 2021. Advances in high resolution GC-MS technology: a focus on the application of GC-Orbitrap-MS in metabolomics and exposomics for FAIR practices. Anal Methods; doi:10.1039/d1ay00173f [Online 14 May 2021]. [Abstract]
  • Richir J, Bray S, McAleese T, Watson GJ. 2021. Data on elemental concentrations in marine sediments from the South and South West of England. Data Brief 35:106901. [Full Text]
  • Serghiou S, Contopoulos-Ioannidis DG, Boyack KW, Riedel N, Wallach JD, Ioannidis JPA. 2021. Assessment of transparency indicators across the biomedical literature: how open is open? PLoS Biol 19(3):e3001107. [Full Text]
  • Sixto-Costoya A, Lucas-Dominguez R, Aleixandre-Benavent R, Vidal-Infer A. 2021. Is sharing datasets the answer to the new challenges of reproductive biology research? Reprod Sci 28(4):1023-1025. [Full Text]
  • Zuo X, Chen Y, Ohno-Machado L, Xu H. 2021. How do we share data in COVID-19 research? A systematic review of COVID-19 datasets in PubMed Central Articles. Brief Bioinform 22(2):800-811. [Full Text]

2020

  • Birkenbihl C, Salimi Y, Domingo-Fernandez D, Lovestone S, AddNeuroMed consortium, Frohlich H, Hofmann-Apitius M, Japanese Alzheimer's Disease Neuroimaging Initiative, Alzheimer's Disease Neuroimaging Initiative. 2020. Evaluating the Alzheimer's disease data landscape. Alzheimers Dement (N Y) 6(1):e12102. [Full Text]
  • Boronow KE, Perovich LJ, Sweeney L, Yoo JS, Rudel RA, Brown P, Brody JG. 2020. Privacy risks of sharing data from environmental health studies. Environ Health Perspect 128(1):17008. [Abstract]
  • Boulware LE, Harris GB, Harewood P, Johnson FF, Maxson P, Bhavsar N, Blackwelder SS, Poley SS, Arnold K, Akindele B, Ferranti J, Lyn M. 2020. Democratizing health system data to impact social and environmental health contexts: a novel collaborative community data-sharing model. J Public Health (Oxf); doi:10.1093/pubmed/fdz171 [Online 9 January 2020]. [Abstract]
  • Dorne JLCM, Richardson J, Livaniou A, Carnesecchi E, Ceriani L, Baldin R, Kovarich S, Pavan M, Saouter E, Biganzoli F, Pasinato L, Zare Jeddi M, Robinson TP, Kass GEN, Liem AKD, Toropov AA, Toropova AP, Yang C, Tarkhov A, Georgiadis N, Di Nicola MR, Mostrag A, Verhagen H, Roncaglioni A, Benfenati E, Bassan A. 2020. EFSA's OpenFoodTox: an open source toxicological database on chemicals in food and feed and its future developments. Environ Int 146:106293. [Full Text]
  • Gao F, Tao L, Huang Y, Shu Z. 2020. Management and data sharing of COVID-19 pandemic information. Biopreserv Biobank 18(6):570-580. [Full Text]
  • Hauptmann E. 2020. Why they shared: recovering early arguments for sharing social scientific data. Sci Context 33(2):101-119. [Abstract]
  • Heacock ML, Amolegbe SM, Skalla LA, Trottier BA, Carlin DJ, Henry HF, Lopez AR, Duncan CG, Lawler CP, Balshaw DM, Suk WA. 2020. Sharing SRP data to reduce environmentally associated disease and promote transdisciplinary research. Rev Environ Health; doi:10.1515/reveh-2019-0089 [Online 3 March 2020]. [Full Text]
  • Helzlsouer K, Meerzaman D, Taplin S, Dunn BK. 2020. Humanizing big data: recognizing the human aspect of big data. Front Oncol 10:186. [Full Text]
  • Hodgson S, Fecht D, Gulliver J, Iyathooray Daby H, Piel FB, Yip F, Strosnider H, Hansell A, Elliott P. 2020. Availability, access, analysis and dissemination of small-area data. Int J Epidemiol 49(Supplement_1):i4-i14. [Full Text]
  • Jakob CEM, Kohlmayer F, Meurers T, Vehreschild JJ, Prasser F. 2020. Design and evaluation of a data anonymization pipeline to promote Open Science on COVID-19. Sci Data 7(1):435. [Full Text]
  • Kaewkungwal J, Adams P, Sattabongkot J, Lie RK, Wendler D. 2020. Issues and challenges associated with data-sharing in low- and middle-income countries: perspectives of researchers in Thailand. Am J Trop Med Hyg; doi:10.4269/ajtmh.19-0651 [Online 11 May 2020]. [Full Text]
  • Lacey JV, Chung NT, Hughes P, Benbow JL, Duffy C, Savage KE, Spielfogel ES, Wang SS, Martinez ME, Chandra S. 2020. Insights from adopting a data commons approach for large-scale observational cohort studies: the California Teachers Study. Cancer Epidemiol Biomarkers Prev; doi:10.1158/1055-9965.EPI-19-0842 [Online 12 February 2020]. [Abstract]
  • Lin D, Crabtree J, Dillo I, Downs RR, Edmunds R, Giaretta D, De Giust M, L'Hours H, Hugo W, Jenkyns R, Khodiyar V, Martone ME, Mokrane M, Navale V, Petters J, Sierman B, Sokolova DV, Stockhause M, Westbrook J. 2020. The TRUST Principles for Digital Repositories. Sci Data 7(1):144. [Full Text]
  • Lobe M, Matthies F, Staubert S, Meineke FA, Winter A. 2020. Problems in FAIRifying medical datasets. Stud Health Technol Inform 270:392-396. [Full Text]
  • Matheson GJ, Plaven-Sigray P, Tuisku J, Rinne J, Matuskey D, Cervenka S. 2020. Clinical brain PET research must embrace multi-centre collaboration and data sharing or risk its demise. Eur J Nucl Med Mol Imaging 47(2):502-504. [Full Text]
  • Merz KM Jr, Amaro R, Cournia Z, Rarey M, Soares T, Tropsha A, Wahab HA, Wang R. 2020. Editorial: method and data sharing and reproducibility of scientific results. J Chem Inf Model 60(12):5868-5869. [Full Text]
  • Mons B. 2020. Invest 5% of research funds in ensuring data are reusable. Nature 578(7796):491. [Full Text]
  • National Academies of Sciences, Engineering, and Medicine. 2020. Neuroscience Data in the Cloud: Opportunities and Challenges: Proceedings of a Workshop. Washington, DC: The National Academies Press. [Full Text]
  • Paprica PA, Sutherland E, Smith A, Brudno M, Cartagena RG, Crichlow M, Courtney BK, Loken C, McGrail KM, Ryan A, Schull MJ, Thorogood A, Virtanen C, Yang K. 2020. Essential requirements for establishing and operating data trusts: practical guidance co-developed by representatives from fifteen Canadian organizations and initiatives. Int J Popul Data Sci 5(1):1353. [Full Text]
  • Perrier L, Blondal E, MacDonald H. 2020. The views, perspectives, and experiences of academic researchers with data sharing and reuse: a meta-synthesis. PLoS One 15(2):e0229182. [Full Text]
  • Phillips M, Molnar-Gabor F, Korbel JO, Thorogood A, Joly Y, Chalmers D, Townend D, Knoppers BM. 2020. Genomics: data sharing needs an international code of conduct. Nature 578(7793):31-33. [Full Text]
  • Promoting best practice in nucleotide sequence data sharing. 2020. Sci Data 7(1):152. [Full Text]
  • Rios RS, Zheng KI, Zheng MH. 2020. Data sharing during COVID-19 pandemic: what to take away. Expert Rev Gastroenterol Hepatol; doi:10.1080/17474124.2020.1815533 [Online 26 August 2020]. [Full Text]
  • Sinaci AA, Núñez-Benjumea FJ, Gencturk M, Jauer ML, Deserno T, Chronaki C, Cangioli G, Cavero-Barca C, Rodríguez-Pérez JM, Pérez-Pérez MM, Laleci Erturkmen GB, Hernández-Pérez T, Méndez-Rodríguez E, Parra-Calderón CL. 2020. From Raw Data to FAIR Data: The FAIRification Workflow for Health Research. Methods Inf Med 59(S 01):e21-e32. [Full Text]
  • Smith CD, Mennis J. 2020. Incorporating geographic information science and technology in response to the COVID-19 pandemic. Prev Chronic Dis 17:E58. [Full Text]
  • Solle D. 2020. Be FAIR to your data. Anal Bioanal Chem 412(17):3961-3965. [Full Text]
  • Tang L. 2020. FAIR your data. Nat Methods 17(2):127. [Full Text]
  • Tenopir C, Rice NM, Allard S, Baird L, Borycz J, Christian L, Grant B, Olendorf R, Sandusky RJ. 2020. Data sharing, management, use, and reuse: practices and perceptions of scientists worldwide. PLoS One 15(3):e0229003. [Full Text]
  • Thelwall M, Munafò M, Mas-Bleda A, Stuart E, Makita M, Weigert V, Keene C, Khan N, Drax K, Kousha K. 2020. Is useful research data usually shared? An investigation of genome-wide association study summary statistics. PLoS One 15(2):e0229578. [Full Text]
  • Udesky JO, Boronow KE, Brown P, Perovich LJ, Brody JG. 2020. Perceived risks, benefits, and interest in participating in environmental health studies that share personal exposure data: a U.S. survey of prospective participants. J Empir Res Hum Res Ethics; doi:10.1177/1556264620903595 [Online 15 February 2020]. [Abstract]
  • White T, Blok E, Calhoun VD. 2020. Data sharing and privacy issues in neuroimaging research: opportunities, obstacles, challenges, and monsters under the bed. Hum Brain Mapp; doi:10.1002/hbm.25120 [Online 4 July 2020]. [Full Text]

2019

  • Barba A, Dominguez S, Cobas C, Martinsen DP, Romain C, Rzepa HS, Seoane F. 2019. Workflows allowing creation of journal article supporting information and Findable, Accessible, Interoperable, and Reusable (FAIR)-enabled publication of spectroscopic data. ACS Omega 4(2):3280-3286. [Abstract]
  • Biomedical Data Translator Consortium. 2019. The biomedical data translator program: conception, culture, and community. Clin Transl Sci 12(2):91-94. [Full Text]
  • Carbon S, Champieux R, McMurry JA, Winfree L, Wyatt LR, Haendel MA. 2019. An analysis and metric of reusable data licensing practices for biomedical resources. PLoS One 14(3):e0213090. [Abstract]
  • Celi LA, Citi L, Ghassemi M, Pollard TJ. 2019. The PLOS ONE collection on machine learning in health and biomedicine: towards open code and open data. PLoS One 14(1):e0210232. [Abstract]
  • Christensen G, Dafoe A, Miguel E, Moore DA, Rose AK. 2019. A study of the impact of data sharing on article citations using journal policies as a natural experiment. PLoS One 14(12):e0225883. [Abstract]
  • Clarke DJB, Wang L, Jones A, Wojciechowicz ML, Torre D, Jagodnik KM, Jenkins SL, McQuilton P, Flamholz Z, Silverstein MC, Schilder BM, Robasky K, Castillo C, Idaszak R, Ahalt SC, Williams J, Schurer S, Cooper DJ, de Miranda Azevedo R, Klenk JA, Haendel MA, Nedzel J, Avillach P, Shimoyama ME, Harris RM, Gamble M, Poten R, Charbonneau AL, Larkin J, Brown CT, Bonazzi VR, Dumontier MJ, Sansone SA, Ma'ayan A. 2019. FAIRshake: Toolkit to Evaluate the FAIRness of Research Digital Resources. Cell Syst 9(5):417-421. [Abstract]
  • Fecho K, Ahalt SC, Arunachalam S, Champion J, Chute CG, Davis S, Gersing K, Glusman G, Hadlock J, Lee J, Pfaff E, Robinson M, Sid E, Ta C, Xu H, Zhu R, Zhu Q, Peden DB, Biomedical Data Translator Consortium. 2019. Sex, obesity, diabetes, and exposure to particulate matter among patients with severe asthma: scientific insights from a comparative analysis of open clinical data sources during a five-day hackathon. J Biomed Inform 100:103325. [Abstract]
  • Fothergill BT, Knight W, Stahl BC, Ulnicane I. 2019. Responsible data governance of neuroscience big data. Front Neuroinform 13:28. [Abstract]
  • Gaffney SG, Ad O, Smaga S, Schepartz A, Townsend JP. 2019. GEM-NET: lessons in multi-institution teamwork using collaboration software. ACS Cent Sci 5(7):1159-1169. [Abstract]
  • Jansen P, van den Berg L, van Overveld P, Boiten JW. 2019. Research data stewardship for healthcare professionals. In: Fundamentals of Clinical Data Science (Kubben P, Dumontier M, Dekker A, eds.). Cham, Switzerland: Springer. [Full Text]
  • Li R, Sim I. 2019. How clinical trial data sharing platforms can advance the study of biomarkers. J Law Med Ethics 47(3):369-373. [Abstract]
  • Li X, Fireman BH, Curtis JR, Arterburn DE, Fisher DP, Moyneur E, Gallagher M, Raebel MA, Nowell WB, Lagreid L, Toh S. 2019. Validity of privacy-protecting analytical methods that use only aggregate-level information to conduct multivariable-adjusted analysis in distributed data networks. Am J Epidemiol 188(4):709-723. [Abstract]
  • Madduri R, Chard K, D'Arcy M, Jung SC, Rodriguez A, Sulakhe D, Deutsch E, Funk C, Heavner B, Richards M, Shannon P, Glusman G, Price N, Kesselman C, Foster I. 2019. Reproducible big data science: a case study in continuous FAIRness. PLoS One 14(4):e0213013. [Abstract]
  • Oliveira JL, Trifan A, Bastiao Silva LA. 2019. EMIF Catalogue: a collaborative platform for sharing and reusing biomedical data. Int J Med Inform 126:35-45. [Abstract]
  • Perez-Riverol Y, Zorin A, Dass G, Vu MT, Xu P, Glont M, Vizcaino JA, Jarnuczak AF, Petryszak R, Ping P, Hermjakob H. 2019. Quantifying the impact of public omics data. Nat Commun 10(1):3512. [Abstract]
  • Polanin JR, Terzian M. 2019. A data-sharing agreement helps to increase researchers' willingness to share primary data: results from a randomized controlled trial. J Clin Epidemiol 106:60-69. [Abstract]
  • Popkin G. 2019. Data sharing and how it can benefit your scientific career. Nature 569(7756):445-447. [Abstract]
  • Psaty BM, Rich SS, Boerwinkle E. 2019. Innovation in genomic data sharing at the NIH. N Engl J Med 380(23):2192-2195. [Abstract]
  • Resnik DB, Morales M, Landrum R, Shi M, Minnier J, Vasilevsky NA, Champieux RE. 2019. Effect of impact factor and discipline on journal data sharing policies. Account Res 26(3):139-156. [Abstract]
  • Ruhamyankaka E, Brunk BP, Dorsey G, Harb OS, Helb DA, Judkins J, Kissinger JC, Lindsay B, Roos DS, San EJ, Stoeckert CJ, Zheng J, Tomko SS. 2019. ClinEpiDB: an open-access clinical epidemiology database resource encouraging online exploration of complex studies. Gates Open Res 3:1661. [Abstract]
  • Sansone SA, McQuilton P, Rocca-Serra P, Gonzalez-Beltran A, Izzo M, Lister AL, Thurston M. 2019. FAIRsharing as a community approach to standards, repositories and policies. Nat Biotechnol 37(4):358-367. [Full Text]
  • Spoor S, Cheng CH, Sanderson LA, Condon B, Almsaeed A, Chen M, Bretaudeau A, Rasche H, Jung S, Main D, Bett K, Staton M, Wegrzyn JL, Feltus FA, Ficklin SP. 2019. Tripal v3: an ontology-based toolkit for construction of FAIR biological community databases. Database (Oxford) 2019. [Abstract]
  • Staley J, Mazloom R, Lowe P, Newsum CT, Jaberi-Douraki M, Riviere J, Wyckoff GJ. 2019. Novel data sharing agreement to accelerate big data translational research projects in the one health sphere. Top Companion Anim Med 37:100367. [Abstract]
  • Vesteghem C, Brondum RF, Sonderkaer M, Sommer M, Schmitz A, Bodker JS, Dybkaer K, El-Galaly TC, Bogsted M. 2019. Implementing the FAIR Data Principles in precision oncology: review of supporting initiatives. Brief Bioinform; doi:10.1093/bib/bbz044 [Online 29 June 2019]. [Abstract]
  • Villanueva AG, Cook-Deegan R, Koenig BA, Deverka PA, Versalovic E, McGuire AL, Majumder MA. 2019. Characterizing the biomedical data-sharing landscape. J Law Med Ethics 47(1):21-30. [Abstract]
  • Xu H, Zhang N. 2019. Privacy in health disparity research. Med Care 57(Suppl 6 Suppl 2):S172-S175. [Full Text]

2018

  • Boeckhout M, Zielhuis GA, Bredenoord AL. 2018. The FAIR guiding principles for data stewardship: fair enough? Eur J Hum Genet 26(7):931-936. [Abstract]
  • Brown AV, Campbell JD, Assefa T, Grant D, Nelson RT, Weeks NT, Cannon SB. 2018. Ten quick tips for sharing open genomic data. PLoS Comput Biol 14(12):e1006472. [Abstract]
  • Escribano N, Galicia D, Arino AH. 2018. The tragedy of the biodiversity data commons: a data impediment creeping nigher? Database (Oxford) 2018. [Abstract]
  • Gruning B, Chilton J, Koster J, Dale R, Soranzo N, van den Beek M, Goecks J, Backofen R, Nekrutenko A, Taylor J. 2018. Practical computational reproducibility in the life sciences. Cell Syst 6(6):631-635. [Abstract]
  • Holub P, Kohlmayer F, Prasser F, Mayrhofer MT, Schlunder I, Martin GM, Casati S, Koumakis L, Wutte A, Kozera L, Strapagiel D, Anton G, Zanetti G, Sezerman OU, Mendy M, Valik D, Lavitrano M, Dagher G, Zatloukal K, van Ommen GB, Litton JE. 2018. Enhancing reuse of data and biological material in medical research: from FAIR to FAIR-Health. Biopreserv Biobank 16(2):97-105. [Abstract]
  • Kitzes J, Turek D, Deniz F, eds. 2018. The Practice of Reproducible Research: Case Studies and Lessons from the Data-Intensive Sciences. Oakland, CA: University of California Press. [Full Text]
  • National Academies of Sciences, Engineering, and Medicine. 2018. Open Science by Design: Realizing a Vision for 21st Century Research. Washington, DC: The National Academies Press. [Full Text]
  • Navale V, McAuliffe M. 2018. Long-term preservation of biomedical research data. F1000Res 7:1353. [Abstract]
  • Perkel JM. 2018. A toolkit for data transparency takes shape. Nature 560(7719):513-515. [Abstract]
  • Yaniv Z, Lowekamp BC, Johnson HJ, Beare R. 2018. SimpleITK image-analysis notebooks: a collaborative environment for education and reproducible research. J Digit Imaging 31(3):290-303. [Abstract]

2017

  • Ascoli GA, Maraver P, Nanda S, Polavaram S, Armananzas R. 2017. Win-win data sharing in neuroscience. Nat Methods 14(2):112-116. [Abstract]
  • Boland MR, Karczewski KJ, Tatonetti NP. 2017. Ten simple rules to enable multi-site collaborations through data sharing. PLoS Comput Biol 13(1):e1005278. [Abstract]
  • Fleming L, Tempini N, Gordon-Brown H, Nichols GL, Sarran C, Vineis P, Leonardi G, Golding B, Haines A, Kessel A, Murray V, Depledge M, Leonelli S. 2017. Big data in environment and human health. Oxford Research Encyclopedia of Environmental Science. [Full Text]
  • Hudson KL, Collins FS. 2017. The 21st Century Cures Act - a view from the NIH. N Engl J Med 376(2):111-113. [Abstract]
  • Jimenez RC, Kuzak M, Alhamdoosh M, Barker M, Batut B, Borg M, Capella-Gutierrez S, Chue Hong N, Cook M, Corpas M, Flannery M, Garcia L, Gelpi JL, Gladman S, Goble C, Gonzalez Ferreiro M, Gonzalez-Beltran A, Griffin PC, Gruning B, Hagberg J, Holub P, Hooft R, Ison J, Katz DS, Leskosek B, Lopez Gomez F, Oliveira LJ, Mellor D, Mosbergen R, Mulder N, Perez-Riverol Y, Pergl R, Pichler H, Pope B, Sanz F, Schneider MV, Stodden V, Suchecki R, Svobodova Varekova R, Talvik HA, Todorov I, Treloar A, Tyagi S, van Gompel M, Vaughan D, Via A, Wang X, Watson-Haigh NS, Crouch S. 2017. Four simple recommendations to encourage best practices in research software. F1000Res 6. [Abstract]
  • Majumder MA, Guerrini CJ, Bollinger JM, Cook-Deegan R, McGuire AL. 2017. Sharing data under the 21st Century Cures Act. Genet Med 19(12):1289-1294. [Abstract]
  • McIntosh LD, Juehne A, Vitale CRH, Liu X, Alcoser R, Lukas JC, Evanoff B. 2017. Repeat: a framework to assess empirical reproducibility in biomedical research. BMC Med Res Methodol 17(1):143. [Abstract]
  • Ohmann C, Banzi R, Canham S, Battaglia S, Matei M, Ariyo C, Becnel L, Bierer B, Bowers S, Clivio L, Dias M, Druml C, Faure H, Fenner M, Galvez J, Ghersi D, Gluud C, Groves T, Houston P, Karam G, Kalra D, Knowles RL, Krleza-Jeric K, Kubiak C, Kuchinke W, Kush R, Lukkarinen A, Marques PS, Newbigging A, O'Callaghan J, Ravaud P, Schlunder I, Shanahan D, Sitter H, Spalding D, Tudur-Smith C, van Reusel P, van Veen EB, Visser GR, Wilson J, Demotes-Mainard J. 2017. Sharing and reuse of individual participant data from clinical trials: principles and recommendations. BMJ Open 7(12):e018647. [Abstract]
  • Olfson M, Wall MM, Blanco C. 2017. Incentivizing data sharing and collaboration in medical research–the S-Index. JAMA Psychiatry 74(1):5-6. [Abstract]
  • Thelwall M, Kousha K. 2017. Do journal data sharing mandates work? Life sciences evidence from Dryad. ASLIB J Inform Manag 69(1):36-45. [Abstract]

2016

  • McKiernan EC, Bourne PE, Brown CT, Buck S, Kenall A, Lin J, McDougall D, Nosek BA, Ram K, Soderberg CK, Spies JR, Thaney K, Updegrove A, Woo KH, Yarkoni T. 2016. How open science helps researchers succeed. Elife 5:e16800. [Abstract]
  • Piccolo SR, Frampton MB. 2016. Tools and techniques for computational reproducibility. Gigascience 5(1):30. [Abstract]
  • Stodden V, McNutt M, Bailey DH, Deelman E, Gil Y, Hanson B, Heroux MA, Ioannidis JP, Taufer M. 2016. Enhancing reproducibility for computational methods. Science 354(6317):1240-1241. [Abstract]
  • Warren E. 2016. Strengthening research through data sharing. N Engl J Med 375(5):401-403. [Abstract]
  • Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, Blomberg N, Boiten JW, da Silva Santos LB, Bourne PE, Bouwman J, Brookes AJ, Clark T, Crosas M, Dillo I, Dumon O, Edmunds S, Evelo CT, Finkers R, Gonzalez-Beltran A, Gray AJ, Groth P, Goble C, Grethe JS, Heringa J, 't Hoen PA, Hooft R, Kuhn T, Kok R, Kok J, Lusher SJ, Martone ME, Mons A, Packer AL, Persson B, Rocca-Serra P, Roos M, van Schaik R, Sansone SA, Schultes E, Sengstag T, Slater T, Strawn G, Swertz MA, Thompson M, van der Lei J, van Mulligen E, Velterop J, Waagmeester A, Wittenburg P, Wolstencroft K, Zhao J, Mons B. 2016. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3:160018. [Abstract]

Data Repositories

  • Data Observation Network for Earth (DataONE)
    DataONE is a community-driven project providing access to data across multiple member repositories, supporting enhanced search and discovery of Earth and environmental data. DataONE provides data usage and citation metrics for datasets.
  • DataCite
    Search the DataCite registry for datasets, software, images, and other research material.
  • DataMed
    DataMed is a prototype biomedical data search engine developed for the NIH BD2K Data Discovery Index (DDI) by the bioCADDIE project team. It allows users to search for and find data across different repositories.
  • General Code Repositories
    Use general code repositories such as Bitbucket, GitHub, and SourceForge to share code. CRAN (the Comprehensive R Archive Network) is the primary repository for R packages. Code Ocean is a research collaboration platform that lets users share computational environments and code in a web browser. Jupyter Notebook is an open-source web application for creating and sharing documents that contain live code, equations, visualizations, and narrative text.
  • NIH Data Sharing Repositories
    This table lists NIH-supported data repositories that make data accessible for reuse.
  • Omics Discovery Index (OmicsDI)
    OmicsDI provides dataset discovery across a heterogeneous, distributed group of transcriptomics, genomics, proteomics, and metabolomics data resources, spanning 16 repositories maintained by six organizations on three continents and including both open- and controlled-access data resources.
  • Registry of Research Data Repositories (re3data.org)
    re3data.org is a global registry of research data repositories. It provides detailed information on more than 2,000 repositories to help researchers find the right one for their data. re3data.org is a service of DataCite, a global non-profit organization that provides DOIs for research data.
  • SciCrunch
    SciCrunch is a data sharing and display platform that helps communities create their own portals providing access to research resources, data, literature, and tools. Hosted at the University of California, San Diego, SciCrunch is home to the Drug Design Data Resource, the NIDDK Information Network (National Institute of Diabetes and Digestive and Kidney Diseases), the Neuroscience Information Framework, and the Research Resource Identifiers (RRID) Portal. The RRID Portal provides shared identifiers for citing research resources, such as cell lines or antibodies, in the literature. Users can search within and across these community portals.
  • United States Geological Survey (USGS) Data Repository Webpage
    This webpage provides useful information about preserving environmental data in data repositories.
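Several of the registries above can also be queried programmatically. As one example, the Python sketch below builds a search request for DataCite's public REST API and extracts the DOI, title, and publication year from each result. It assumes the `https://api.datacite.org/dois` endpoint, its `query`, `page[size]`, and `resource-type-id` parameters, and a JSON:API-style response in which each record's `attributes` holds `doi`, `titles`, and `publicationYear`; check these against DataCite's current API documentation before relying on them. Only the Python standard library is used.

```python
import json
import urllib.request
from typing import List, Optional, Tuple
from urllib.parse import urlencode

# Assumed public endpoint of the DataCite REST API for DOI metadata.
DATACITE_DOIS = "https://api.datacite.org/dois"


def build_search_url(query: str, page_size: int = 10, datasets_only: bool = True) -> str:
    """Build a DataCite search URL (parameter names are assumptions; see lead-in)."""
    params = {"query": query, "page[size]": page_size}
    if datasets_only:
        params["resource-type-id"] = "dataset"  # assumed filter restricting results to datasets
    return f"{DATACITE_DOIS}?{urlencode(params)}"


def summarize(response: dict) -> List[Tuple[str, str, Optional[int]]]:
    """Extract (DOI, first title, publication year) from a parsed JSON response."""
    results = []
    for record in response.get("data", []):
        attrs = record.get("attributes", {})
        titles = attrs.get("titles") or [{}]  # tolerate records without titles
        results.append((attrs.get("doi", ""), titles[0].get("title", ""), attrs.get("publicationYear")))
    return results


def search(query: str, page_size: int = 10) -> List[Tuple[str, str, Optional[int]]]:
    """Send the request over the network and summarize the parsed results."""
    with urllib.request.urlopen(build_search_url(query, page_size), timeout=30) as resp:
        return summarize(json.load(resp))


if __name__ == "__main__":
    # Print the request URL only; calling search() requires network access.
    print(build_search_url("arsenic groundwater", page_size=5))
```

Calling `search("arsenic groundwater", page_size=5)` sends the request and returns a list of `(doi, title, year)` tuples. Separating URL construction and response parsing from the network call keeps both parts easy to test offline. Prefixing a returned DOI with `https://doi.org/` gives a resolvable link that can be used when citing the dataset.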

Relevant Publications

2021

  • Altenhoff AM, Train CM, Gilbert KJ, Mediratta I, Mendes de Farias T, Moi D, Nevers Y, Radoykova HS, Rossier V, Warwick Vesztrocy A, Glover NM, Dessimoz C. 2021. OMA orthology in 2021: website overhaul, conserved isoforms, ancestral gene order and more. Nucleic Acids Res 49(D1):D373-D379. [Full Text]
  • Bastian FB, Roux J, Niknejad A, Comte A, Fonseca Costa SS, de Farias TM, Moretti S, Parmentier G, de Laval VR, Rosikiewicz M, Wollbrett J, Echchiki A, Escoriza A, Gharib WH, Gonzales-Porta M, Jarosz Y, Laurenczy B, Moret P, Person E, Roelli P, Sanjeev K, Seppey M, Robinson-Rechavi M. 2021. The Bgee suite: integrated curated expression atlas and comparative transcriptomics in animals. Nucleic Acids Res 49(D1):D831-D847. [Full Text]
  • Blake JA, Baldarelli R, Kadin JA, Richardson JE, Smith CL, Bult CJ, Mouse Genome Database Group. 2021. Mouse genome database (MGD): knowledgebase for mouse-human comparative biology. Nucleic Acids Res 49(D1):D981-D987. [Full Text]
  • Dorne JLCM, Richardson J, Livaniou A, Carnesecchi E, Ceriani L, Baldin R, Kovarich S, Pavan M, Saouter E, Biganzoli F, Pasinato L, Zare Jeddi M, Robinson TP, Kass GEN, Liem AKD, Toropov AA, Toropova AP, Yang C, Tarkhov A, Georgiadis N, Di Nicola MR, Mostrag A, Verhagen H, Roncaglioni A, Benfenati E, Bassan A. 2021. EFSA's OpenFoodTox: an open source toxicological database on chemicals in food and feed and its future developments. Environ Int 146:106293. [Full Text]
  • Gilbertson PK, Forrester S, Andrews L, McCann K, Rogers L, Park C, Moye J. 2021. The National Children's Study Archive model: A 3-tier framework for dissemination of data and specimens for general use and secondary analysis. Front Public Health 9:526286. [Full Text]
  • Kasmanas JC, Bartholomaus A, Correa FB, Tal T, Jehmlich N, Herberth G, von Bergen M, Stadler PF, Carvalho ACPLF, Nunes da Rocha U. 2021. HumanMetagenomeDB: a public repository of curated and standardized metadata for human metagenomes. Nucleic Acids Res 49(D1):D743-D750. [Full Text]
  • Lim N, Tesar S, Belmadani M, Poirier-Morency G, Mancarci BO, Sicherman J, Jacobson M, Leong J, Tan P, Pavlidis P. 2021. Curation of over 10,000 transcriptomic studies to enable data reuse. Database (Oxford) 2021:baab006. [Full Text]
  • Martens M, Ammar A, Riutta A, Waagmeester A, Slenter DN, Hanspers K, A Miller R, Digles D, Lopes EN, Ehrhart F, Dupuis LJ, Winckers LA, Coort SL, Willighagen EL, Evelo CT, Pico AR, Kutmon M. 2021. WikiPathways: connecting communities. Nucleic Acids Res 49(D1):D613-D621. [Full Text]
  • Moretti S, Tran VDT, Mehl F, Ibberson M, Pagni M. 2021. MetaNetX/MNXref: unified namespace for metabolites and biochemical reactions in the context of metabolic models. Nucleic Acids Res 49(D1):D570-D574. [Full Text]
  • Ortolani C, D'Atri M, Zamai L, Canonico B, Del Zotto G, Papa S. 2021. ESCCABase project: a repository in progress. Cytometry A; doi:10.1002/cyto.a.24355 [Online 7 May 2021]. [Full Text]
  • Watanabe Y, Yoshizawa AC, Ishihama Y, Okuda S. 2021. The jPOST repository as a public data repository for shotgun proteomics. Methods Mol Biol 2259:309-322. [Abstract]
  • Zhang Z, Hernandez K, Savage J, Li S, Miller D, Agrawal S, Ortuno F, Staudt LM, Heath A, Grossman RL. 2021. Uniform genomic data analysis in the NCI Genomic Data Commons. Nat Commun 12(1):1226. [Full Text]

2020

  • Bogue MA, Philip VM, Walton DO, Grubb SC, Dunn MH, Kolishovski G, Emerson J, Mukherjee G, Stearns T, He H, Sinha V, Kadakkuzha B, Kunde-Ramamoorthy G, Chesler EJ. 2020. Mouse phenome database: a data repository and analysis suite for curated primary mouse phenotype data. Nucleic Acids Res 48(D1):D716-D723. [Full Text]

2019

  • Amid C, Pakseresht N, Silvester N, Jayathilaka S, Lund O, Dynovski LD, Pataki BA, Visontai D, Xavier BB, Alako BTF, Belka A, Cisneros JLB, Cotten M, Haringhuizen GB, Harrison PW, Hoper D, Holt S, Hundahl C, Hussein A, Kaas RS, Liu X, Leinonen R, Malhotra-Kumar S, Nieuwenhuijse DF, Rahman N, Dos S Ribeiro C, Skiby JE, Schmitz D, Steger J, Szalai-Gindl JM, Thomsen MCF, Caccio SM, Csabai I, Kroneman A, Koopmans M, Aarestrup F, Cochrane G. 2019. The COMPARE data hubs. Database (Oxford) 2019:baz136. [Full Text]
  • Banzi R, Canham S, Kuchinke W, Krleza-Jeric K, Demotes-Mainard J, Ohmann C. 2019. Evaluation of repositories for sharing individual-participant data from clinical studies. Trials 20(1):169. [Abstract]
  • Burley SK, Berman HM, Bhikadiya C, Bi C, Chen L, Di Costanzo L, Christie C, Dalenberg K, Duarte JM, Dutta S, Feng Z, Ghosh S, Goodsell DS, Green RK, Guranovic V, Guzenko D, Hudson BP, Kalro T, Liang Y, Lowe R, Namkoong H, Peisach E, Periskova I, Prlic A, Randle C, Rose A, Rose P, Sala R, Sekharan M, Shao C, Tan L, Tao YP, Valasatava Y, Voigt M, Westbrook J, Woo J, Yang H, Young J, Zhuravleva M, Zardecki C. 2019. RCSB protein data bank: biological macromolecular structures enabling research and education in fundamental biology, biomedicine, biotechnology and energy. Nucleic Acids Res 47(D1):D464-D474. [Full Text]
  • Grossman RL. 2019. Data lakes, clouds, and commons: a review of platforms for analyzing and sharing genomic data. Trends Genet 35(3):223-234. [Abstract]
  • Laulederkind SJF, Hayman GT, Wang SJ, Hoffman MJ, Smith JR, Bolton ER, De Pons J, Tutaj MA, Tutaj M, Thota J, Dwinell MR, Shimoyama M. 2019. Rat genome databases, repositories, and tools. Methods Mol Biol 2018:71-96. [Abstract]
  • Ma J, Chen T, Wu S, Yang C, Bai M, Shu K, Li K, Zhang G, Jin Z, He F, Hermjakob H, Zhu Y. 2019. iProX: an integrated proteome resource. Nucleic Acids Res 47(D1):D1211-D1217. [Full Text]
  • UniProt Consortium. 2019. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res 47(D1):D506-D515. [Full Text]
  • Saldanha IJ, Smith BT, Ntzani E, Jap J, Balk EM, Lau J. 2019. The Systematic Review Data Repository (SRDR): descriptive characteristics of publicly available data and opportunities for research. Syst Rev 8(1):334. [Abstract]

2018

  • Chen X, Gururaj AE, Ozyurt B, Liu R, Soysal E, Cohen T, Tiryaki F, Li Y, Zong N, Jiang M, Rogith D, Salimi M, Kim HE, Rocca-Serra P, Gonzalez-Beltran A, Farcas C, Johnson T, Margolis R, Alter G, Sansone SA, Fore IM, Ohno-Machado L, Grethe JS, Xu H. 2018. DataMed - an open source discovery index for finding biomedical datasets. J Am Med Inform Assoc 25(3):300-308. [Abstract]
  • Kleywegt GJ, Velankar S, Patwardhan A. 2018. Structural biology data archiving - where we are and what lies ahead. FEBS Lett 592(12):2153-2167. [Abstract]

2017

  • Ohno-Machado L, Sansone SA, Alter G, Fore I, Grethe J, Xu H, Gonzalez-Beltran A, Rocca-Serra P, Gururaj AE, Bell E, Soysal E, Zong N, Kim HE. 2017. Finding useful data across multiple biomedical data repositories using DataMed. Nat Genet 49(6):816-819. [Abstract]
  • Perez-Riverol Y, Bai M, da Veiga Leprevost F, Squizzato S, Park YM, Haug K, Carroll AJ, Spalding D, Paschall J, Wang M, Del-Toro N, Ternent T, Zhang P, Buso N, Bandeira N, Deutsch EW, Campbell DS, Beavis RC, Salek RM, Sarkans U, Petryszak R, Keays M, Fahy E, Sud M, Subramaniam S, Barbera A, Jimenez RC, Nesvizhskii AI, Sansone SA, Steinbeck C, Lopez R, Vizcaino JA, Ping P, Hermjakob H. 2017. Discovering and linking public omics data sets using the Omics Discovery Index. Nat Biotechnol 35(5):406-409. [Abstract]

Data Citations

  • Australian National Data Service (ANDS) Guide to Data Citation
    The ANDS Guide to Data Citation provides an overview of how to cite data, data citation styles and formats, the use of persistent identifiers (e.g., DOIs), and the tracking of data citations.
  • Ball A, Duke M. 2015. How to Cite Datasets and Link to Publications. Edinburgh: Digital Curation Centre.
  • DataCite – Cite Your Data
    DataCite is a leading global non-profit organization that provides persistent identifiers (DOIs) for research data. Its goal is to help the research community locate, identify, and cite research data with confidence. This page contains best practices for citing data. Properly citing data gives scholarly credit to data producers and facilitates discovery and reuse of the dataset.
  • Joint Declaration of Data Citation Principles – Final
    First released in November 2013 and finalized in February 2014, the declaration describes eight principles that emphasize the importance of data as evidence, the need to give credit to data contributors, the idea that cited data requires unique and persistent identifiers, and the belief that data citation should allow for human and machine access to the data and support verification and interoperability.

Relevant Publications

2020

  • Buneman P, Christie G, Davies JA, Dimitrellou R, Harding SD, Pawson AJ, Sharman JL, Wu Y. 2020. Why data citation isn't working, and what to do about it. Database (Oxford) 2020(2020):baaa022. [Full Text]

2019

  • Fenner M, Crosas M, Grethe JS, Kennedy D, Hermjakob H, Rocca-Serra P, Durand G, Berjon R, Karcher S, Martone M, Clark T. 2019. A data citation roadmap for scholarly data repositories. Sci Data 6(1):28. [Abstract]

2018

  • Cousijn H, Kenall A, Ganley E, Harrison M, Kernohan D, Lemberger T, Murphy F, Polischuk P, Taylor S, Martone M, Clark T. 2018. A data citation roadmap for scientific publishers. Sci Data 5:180259. [Abstract]
  • Lemberger T. 2018. Data citation: what, when, why? Mol Syst Biol 14(12):e8783. [Abstract]

2016

  • Honor LB, Haselgrove C, Frazier JA, Kennedy DN. 2016. Data citation in neuroimaging: proposed best practices for data identification and attribution. Front Neuroinform 10:34. [Abstract]

Research Data Management

Relevant Publications

2021

  • Bai J, Bandla C, Guo J, Vera Alvarez R, Bai M, Vizcaíno JA, Moreno P, Grüning B, Sallou O, Perez-Riverol Y. 2021. BioContainers registry: searching bioinformatics and proteomics tools, packages, and containers. J Proteome Res 20(4):2056-2061. [Abstract]
  • Coarfa C, Grimm SL, Rajapakshe K, Perera D, Lu HY, Wang X, Christensen KR, Mo Q, Edwards DP, Huang S. 2021. Reverse-phase protein array: technology, application, data processing, and integration. J Biomol Tech; doi:10.7171/jbt.2021-3202-001 [Online 15 January 2021]. [Full Text]

2020

  • Perez-Riverol Y, Moreno P. 2020. Scalable data analysis in proteomics and metabolomics using BioContainers and workflows engines. Proteomics 20(9):e1900147. [Abstract]

2019

  • Miksa T, Simms S, Mietchen D, Jones S. 2019. Ten principles for machine-actionable data management plans. PLoS Comput Biol 15(3):e1006750. [Abstract]

2018

  • Gruening B, Sallou O, Moreno P, da Veiga Leprevost F, Ménager H, Søndergaard D, Röst H, Sachsenberg T, O'Connor B, Madeira F, Dominguez Del Angel V, Crusoe MR, Varma S, Blankenberg D, Jimenez RC; BioContainers Community, Perez-Riverol Y. 2018. Recommendations for the packaging and containerizing of bioinformatics software. F1000Res 7:ISCB Comm J-742. [Full Text]
  • Schiermeier Q. 2018. Data management made simple. Nature 555(7696):403-405. [Abstract]

2015

  • Michener WK. 2015. Ten simple rules for creating a good data management plan. PLoS Comput Biol 11(10):e1004525. [Abstract]

Metadata Standards/Ontologies/Data Integration

  • ANDS Guide to Metadata
    This comprehensive guide provides a working-level view of the needs, issues, and processes around metadata collection and creation for research data.
  • Disciplinary Metadata
    The Digital Curation Centre allows one to search for metadata standards, extensions, tools, and use cases by discipline (biology, earth science, general research data, physical science, and social science and humanities).
  • FAIRsharing Standards Database
    The standards in FAIRsharing are manually curated from a variety of sources, including BioPortal, MIBBI, and the Equator Network.
  • NCBO BioPortal
    BioPortal is a comprehensive repository of biomedical ontologies developed by the National Center for Biomedical Ontology, an international consortium providing ontological resources for the biomedical research community. BioPortal allows the user to browse or search ontologies, get ontology recommendations, explore mappings between ontology terms, and annotate textual biomedical data with ontology terms. The NIEHS Children’s Health Exposure Analysis Resource Ontology can be found in BioPortal.
  • NIH Common Data Elements (CDE) Repository
    The NIH CDE Repository provides access to structured human- and machine-readable definitions of data elements that have been recommended or required by NIH Institutes and Centers and other organizations for use in research and other purposes.
  • Open Biological and Biomedical Ontology (OBO) Foundry
    The OBO Foundry is a collective of ontology developers with the mission to develop a family of interoperable science-based ontologies for shared use across different biological and medical domains. They have published a set of normative principles for OBO Foundry ontologies.
  • PhenX Toolkit (consensus measures for Phenotypes and eXposures)
    The PhenX Toolkit is a Web-based catalog of recommended, standard measures for phenotypes and exposures for use in biomedical research. Using protocols from the PhenX Toolkit allows investigators who are studying different diseases and conditions to collect data using the same methodologies, thus facilitating cross-study analysis.

Relevant Publications

2021

  • Abrams MB, Bjaalie JG, Das S, Egan GF, Ghosh SS, Goscinski WJ, Grethe JS, Kotaleski JH, Ho ETW, Kennedy DN, Lanyon LJ, Leergaard TB, Mayberg HS, Milanesi L, Moucek R, Poline JB, Roy PK, Strother SC, Tang TB, Tiesinga P, Wachtler T, Wojcik DK, Martone ME. 2021. A standards organization for open and FAIR neuroscience: the international neuroinformatics coordinating facility. Neuroinformatics; doi:10.1007/s12021-020-09509-0 [Online 27 January 2021]. [Abstract]
  • Beaulieu-Jones B, Darabos C, Kim D, Verma A, Kobren SN. 2021. Innovative methodological approaches for data integration to derive patterns across diverse, large-scale biomedical datasets. Pac Symp Biocomput 26:256-260. [Full Text]
  • Blask K, Gerhards L, Jalynskij M. 2021. PsyCuraDat: designing a user-oriented curation standard for behavioral psychological research data. Front Psychol 11:579397. [Full Text]
  • Boughton AP, Welch RP, Flickinger M, VandeHaar P, Taliun D, Abecasis GR, Boehnke M. 2021. LocusZoom.js: Interactive and embeddable visualization of genetic association study results. Bioinformatics; doi:10.1093/bioinformatics/btab186 [Online 17 March 2021]. [Full Text]
  • Cantini L, Zakeri P, Hernandez C, Naldi A, Thieffry D, Remy E, Baudot A. 2021. Benchmarking joint multi-omics dimensionality reduction approaches for the study of cancer. Nat Commun 12(1):124. [Full Text]
  • Chan L, Vasilevsky N, Thessen A, McMurry J, Haendel M. 2021. The landscape of nutri-informatics: a review of current resources and challenges for integrative nutrition research. Database (Oxford) 2021:baab003. [Full Text]
  • Chiu W, Schmid MF, Pintilie G, Lawson CL. 2021. Evolution of standardization and dissemination of cryo-EM structures and data jointly by the community, PDB and EMDB. J Biol Chem; doi:10.1016/j.jbc.2021.100560 [Online 17 March 2021]. [Full Text]
  • Dimitrova M, Meyer R, Buttigieg PL, Georgiev T, Zhelezov G, Demirov S, Smith V, Penev L. 2021. A streamlined workflow for conversion, peer review, and publication of genomics metadata as omics data papers. Gigascience 10(5):giab034. [Full Text]
  • Halford JJ, Clunie DA, Brinkmann BH, Krefting D, Remi J, Rosenow F, Husain A, Furbass F, Andrew Ehrenberg J, Winkler S. 2021. Standardization of neurophysiology signal data into the DICOM® standard. Clin Neurophysiol; doi:10.1016/j.clinph.2021.01.019 [Online 20 February 2021]. [Abstract]
  • Hedin F, Konstantinou M, Cosma A. 2021. Data integration and visualization techniques for post-cytometric analysis of complex datasets. Cytometry A; doi:10.1002/cyto.a.24359 [Online 6 May 2021]. [Abstract]
  • Hong S, Liow CH, Yuk JM, Byon HR, Yang Y, Cho E, Yeom J, Park G, Kang H, Kim S, Shim Y, Na M, Jeong C, Hwang G, Kim H, Kim H, Eom S, Cho S, Jun H, Lee Y, Baucour A, Bang K, Kim M, Yun S, Ryu J, Han Y, Jetybayeva A, Choi PP, Agar JC, Kalinin SV, Voorhees PW, Littlewood P, Lee HM. 2021. Reducing time to discovery: materials and molecular modeling, imaging, informatics, and integration. ACS Nano [Online 12 February 2021]. [Abstract]
  • Huguet J, Falcon C, Fuste D, Girona S, Vicente D, Molinuevo JL, Gispert JD, Operto G, ALFA Study. 2021. Management and quality control of large neuroimaging datasets: developments from the Barcelonaβeta Brain Research Center. Front Neurosci 15:633438. [Full Text]
  • Ison J, Ienasescu H, Rydza E, Chmura P, Rapacki K, Gaignard A, Schwammle V, van Helden J, Kalas M, Menager H. 2021. biotoolsSchema: a formalized schema for bioinformatics software description. Gigascience 10(1):giaa157. [Full Text]
  • Kamdar MR, Musen MA. 2021. An empirical meta-analysis of the life sciences linked open data on the web. Sci Data 8(1):24. [Full Text]
  • Liang X, Akers K, Keenum I, Wind L, Gupta S, Chen C, Aldaihani R, Pruden A, Zhang L, Knowlton KF, Xia K, Heath LS. 2021. AgroSeek: a system for computational analysis of environmental metagenomic data and associated metadata. BMC Bioinformatics 22(1):117. [Full Text]
  • Loffler F, Wesp V, Konig-Ries B, Klan F. 2021. Dataset search in biodiversity research: Do metadata in data repositories reflect scholarly information needs? PLoS One 16(3):e0246099. [Full Text]
  • Lung PY, Zhong D, Pang X, Li Y, Zhang J. 2020. Maximizing the reusability of gene expression data by predicting missing metadata. PLoS Comput Biol 16(11):e1007450. [Full Text]
  • Marcon Y, Bishop T, Avraam D, Escriba-Montagut X, Ryser-Welch P, Wheater S, Burton P, Gonzalez JR. 2021. Orchestrating privacy-protected big data analyses of data from different resources with R and DataSHIELD. PLoS Comput Biol 17(3):e1008880. [Full Text]
  • Palafox MF, Desai HS, Arboleda VA, Backus KM. 2021. From chemoproteomic-detected amino acids to genomic coordinates: insights into precise multi-omic data integration. Mol Syst Biol 17(2):e9840. [Full Text]
  • Palmer RHC, Johnson EC, Won H, Polimanti R, Kapoor M, Chitre A, Bogue MA, Benca-Bachman CE, Parker CC, Ursu O, Verma A, Reynolds T, Ernst J, Bray M, Kwon SB, Lai D, Quach BC, Gaddis NC, Saba L, Chen H, Hawrylycz M, Zhang S, Zhou Y, Mahaffey S, Fischer C, Sanchez-Roige S, Bandrowski A, Qing L, Shen L, Philip V, Gelernter J, Bierut LJ, Hancock DB, Edenberg HJ, Johnson EO, Nestler EJ, Barr PB, Prins P, Smith DJ, Akbarian S, Thorgeirsson T, Walton D, Baker E, Jacobson D, Palmer AA, Miles M, Chesler EJ, Emerson J, Agrawal A, Martone M, Williams RW. 2021. Integration of evidence across human and model organism studies: a meeting report. Genes Brain Behav; doi:10.1111/gbb.12738 [Online 23 April 2021]. [Full Text]
  • Planell N, Lagani V, Sebastian-Leon P, van der Kloet F, Ewing E, Karathanasis N, Urdangarin A, Arozarena I, Jagodic M, Tsamardinos I, Tarazona S, Conesa A, Tegner J, Gomez-Cabrero D. 2021. STATegra: multi-omics data integration - a conceptual scheme with a bioinformatics pipeline. Front Genet 12:620453. [Full Text]
  • Race AM, Sutton D, Hamm G, Maglennon G, Morton JP, Strittmatter N, Campbell A, Sansom OJ, Wang Y, Barry ST, Takáts Z, Goodwin RJA, Bunch J. 2021. Deep learning-based annotation transfer between molecular imaging modalities: an automated workflow for multimodal data integration. Anal Chem 93(6):3061-3071. [Abstract]
  • Reich NG, Cornell M, Ray EL, House K, Le K. 2021. The Zoltar forecast archive, a tool to standardize and store interdisciplinary prediction research. Sci Data 8(1):59. [Full Text]
  • Savoska S, Fdez-Arroyabe P, Cifra M, Kourtidis K, Rozanov E, Nicoll K, Dragovic S, Mir LM. 2021. Toward the creation of an ontology for the coupling of atmospheric electricity with biological systems. Int J Biometeorol 65(1):31-44. [Abstract]
  • Stilp AM, Emery LS, Broome JG, Buth EJ, Khan AT, Laurie CA, Wang FF, Wong Q, Chen D, D'Augustine CM, Heard-Costa NL, Hohensee CR, Johnson WC, Juarez LD, Liu J, Mutalik KM, Raffield LM, Wiggins KL, de Vries PS, Kelly TN, Kooperberg C, Natarajan P, Peloso GM, Peyser PA, Reiner AP, Arnett DK, Aslibekyan S, Barnes KC, Bielak LF, Bis JC, Cade BE, Chen MH, Correa A, Cupples LA, de Andrade M, Ellinor PT, Fornage M, Franceschini N, Gan W, Ganesh SK, Graffelman J, Grove ML, Guo X, Hawley NL, Hsu WL, Jackson RD, Jaquish CE, Johnson AD, Kardia SLR, Kelly S, Lee J, Mathias RA, McGarvey ST, Mitchell BD, Montasser ME, Morrison AC, North KE, Nouraie SM, Oelsner EC, Pankratz N, Rich SS, Rotter JI, Smith JA, Taylor KD, Vasan RS, Weeks DE, Weiss ST, Wilson CG, Yanek LR, Psaty BM, Heckbert SR, Laurie CC. 2021. A system for phenotype harmonization in the NHLBI Trans-Omics for Precision Medicine (TOPMed) program. Am J Epidemiol; doi:10.1093/aje/kwab115 [Online 16 April 2021]. [Full Text]
  • Vos RA, Katayama T, Mishima H, Kawano S, Kawashima S, Kim JD, Moriya Y, Tokimatsu T, Yamaguchi A, Yamamoto Y, Wu H, Amstutz P, Antezana E, Aoki NP, Arakawa K, Bolleman JT, Bolton E, Bonnal RJP, Bono H, Burger K, Chiba H, Cohen KB, Deutsch EW, Fernández-Breis JT, Fu G, Fujisawa T, Fukushima A, García A, Goto N, Groza T, Hercus C, Hoehndorf R, Itaya K, Juty N, Kawashima T, Kim JH, Kinjo AR, Kotera M, Kozaki K, Kumagai S, Kushida T, Lütteke T, Matsubara M, Miyamoto J, Mohsen A, Mori H, Naito Y, Nakazato T, Nguyen-Xuan J, Nishida K, Nishida N, Nishide H, Ogishima S, Ohta T, Okuda S, Paten B, Perret JL, Prathipati P, Prins P, Queralt-Rosinach N, Shinmachi D, Suzuki S, Tabata T, Takatsuki T, Taylor K, Thompson M, Uchiyama I, Vieira B, Wei CH, Wilkinson M, Yamada I, Yamanaka R, Yoshitake K, Yoshizawa AC, Dumontier M, Kosaki K, Takagi T. 2020. BioHackathon 2015: semantics of data for life sciences and reproducible research. F1000Res 9:136. [Full Text]
  • Votava JA, Parks BW. 2021. Cross-species data integration to prioritize causal genes in lipid metabolism. Curr Opin Lipidol 32(2):141-146. [Abstract]
  • Wen Y, Song X, Yan B, Yang X, Wu L, Leng D, He S, Bo X. 2021. Multi-dimensional data integration algorithm based on random walk with restart. BMC Bioinformatics 22(1):97. [Full Text]
  • Zanfardino M, Castaldo R, Pane K, Affinito O, Aiello M, Salvatore M, Franzese M. 2021. MuSA: a graphical user interface for multi-OMICs data integration in radiogenomic studies. Sci Rep 11(1):1550. [Full Text]

2020

  • Alter G, Gonzalez-Beltran A, Ohno-Machado L, Rocca-Serra P. 2020. The Data Tags Suite (DATS) model for discovering data access and use requirements. Gigascience 9(2). [Abstract]
  • Bernasconi A, Canakoglu A, Masseroli M, Ceri S. 2020. The road towards data integration in human genomics: players, steps and interactions. Brief Bioinform; doi:10.1093/bib/bbaa080 [Online 4 June 2020]. [Abstract]
  • Bernstein MN, Gladstein A, Latt KZ, Clough E, Busby B, Dillman A. 2020. Jupyter notebook-based tools for building structured datasets from the Sequence Read Archive. F1000Res 9:376. [Full Text]
  • Canzler S, Schor J, Busch W, Schubert K, Rolle-Kampczyk UE, Seitz H, Kamp H, von Bergen M, Buesen R, Hackermuller J. 2020. Prospects and challenges of multi-omics data integration in toxicology. Arch Toxicol; doi:10.1007/s00204-020-02656-y [Online 8 February 2020]. [Abstract]
  • Christley S, Aguiar A, Blanck G, Breden F, Bukhari SAC, Busse CE, Jaglale J, Harikrishnan SL, Laserson U, Peters B, Rocha A, Schramm CA, Taylor S, Vander Heiden JA, Zimonja B, Watson CT, Corrie B, Cowell LG. 2020. The ADC API: a web API for the programmatic query of the AIRR Data Commons. Front Big Data 3:22. [Full Text]
  • Elghafari A, Finkelstein J. 2020. Introducing an ontology-driven pipeline for the identification of common data elements. Stud Health Technol Inform 272:379-382. [Full Text]
  • Graw S, Chappell K, Washam CL, Gies A, Bird J, Robeson MS 2nd, Byrum SD. 2020. Multi-omics data integration considerations and study design for biological systems and disease. Mol Omics; doi:10.1039/d0mo00041h [Online 21 December 2020]. [Full Text]
  • Hollmann S, Kremer A, Baebler S, Trefois C, Gruden K, Rudnicki WR, Tong W, Gruca A, Bongcam-Rudloff E, Evelo CT, Nechyporenko A, Frohme M, Safranek D, Regierer B, D'Elia D. 2020. The need for standardisation in life science research - an approach to excellence and trust. F1000Res 9:1398. [Full Text]
  • Konopka T, Smedley D. 2020. Incremental data integration for tracking genotype-disease associations. PLoS Comput Biol 16(1):e1007586. [Abstract]
  • Li J, Yin Y, Zhang M, Cui J, Zhang Z, Zhang Z, Sun D. 2020. GsmPlot: a web server to visualize epigenome data in NCBI. BMC Bioinformatics 21(1):55. [Abstract]
  • Meyer DE, Bailin SC, Vallero D, Egeghy PP, Liu SV, Cohen Hubal EA. 2020. Enhancing life cycle chemical exposure assessment through ontology modeling. Sci Total Environ 712:136263. [Abstract]
  • Odenkirk MT, Zin PPK, Ash JR, Reif DM, Fourches D, Baker ES. 2020. Structural-based connectivity and omic phenotype evaluations (SCOPE): a cheminformatics toolbox for investigating lipidomic changes in complex systems. Analyst 145(22):7197-7209. [Full Text]
  • Reid RW, Ferrier JW, Jay JJ. 2020. Automated gene data integration with Databio. BMC Res Notes 13(1):195. [Full Text]
  • Subramanian I, Verma S, Kumar S, Jere A, Anamika K. 2020. Multi-omics data integration, interpretation, and its application. Bioinform Biol Insights 14:1177932219899051. [Full Text]
  • Thessen AE, Grondin CJ, Kulkarni RD, Brander S, Truong L, Vasilevsky NA, Callahan TJ, Chan LE, Westra B, Willis M, Rothenberg SE, Jarabek AM, Burgoon L, Korrick SA, Haendel MA. 2020. Community approaches for integrating environmental exposures into human models of disease. Environ Health Perspect 128(12):125002. [Full Text]
  • Waltemath D, Golebiewski M, Blinov ML, Gleeson P, Hermjakob H, Hucka M, Inau ET, Keating SM, Konig M, Krebs O, Malik-Sheriff RS, Nickerson D, Oberortner E, Sauro HM, Schreiber F, Smith L, Stefan MI, Wittig U, Myers CJ. 2020. The first 10 years of the international coordination network for standards in systems and synthetic biology (COMBINE). J Integr Bioinform 17(2-3):20200005. [Full Text]

2019

  • Brown J, Phillips AR, Lewis DA, Mans MA, Chang Y, Tanguay RL, Peterson ES, Waters KM, Tilton SC. 2019. Bioinformatics Resource Manager: a systems biology web tool for microRNA and omics data integration. BMC Bioinformatics 20(1):255. [Abstract]
  • Bucher E, Claunch CJ, Hee D, Smith RL, Devlin K, Thompson W, Korkola JE, Heiser LM. 2019. Annot: a Django-based sample, reagent, and experiment metadata tracking system. BMC Bioinformatics 20(1):542. [Abstract]
  • Buendia P, Bradley RM, Taylor TJ, Schymanski EL, Patti GJ, Kabuka MR. 2019. Ontology-based metabolomics data integration with quality control. Bioanalysis 11(12):1139-1155. [Abstract]
  • Cooper DJ, Schurer S. 2019. Improving the utility of the Tox21 dataset by deep metadata annotations and constructing reusable benchmarked chemical reference signatures. Molecules 24(8):1604. [Abstract]
  • Dorea FC, Vial F, Hammar K, Lindberg A, Lambrix P, Blomqvist E, Revie CW. 2019. Drivers for the development of an Animal Health Surveillance Ontology (AHSO). Prev Vet Med 166:39-48. [Abstract]
  • Falster DS, FitzJohn RG, Pennell MW, Cornwell WK. 2019. Datastorr: a workflow and package for delivering successive versions of 'evolving data' directly into R. Gigascience 8(5). [Abstract]
  • Fillinger S, de la Garza L, Peltzer A, Kohlbacher O, Nahnsen S. 2019. Challenges of big data integration in the life sciences. Anal Bioanal Chem 411(26):6791-6800. [Abstract]
  • Kourou KD, Pezoulas VC, Georga EI, Exarchos TP, Tsanakas P, Tsiknakis M, Varvarigou T, De Vita S, Tzioufas A, Fotiadis DI. 2019. Cohort harmonization and integrative analysis from a biomedical engineering perspective. IEEE Rev Biomed Eng 12:303-318. [Abstract]
  • Macklin P. 2019. Key challenges facing data-driven multicellular systems biology. Gigascience 8(10). [Abstract]
  • Mirza B, Wang W, Wang J, Choi H, Chung NC, Ping P. 2019. Machine learning and integrative analysis of biomedical big data. Genes (Basel) 10(2):E87. [Abstract]
  • Pala D, Pagan J, Parimbelli E, Rocca MT, Bellazzi R, Casella V. 2019. Spatial enablement to support environmental, demographic, socioeconomics, and health data integration and analysis for big cities: a case study with asthma hospitalizations in New York City. Front Med (Lausanne) 6:84. [Abstract]
  • Peng C, Goswami P. 2019. Meaningful integration of data from heterogeneous health services and home environment based on ontology. Sensors (Basel) 19(8):E1747. [Abstract]
  • Schymanski EL, Baker NC, Williams AJ, Singh RR, Trezzi JP, Wilmes P, Kolber PL, Kruger R, Paczia N, Linster CL, Balling R. 2019. Connecting environmental exposure and neurodegeneration using cheminformatics and high resolution mass spectrometry: potential and challenges. Environ Sci Process Impacts 21(9):1426-1445. [Abstract]
  • Siegele DA, LaBonte SA, Wu PI, Chibucos MC, Nandendla S, Giglio MG, Hu JC. 2019. Phenotype annotation with the Ontology of Microbial Phenotypes (OMP). J Biomed Semantics 10(1):13. [Abstract]
  • Sima AC, Stockinger K, de Farias TM, Gil M. 2019. Semantic integration and enrichment of heterogeneous biological databases. In: Evolutionary Genomics (Anisimova M, ed.). New York, NY: Humana. [Full Text]
  • Tang YA, Pichler K, Fullgrabe A, Lomax J, Malone J, Munoz-Torres MC, Vasant DV, Williams E, Haendel M. 2019. Ten quick tips for biocuration. PLoS Comput Biol 15(5):e1006906. [Abstract]
  • T'Joen V, Vaneeckhaute L, Priem S, Van Woensel S, Bekaert S, Berneel E, Van Der Straeten C. 2019. Rationalized development of a campus-wide cell line dataset for implementation in the Biobank LIMS system at Bioresource Center Ghent. Front Med (Lausanne) 6:137. [Abstract]
  • Wang RL, Edwards S, Ives C. 2019. Ontology-based semantic mapping of chemical toxicities. Toxicology 412:89-100. [Abstract]

2018

  • Abburu S. 2018. Ontology driven cross-linked domain data integration and spatial semantic multi criteria query system for geospatial public health. Int J Semant Web Inf Syst 14(3):1-30. [Abstract]
  • Baker N, Boobis A, Burgoon L, Carney E, Currie R, Fritsche E, Knudsen T, Laffont M, Piersma AH, Poole A, Schneider S, Daston G. 2018. Building a developmental toxicity ontology. Birth Defects Res 110(6):502-518. [Abstract] [ECETOC Open Access Report]
  • Cooper L, Meier A, Laporte MA, Elser JL, Mungall C, Sinn BT, Cavaliere D, Carbon S, Dunn NA, Smith B, Qu B, Preece J, Zhang E, Todorovic S, Gkoutos G, Doonan JH, Stevenson DW, Arnaud E, Jaiswal P. 2018. The Planteome database: an integrated resource for reference ontologies, plant genomics and phenomics. Nucleic Acids Res 46(D1):D1168-D1180. [Abstract]
  • Fairchild G, Tasseff B, Khalsa H, Generous N, Daughton AR, Velappan N, Priedhorsky R, Deshpande A. 2018. Epidemiological data challenges: planning for a more robust future through data standards. Front Public Health 6:336. [Abstract]
  • Haynes D, Jokela A, Manson S. 2018. IPUMS-Terra: integrated big heterogeneous spatio-temporal data analysis system. J Geogr Syst 20(4):343-361. [Abstract]
  • He Y, Xiang Z, Zheng J, Lin Y, Overton JA, Ong E. 2018. The eXtensible Ontology Development (XOD) principles and tool implementation to support ontology interoperability. J Biomed Semantics 9(1):3. [Abstract]
  • Huser V, Amos L. 2018. Analyzing real-world use of research common data elements. AMIA Annu Symp Proc 2018:602-608. [Abstract]
  • National Academies of Sciences, Engineering, and Medicine. 2018. Informing Environmental Health Decisions Through Data Integration: Proceedings of a Workshop—in Brief. Washington, DC: The National Academies Press. [Full Text]

2017

  • Alshahrani M, Khan MA, Maddouri O, Kinjo AR, Queralt-Rosinach N, Hoehndorf R. 2017. Neuro-symbolic representation learning on biological knowledge graphs. Bioinformatics 33(17):2723-2730. [Abstract]
  • Davis AP, Grondin CJ, Johnson RJ, Sciaky D, King BL, McMorran R, Wiegers J, Wiegers TC, Mattingly CJ. 2017. The Comparative Toxicogenomics Database: update 2017. Nucleic Acids Res 45(D1):D972-D978. [Abstract]
  • Malinowski AK, Ananth CV, Catalano P, Hines EP, Kirby RS, Klebanoff MA, Mulvihill JJ, Simhan H, Hamilton CM, Hendershot TP, Phillips MJ, Kilpatrick LA, Maiese DR, Ramos EM, Wright RJ, Dolan SM; PhenX Pregnancy Working Group. 2017. Research standardization tools: pregnancy measures in the PhenX Toolkit. Am J Obstet Gynecol 217(3):249-262. [Abstract]

2016

  • Mattingly CJ, Boyles R, Lawler CP, Haugen AC, Dearry A, Haendel M. 2016. Laying a community-based foundation for data-driven semantic standards in environmental health sciences. Environ Health Perspect 124(8):1136-1140. [Abstract]
  • Rocca-Serra P, Salek RM, Arita M, Correa E, Dayalan S, Gonzalez-Beltran A, Ebbels T, Goodacre R, Hastings J, Haug K, Koulman A, Nikolski M, Oresic M, Sansone SA, Schober D, Smith J, Steinbeck C, Viant MR, Neumann S. 2016. Data standards can boost metabolomics research, and if there is a will, there is a way. Metabolomics 12:14. [Abstract]

Data Science Training

  • Big Data to Knowledge (BD2K)
    The BD2K Centers produced training and educational resources, including workshops, courses, webinars, lecture series, summer internships, and training programs.
  • ERUDITE (Educational Resource Discovery Index)
    Use ERUDITE to find educational resources (e.g., free modules, MOOCs, curricula, and webinars) and in-person or minimal-cost training opportunities (e.g., short courses) for data science.
  • Edison Data Science Framework
    The Edison Data Science Framework is a collection of documents that define the data science profession and competencies.
  • FOSTER
    The FOSTER portal is an e-learning platform providing training resources for those who need to know more about Open Science or need to develop strategies and skills for implementing Open Science practices in their daily workflows. Available courses on Open Science include managing and sharing research data, best practices in open research, Open Science software and workflows, data protection and ethics, and open licensing.
  • The Carpentries
    The Carpentries teach foundational coding, data science and computational skills to researchers worldwide. The Carpentries develop and teach in-person, interactive, two-day workshops using open-source lessons available on GitHub.

Relevant Publications

2021

  • Bittremieux W, Bouyssie D, Dorfer V, Locard-Paulet M, Perez-Riverol Y, Schwammle V, Uszkoreit J, Van Den Bossche T. 2021. The European Bioinformatics Community for Mass Spectrometry (EuBIC-MS): An open community for bioinformatics training and research. Rapid Commun Mass Spectrom; doi:10.1002/rcm.9087 [Online 16 April 2021]. [Abstract]
  • Dill-McFarland KA, Konig SG, Mazel F, Oliver DC, McEwen LM, Hong KY, Hallam SJ. 2021. An integrated, modular approach to data science education in microbiology. PLoS Comput Biol 17(2):e1008661. [Full Text]

2020

  • Pittard WS, Li S. 2020. The essential toolbox of data science: Python, R, Git, and Docker. Methods Mol Biol 2104:265-311. [Abstract]
  • Zhang Y, Ives ZG. 2020. Finding related tables in data lakes for interactive data science. Proc ACM SIGMOD Int Conf Manag Data 2020:1951-1966. [Full Text]

2019

  • Attwood TK, Blackford S, Brazas MD, Davies A, Schneider MV. 2019. A global perspective on evolving bioinformatics and data science training needs. Brief Bioinform 20(2):398-404. [Abstract]
  • Grabowski P, Rappsilber J. 2019. A primer on data analytics in functional genomics: how to move from data to insight? Trends Biochem Sci 44(1):21-32. [Abstract]
  • Mendez KM, Pritchard L, Reinke SN, Broadhurst DI. 2019. Toward collaborative open data science in metabolomics using Jupyter Notebooks and cloud computing. Metabolomics 15(10):125. [Full Text]

2018

  • Carey MA, Papin JA. 2018. Ten simple rules for biologists learning to program. PLoS Comput Biol 14(1):e1005871. [Abstract]
  • Huppenkothen D, Arendt A, Hogg DW, Ram K, VanderPlas JT, Rokem A. 2018. Hack weeks as a model for data science education and collaboration. Proc Natl Acad Sci U S A 115(36):8872-8877. [Full Text]
  • National Academies of Sciences, Engineering, and Medicine. 2018. Data Science for Undergraduates: Opportunities and Options. Washington, DC: The National Academies Press. [Full Text]
  • Toelch U, Ostwald D. 2018. Digital open science–teaching digital tools for reproducible and transparent research. PLoS Biol 16(7):e2006022. [Abstract]
  • Van Horn JD, Fierro L, Kamdar J, Gordon J, Stewart C, Bhattrai A, Abe S, Lei X, O'Driscoll C, Sinha A, Jain P, Burns G, Lerman K, Ambite JL. 2018. Democratizing data science through data science training. Pac Symp Biocomput 23:292-303. [Abstract]

2017

  • Dunn MC, Bourne PE. 2017. Building the biomedical data science workforce. PLoS Biol 15(7):e2003082. [Abstract]