Superfund Research Program
This page provides information and resources on data sharing, data repositories, citing data, data integration, and data science training.
Data Sharing
Sharing data enables reuse, increases transparency, and facilitates reproducibility of research results. Learn how to share data with FAIR principles in mind.
Data Repositories
Repositories accept submissions of data and then store, organize, validate, archive, preserve, and distribute the data. In general, NIH does not endorse or require sharing in any specific repository and encourages researchers to select the repository that is most appropriate for their data type and discipline. Find a listing of repositories that may be appropriate for your data.
Data Citations
Referencing data products from research gives researchers credit for their work and allows others to find it. Learn how to cite your data.
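As a rough illustration of what a data citation contains, the sketch below assembles one from a dataset's basic metadata. It loosely follows the creator, year, title, version, publisher, identifier pattern commonly recommended for datasets; the `DatasetRecord` class, the `format_citation` function, and the example record (including its placeholder DOI) are hypothetical and not taken from any repository.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Minimal metadata needed to cite a dataset (hypothetical structure)."""
    creators: List[str]
    year: int
    title: str
    publisher: str                  # usually the repository that holds the data
    doi: str                        # identifier without the resolver prefix
    version: Optional[str] = None   # optional dataset version


def format_citation(record: DatasetRecord) -> str:
    """Build 'Creators (Year). Title. [Version X.] Publisher. DOI URL'."""
    authors = "; ".join(record.creators)
    parts = [f"{authors} ({record.year}).", f"{record.title}."]
    if record.version:
        parts.append(f"Version {record.version}.")
    parts.append(f"{record.publisher}.")
    parts.append(f"https://doi.org/{record.doi}")
    return " ".join(parts)


# Made-up example record; the DOI is a placeholder and is not expected to resolve.
example = DatasetRecord(
    creators=["Doe J", "Roe R"],
    year=2021,
    title="Example sediment metal concentrations",
    publisher="Example Data Repository",
    doi="10.1234/example-1234",
    version="1.0",
)
print(format_citation(example))
```

Individual journals and repositories may ask for a different order or punctuation, so the output should be treated as a starting point rather than a required format.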
Research Data Management
Organizing, storing, protecting, preserving, and processing data requires proper planning and maintenance. Learn ways to effectively manage data and find management tools.
Metadata Standards / Ontologies / Data Integration
Metadata, which provide additional information intended to make scientific data interpretable and reusable, are important for integrating data across projects and disciplines. Learn how to describe your data and find discipline-specific descriptors.
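One common way to describe a dataset in machine-readable form is a schema.org `Dataset` record serialized as JSON-LD. The sketch below is only an illustration: the helper name `build_dataset_metadata` and every field value are made up, and a real project would follow the metadata standard that its repository or discipline expects.

```python
import json
from typing import Dict, List


def build_dataset_metadata(name: str, description: str, keywords: List[str],
                           license_url: str, identifier: str) -> Dict[str, object]:
    """Return a schema.org Dataset description as a JSON-LD-style dictionary."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "keywords": keywords,
        "license": license_url,
        "identifier": identifier,
    }


# Hypothetical values used purely for illustration.
metadata = build_dataset_metadata(
    name="Example sediment metal concentrations",
    description="Concentrations of selected metals in sediment samples.",
    keywords=["sediment", "metals", "environmental health"],
    license_url="https://creativecommons.org/licenses/by/4.0/",
    identifier="https://doi.org/10.1234/example-1234",
)
print(json.dumps(metadata, indent=2))
```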
Data Science Training
Preparing, processing, and analyzing data is essential to extract meaningful information from research results. Find training related to data science.
Data Sharing
FAIR Data Principles
The FAIR Data Principles are a set of guiding principles published by FORCE11 that provide a framework for sharing data in a way that maximizes use and reuse by making data findable, accessible, interoperable, and reusable. Wilkinson et al. (2016) introduce the principles and their rationale.
FAIR Self-Assessment Tool
Created by the Australian Research Data Commons, this online tool allows one to assess the "FAIRness" of a dataset and provides suggestions for improvement and links to more information.
FAIRShake
A software toolkit developed for the NIH Data Commons Pilot Phase Consortium (DCPPC) to enable the assessment of compliance of biomedical digital research objects with the FAIR guiding principles. FAIRShake was developed to facilitate the establishment of community-driven FAIR metrics and rubrics paired with manual and automated FAIR assessments.
FAIRsharing
FAIRsharing.org provides a curated, informative, and educational resource on data and metadata standards and data policies, including those linked to specific funders and journals. Use it to identify and cite the standards, databases, or repositories that exist for your data and discipline.
How To Make Your Data FAIR
A research guide produced by OpenAIRE, a European organization that seeks to shift scholarly communications toward openness and transparency. The guide includes a checklist to determine the "FAIRness" of one’s data.
National Library of Medicine Strategic Plan (2017–2027)
The NLM strategic plan sets a course for data-driven discovery and health.
NIH Strategic Plan for Data Science
NIH released its first Strategic Plan for Data Science in June 2018. It provides a roadmap for modernizing the NIH-funded biomedical data science ecosystem.
SPARC Data Sharing Requirements by Federal Agency
SPARC is a global coalition committed to making "open" the default status for research and education. Its website contains a resource for tracking, comparing, and understanding both current and future U.S. federal funder requirements for sharing research data.
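The assessment tools listed above score a dataset against the four FAIR principles. The sketch below shows the general idea with a much-simplified checklist; the field names, the one-field-per-principle mapping, and the `assess_fairness` and `missing_fields` functions are assumptions made for illustration and do not reproduce the rubric of any of those tools.

```python
from typing import Dict, List

# One illustrative metadata field per principle (an assumption, not an official rubric).
FAIR_CHECKS: Dict[str, str] = {
    "findable": "persistent_identifier",
    "accessible": "access_url",
    "interoperable": "file_format_standard",
    "reusable": "license",
}


def assess_fairness(record: Dict[str, str]) -> Dict[str, bool]:
    """Report, per principle, whether the record has a non-empty value for its field."""
    return {
        principle: bool(str(record.get(field) or "").strip())
        for principle, field in FAIR_CHECKS.items()
    }


def missing_fields(record: Dict[str, str]) -> List[str]:
    """List the metadata fields that would need to be filled in."""
    results = assess_fairness(record)
    return [FAIR_CHECKS[principle] for principle, ok in results.items() if not ok]


# Hypothetical record with no license, so the "reusable" check fails.
example_record = {
    "persistent_identifier": "https://doi.org/10.1234/example-1234",
    "access_url": "https://repository.example.org/datasets/1234",
    "file_format_standard": "CSV",
    "license": "",
}
print(assess_fairness(example_record))
print("Missing:", missing_fields(example_record))
```

A real assessment covers many more criteria per principle, which is why the listed tools pair automated checks with manual review.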
Relevant Publications
- Du X, Dastmalchi F, Ye H, Garrett TJ, Diller MA, Liu M, Hogan WR, Brochhausen M, Lemas DJ. 2023. Evaluating LC-HRMS metabolomics data processing software using FAIR principles for research software. Metabolomics 19(2):11. [Full Text]
- García-Closas M, Ahearn TU, Gaudet MM, Hurson AN, Balasubramanian JB, Choudhury PP, Gerlanc NM, Patel B, Russ D, Abubakar M, Freedman ND, Wong WSW, Chanock SJ, de Gonzalez AB, Almeida JS. 2023. Moving towards FAIR practices in epidemiological research. Am J Epidemiol 192(6):995-1005. [Full Text]
- Sinaci AA, Gencturk M, Teoman HA, Laleci Erturkmen GB, Alvarez-Romero C, Martinez-Garcia A, Poblador-Plou B, Carmona-Pírez J, Löbe M, Parra-Calderon CL. 2023. A data transformation methodology to create findable, accessible, interoperable, and reusable health data: software design, development, and evaluation study. J Med Internet Res 25:e42822. [Full Text]
- 2023. Data sharing is the future. Nat Methods 20(4):471. [Full Text]
- Wang X, Wang Y, Ambite JL, Appaji A, Lander H, Moore SM, Rajasekar AK, Turner JA, Turner MD, Wang L, Sahoo SS. 2023. Enabling scientific reproducibility through FAIR data management: an ontology-driven deep learning approach in the NeuroBridge Project. AMIA Annu Symp Proc 2022:1135-1144. [Full Text]
- Bose N, Brookes AJ, Scordis P, Visser PJ. 2022. Data and sample sharing as an enabler for large-scale biomarker research and development: The EPND perspective. Front Neurol 13:1031091. [Full Text]
- Iturbide M, Fernández J, Gutiérrez JM, Pirani A, Huard D, Al Khourdajie A, Baño-Medina J, Bedia J, Casanueva A, Cimadevilla E, Cofiño AS, De Felice M, Diez-Sierra J, García-Díez M, Goldie J, Herrera DA, Herrera S, Manzanas R, Milovac J, Radhakrishnan A, San-Martín D, Spinuso A, Thyng KM, Trenham C, Yelekçi Ö. 2022. Implementation of FAIR principles in the IPCC: the WGI AR6 Atlas repository. Sci Data 9(1):629. [Full Text]
- LeDuc RD, Deutsch EW, Binz PA, Fellers RT, Cesnik AJ, Klein JA, Van Den Bossche T, Gabriels R, Yalavarthi A, Perez-Riverol Y, Carver J, Bittremieux W, Kawano S, Pullman B, Bandeira N, Kelleher NL, Thomas PM, Vizcaíno JA. 2022. Proteomics Standards Initiative's ProForma 2.0: unifying the encoding of proteoforms and peptidoforms. J Proteome Res 21(4):1189-1195. [Abstract]
- Nature Cancer. 2021. All about data sharing. Nat Cancer 2(5):475. [Full Text]
- Neumann J. 2022. FAIR data infrastructure. Adv Biochem Eng Biotechnol 182:195-207. [Abstract]
- Riccardi D, Trautt Z, Bazyleva A, Paulechka E, Diky V, Magee JW, Kazakov AF, Townsend SA, Muzny CD. 2022. Towards improved FAIRness of the ThermoML Archive. J Comput Chem; doi: 10.1002/jcc.26842 [Online 24 March 2022]. [Abstract]
- Sabatello M, Martschenko DO, Cho MK, Brothers KB. 2022. Data sharing and community-engaged research. Science 378(6616):141-143. [Abstract]
- Sun Q, Nematbakhsh A, Kuntala PK, Kellogg G, Pugh BF, Lai WKM. 2022. STENCIL: a web templating engine for visualizing and sharing life science datasets. PLoS Comput Biol 18(2):e1009859. [Full Text]
- Wagner AS, Waite LK, Wierzba M, Hoffstaedter F, Waite AQ, Poldrack B, Eickhoff SB, Hanke M. 2022. FAIRly big: a framework for computationally reproducible processing of large-scale data. Sci Data 9(1):80. [Full Text]
- Williams J. 2022. CyVerse for reproducible research: RNA-Seq analysis. Methods Mol Biol 2443:57-79. [Full Text]
- Anthony N, Pellen C, Ohmann C, Moher D, Naudet F. 2021. Social media attention and citations of published outputs from re-use of clinical trial data: a matched comparison with articles published in the same journals. BMC Med Res Methodol 21(1):119. [ Full Text]
- Austin CC, Bernier A, Bezuidenhout L, Bicarregui J, Biro T, Cambon-Thomsen A, Carroll SR, Cournia Z, Dabrowski PW, Diallo G, Duflot T, Garcia L, Gesing S, Gonzalez-Beltran A, Gururaj A, Harrower N, Lin D, Medeiros C, Méndez E, Meyers N, Mietchen D, Nagrani R, Nilsonne G, Parker S, Pickering B, Pienta A, Polydoratou P, Psomopoulos F, Rennes S, Rowe R, Sansone SA, Shanahan H, Sitz L, Stocks J, Tovani-Palone MR, Uhlmansiek M. 2021. Fostering global data sharing: highlighting the recommendations of the Research Data Alliance COVID-19 working group. Wellcome Open Res 5:267. [ Full Text]
- Bhattacharya S, Hu Z, Butte AJ. 2021. Opportunities and challenges in democratizing immunology datasets. Front Immunol 12:647536. [ Full Text]
- Bittremieux W, Adams C, Laukens K, Dorrestein PC, Bandeira N. 2021. Open science resources for the mass spectrometry-based analysis of SARS-CoV-2. J Proteome Res 20(3):1464-1475. [ Full Text]
- Butte AJ. 2021. Trials and tribulations-11 reasons why we need to promote clinical trials data sharing. JAMA Netw Open 4(1):e2035043. [ Full Text]
- Canakoglu A, Pinoli P, Gulino A, Nanni L, Masseroli M, Ceri S. 2021. Federated sharing and processing of genomic datasets for tertiary data analysis. Brief Bioinform 22(3):bbaa091. [ Abstract]
- Carroll SR, Herczog E, Hudson M, Russell K, Stall S. 2021. Operationalizing the CARE and FAIR principles for indigenous data futures. Sci Data 8(1):108. [ Full Text]
- Caufield JH, Fu J, Wang D, Guevara-Gonzalez V, Wang W, Ping P. 2021. A second look at FAIR in proteomic investigations. J Proteome Res; doi:10.1021/acs.jproteome.1c00177 [Online 13 March 2021]. [ Abstract]
- Chan V, Gherardini PF, Krummel MF, Fragiadakis GK. 2021. A "data sharing trust" model for rapid, collaborative science. Cell 184(3):566-570. [ Abstract]
- Danchev V, Min Y, Borghi J, Baiocchi M, Ioannidis JPA. 2021. Evaluation of data sharing after implementation of the international committee of medical journal editors’ data sharing statement requirement. JAMA Netw Open 4(1):e2033972. [ Full Text]
- de Macena Sobreira NL, Hamosh A. 2021. Next-generation sequencing and the evolution of data sharing. Am J Med Genet A; doi:10.1002/ajmg.a.62239 [Online 7 May 2021]. [ Abstract]
- Devriendt T, Shabani M, Borry P. 2021. Data sharing in biomedical sciences: a systematic review of incentives. Biopreserv Biobank; doi:10.1089/bio.2020.0037 [Online 11 February 2021]. [ Abstract]
- Gencturk M, Teoman A, Alvarez-Romero C, Martinez-Garcia A, Parra-Calderon CL, Poblador-Plou B, Löbe M, Sinaci AA. 2021. End user evaluation of the FAIR4Health data curation tool. Stud Health Technol Inform 281:8-12. [Full Text]
- Harrison PW, Lopez R, Rahman N, Allen SG, Aslam R, Buso N, Cummins C, Fathy Y, Felix E, Glont M, Jayathilaka S, Kadam S, Kumar M, Lauer KB, Malhotra G, Mosaku A, Edbali O, Park YM, Parton A, Pearce M, Estrada Pena JF, Rossetto J, Russell C, Selvakumar S, Sitjà XP, Sokolov A, Thorne R, Ventouratou M, Walter P, Yordanova G, Zadissa A, Cochrane G, Blomberg N, Apweiler R. 2021. The COVID-19 Data Portal: accelerating SARS-CoV-2 and COVID-19 research through rapid open access data sharing. Nucleic Acids Res; doi:10.1093/nar/gkab417 [Online 28 May 2021]. [ Full Text]
- Hollmann S, Kremer A, Baebler Š, Trefois C, Gruden K, Rudnicki WR, Tong W, Gruca A, Bongcam-Rudloff E, Evelo CT, Nechyporenko A, Frohme M, Šafránek D, Regierer B, D'Elia D. 2021. The need for standardisation in life science research - an approach to excellence and trust. F1000Res 9:1398. [ Full Text]
- Inau ET, Sack J, Waltemath D, Zeleke AA. 2021. Initiatives, concepts, and implementation practices of FAIR (Findable, Accessible, Interoperable, and Reusable) data principles in health data stewardship practice: protocol for a scoping review. JMIR Res Protoc 10(2):e22505. [Full Text]
- Kamdar MR, Musen MA. 2021. An empirical meta-analysis of the life sciences linked open data on the web. Sci Data 8(1):24. [ Full Text]
- Kinkade D, Shepherd A. 2021. Geoscience data publication: Practices and perspectives on enabling the FAIR guiding principles. Geoscience Data Journal. [Full Text]
- Lach D, Zhdan U, Smolinski A, Polanski J. 2021. Functional and material properties in nanocatalyst design: a data handling and sharing problem. Int J Mol Sci 22(10):5176. [ Full Text]
- Lochman JE. 2021. Open science and intervention research: a program developer's and researcher's perspective on issues and concerns. Prev Sci; doi:10.1007/s11121-021-01219-6 [Online 2 March 2021]. [Abstract]
- Mahmud M, Kaiser MS, McGinnity TM, Hussain A. 2021. Deep learning in mining biological data. Cognit Comput; doi: 10.1007/s12559-020-09773-x [Online 5 January 2021]. [ Full Text]
- McGuinness LA, Sheppard AL. 2021. A descriptive analysis of the data availability statements accompanying medRxiv preprints and a comparison with their published counterparts. PLoS One 16(5):e0250887. [Full Text]
- Misra BB. 2021. Advances in high resolution GC-MS technology: a focus on the application of GC-Orbitrap-MS in metabolomics and exposomics for FAIR practices. Anal Methods; doi:10.1039/d1ay00173f [Online 14 May 2021]. [Abstract]
- Rahimzadeh V, Bartlett G, Knoppers BM. 2021. A policy Delphi study to validate the Key Implications of Data Sharing (KIDS) framework for pediatric genomics in Canada. BMC Med Ethics 22(1):71. [ Full Text]
- Richir J, Bray S, McAleese T, Watson GJ. 2021. Data on elemental concentrations in marine sediments from the South and South West of England. Data Brief 35:106901. [ Full Text]
- Rzepa HS, Kuhn S. 2021. A data-oriented approach to making new molecules as a student experiment: AI-enabling FAIR publication of NMR data for organic esters. Magn Reson Chem; doi:10.1002/mrc.5186 [Online 9 June 2021]. [Full Text]
- Serghiou S, Contopoulos-Ioannidis DG, Boyack KW, Riedel N, Wallach JD, Ioannidis JPA. 2021. Assessment of transparency indicators across the biomedical literature: how open is open? PLoS Biol 19(3):e3001107. [Full Text]
- Sixto-Costoya A, Lucas-Dominguez R, Aleixandre-Benavent R, Vidal-Infer A. 2021. Is sharing datasets the answer to the new challenges of reproductive biology research? Reprod Sci 28(4):1023-1025. [Full Text]
- Stingone JA, Triantafillou S, Larsen A, Kitt JP, Shaw GM, Marsillach J. 2021. Interdisciplinary data science to advance environmental health research and improve birth outcomes. Environ Res 197:111019. [Abstract]
- Vuorre M, Crump MJC. 2021. Sharing and organizing research products as R packages. Behav Res Methods 53(2):792-802. [ Full Text]
- Zuo X, Chen Y, Ohno-Machado L, Xu H. 2021. How do we share data in COVID-19 research? A systematic review of COVID-19 datasets in PubMed Central Articles. Brief Bioinform 22(2):800-811. [ Full Text]
- Birkenbihl C, Salimi Y, Domingo-Fernandez D, Lovestone S, AddNeuroMed consortium, Frohlich H, Hofmann-Apitius M, Japanese Alzheimer's Disease Neuroimaging Initiative, Alzheimer's Disease Neuroimaging Initiative. 2020. Evaluating the Alzheimer's disease data landscape. Alzheimers Dement (N Y) 6(1):e12102. [ Full Text]
- Boronow KE, Perovich LJ, Sweeney L, Yoo JS, Rudel RA, Brown P, Brody JG. 2020. Privacy risks of sharing data from environmental health studies. Environ Health Perspect 128(1):17008. [ Abstract]
- Boulware LE, Harris GB, Harewood P, Johnson FF, Maxson P, Bhavsar N, Blackwelder SS, Poley SS, Arnold K, Akindele B, Ferranti J, Lyn M. 2020. Democratizing health system data to impact social and environmental health contexts: a novel collaborative community data-sharing model. J Public Health (Oxf); doi:10.1093/pubmed/fdz171 [Online 9 January 2020]. [Abstract]
- Dorne JLCM, Richardson J, Livaniou A, Carnesecchi E, Ceriani L, Baldin R, Kovarich S, Pavan M, Saouter E, Biganzoli F, Pasinato L, Zare Jeddi M, Robinson TP, Kass GEN, Liem AKD, Toropov AA, Toropova AP, Yang C, Tarkhov A, Georgiadis N, Di Nicola MR, Mostrag A, Verhagen H, Roncaglioni A, Benfenati E, Bassan A. 2020. EFSA's OpenFoodTox: an open source toxicological database on chemicals in food and feed and its future developments. Environ Int 146:106293. [ Full Text]
- Gao F, Tao L, Huang Y, Shu Z. 2020. Management and data sharing of COVID-19 pandemic information. Biopreserv Biobank 18(6):570-580. [ Full Text]
- Hauptmann E. 2020. Why they shared: recovering early arguments for sharing social scientific data. Sci Context 33(2):101-119. [ Abstract]
- Heacock ML, Amolegbe SM, Skalla LA, Trottier BA, Carlin DJ, Henry HF, Lopez AR, Duncan CG, Lawler CP, Balshaw DM, Suk WA. 2020. Sharing SRP data to reduce environmentally associated disease and promote transdisciplinary research. Rev Environ Health; doi:10.1515/reveh-2019-0089 [Online 3 Mar 2020]. [ Full Text]
- Helzlsouer K, Meerzaman D, Taplin S, Dunn BK. 2020. Humanizing big data: recognizing the human aspect of big data. Front Oncol 10:186. [ Full Text]
- Hodgson S, Fecht D, Gulliver J, Iyathooray Daby H, Piel FB, Yip F, Strosnider H, Hansell A, Elliott P. 2020. Availability, access, analysis and dissemination of small-area data. Int J Epidemiol 49(Supplement_1):i4–i14. [Full Text]
- Jakob CEM, Kohlmayer F, Meurers T, Vehreschild JJ, Prasser F. 2020. Design and evaluation of a data anonymization pipeline to promote Open Science on COVID-19. Sci Data 7(1):435. [ Full Text]
- Kaewkungwal J, Adams P, Sattabongkot J, Lie RK, Wendler D. 2020. Issues and challenges associated with data-sharing in low- and middle-income countries: perspectives of researchers in Thailand. Am J Trop Med Hyg; doi:10.4269/ajtmh.19-0651 [Online 11 May 2020]. [Full Text]
- Lacey JV, Chung NT, Hughes P, Benbow JL, Duffy C, Savage KE, Spielfogel ES, Wang SS, Martinez ME, Chandra S. 2020. Insights from adopting a data commons approach for large-scale observational cohort studies: the California Teachers Study. Cancer Epidemiol Biomarkers Prev; doi:10.1158/1055-9965.EPI-19-0842 [Online 12 February 2020]. [Abstract]
- Lin D, Crabtree J, Dillo I, Downs RR, Edmunds R, Giaretta D, De Giust M, L'Hours H, Hugo W, Jenkyns R, Khodiyar V, Martone ME, Mokrane M, Navale V, Petters J, Sierman B, Sokolova DV, Stockhause M, Westbrook J. 2020. The TRUST Principles for Digital Repositories. Sci Data 7(144). [ Full Text]
- Lobe M, Matthies F, Staubert S, Meineke FA, Winter A. 2020. Problems in FAIRifying medical datasets. Stud Health Technol Inform 270:392-396. [ Full Text]
- Luthria G, Wang Q. 2020. Implementing a cloud based method for protected clinical trial data sharing. Pac Symp Biocomput 25:647-658. [ Full Text]
- Matheson GJ, Plaven-Sigray P, Tuisku J, Rinne J, Matuskey D, Cervenka S. 2020. Clinical brain PET research must embrace multi-centre collaboration and data sharing or risk its demise. Eur J Nucl Med Mol Imaging 47(2):502-504. [Full Text]
- Merz KM Jr, Amaro R, Cournia Z, Rarey M, Soares T, Tropsha A, Wahab HA, Wang R. 2020. Editorial: method and data sharing and reproducibility of scientific results. J Chem Inf Model 60(12):5868-5869. [Full Text]
- Mons B. 2020. Invest 5% of research funds in ensuring data are reusable. Nature 578(7796):491. [ Full Text]
- National Academies of Sciences, Engineering, and Medicine. 2020. Neuroscience Data in the Cloud: Opportunities and Challenges: Proceedings of a Workshop. Washington, DC: The National Academies Press. [Full Text]
- Paprica PA, Sutherland E, Smith A, Brudno M, Cartagena RG, Crichlow M, Courtney BK, Loken C, McGrail KM, Ryan A, Schull MJ, Thorogood A, Virtanen C, Yang K. 2020. Essential requirements for establishing and operating data trusts: practical guidance co-developed by representatives from fifteen Canadian organizations and initiatives. Int J Popul Data Sci 5(1):1353. [Full Text]
- Perrier L, Blondal E, MacDonald H. 2020. The views, perspectives, and experiences of academic researchers with data sharing and reuse: a meta-synthesis. PLoS One 15(2):e0229182. [ Full Text]
- Phillips M, Molnar-Gabor F, Korbel JO, Thorogood A, Joly Y, Chalmers D, Townend D, Knoppers BM. 2020. Genomics: data sharing needs an international code of conduct. Nature 578(7793):31-33. [ Full Text]
- Promoting best practice in nucleotide sequence data sharing. 2020. Sci Data 7(1):152. [ Full Text]
- Rios RS, Zheng KI, Zheng MH. 2020. Data sharing during COVID-19 pandemic: what to take away. Expert Rev Gastroenterol Hepatol; doi:10.1080/17474124.2020.1815533 [Online 26 August 2020]. [ Full Text]
- Sinaci AA, Núñez-Benjumea FJ, Gencturk M, Jauer ML, Deserno T, Chronaki C, Cangioli G, Cavero-Barca C, Rodríguez-Pérez JM, Pérez-Pérez MM, Laleci Erturkmen GB, Hernández-Pérez T, Méndez-Rodríguez E, Parra-Calderón CL. 2020. From Raw Data to FAIR Data: The FAIRification Workflow for Health Research. Methods Inf Med 59(S 01):e21-e32. [Full Text]
- Smith CD, Mennis J. 2020. Incorporating geographic information science and technology in response to the COVID-19 pandemic. Prev Chronic Dis 17:E58. [ Full Text]
- Smith CM, Kadin JA, Baldarelli RM, Beal JS, Blodgett O, Giannatto SC, Richardson JE, Ringwald M. 2020. GXD's RNA-Seq and microarray experiment search: using curated metadata to reliably find mouse expression studies of interest. Database (Oxford) 2020:baaa002. [ Full Text]
- Solle D. 2020. Be FAIR to your data. Anal Bioanal Chem 412(17):3961-3965. [ Full Text]
- Tang L. 2020. FAIR your data. Nat Methods 17(2):127. [Full Text]
- Tenopir C, Rice NM, Allard S, Baird L, Borycz J, Christian L, Grant B, Olendorf R, Sandusky RJ. 2020. Data sharing, management, use, and reuse: practices and perceptions of scientists worldwide. PLoS One 15(3):e0229003. [ Full Text]
- Thelwall M, Munafò M, Mas-Bleda A, Stuart E, Makita M, Weigert V, Keene C, Khan N, Drax K, Kousha K. 2020. Is useful research data usually shared? An investigation of genome-wide association study summary statistics. PLoS One 15(2):e0229578. [ Full Text]
- Udesky JO, Boronow KE, Brown P, Perovich LJ, Brody JG. 2020. Perceived risks, benefits, and interest in participating in environmental health studies that share personal exposure data: a U.S. survey of prospective participants. J Empir Res Hum Res Ethics; doi:10.1177/1556264620903595 [Online 15 February 2020]. [Abstract]
- White T, Blok E, Calhoun VD. 2020. Data sharing and privacy issues in neuroimaging research: opportunities, obstacles, challenges, and monsters under the bed. Hum Brain Mapp; doi:10.1002/hbm.25120 [Online 4 July 2020]. [Full Text]
- Barba A, Dominguez S, Cobas C, Martinsen DP, Romain C, Rzepa HS, Seoane F. 2019. Workflows allowing creation of journal article supporting information and Findable, Accessible, Interoperable, and Reusable (FAIR)-enabled publication of spectroscopic data. ACS Omega 4(2):3280-3286. [Abstract]
- Biomedical Data Translator Consortium. 2019. The biomedical data translator program: conception, culture, and community. Clin Transl Sci 12(2):91-94. [ Full Text]
- Carbon S, Champieux R, McMurry JA, Winfree L, Wyatt LR, Haendel MA. 2019. An analysis and metric of reusable data licensing practices for biomedical resources. PLoS One 14(3):e0213090. [ Abstract]
- Celi LA, Citi L, Ghassemi M, Pollard TJ. 2019. The PLOS ONE collection on machine learning in health and biomedicine: towards open code and open data. PLoS One 14(1):e0210232. [ Abstract]
- Christensen G, Dafoe A, Miguel E, Moore DA, Rose AK. 2019. A study of the impact of data sharing on article citations using journal policies as a natural experiment. PLoS One 14(12):e0225883. [ Abstract]
- Clarke DJB, Wang L, Jones A, Wojciechowicz ML, Torre D, Jagodnik KM, Jenkins SL, McQuilton P, Flamholz Z, Silverstein MC, Schilder BM, Robasky K, Castillo C, Idaszak R, Ahalt SC, Williams J, Schurer S, Cooper DJ, de Miranda Azevedo R, Klenk JA, Haendel MA, Nedzel J, Avillach P, Shimoyama ME, Harris RM, Gamble M, Poten R, Charbonneau AL, Larkin J, Brown CT, Bonazzi VR, Dumontier MJ, Sansone SA, Ma'ayan A. 2019. FAIRshake: Toolkit to Evaluate the FAIRness of Research Digital Resources. Cell Syst 9(5):417-421. [ Abstract]
- Fecho K, Ahalt SC, Arunachalam S, Champion J, Chute CG, Davis S, Gersing K, Glusman G, Hadlock J, Lee J, Pfaff E, Robinson M, Sid E, Ta C, Xu H, Zhu R, Zhu Q, Peden DB, Biomedical Data Translator Consortium. 2019. Sex, obesity, diabetes, and exposure to particulate matter among patients with severe asthma: scientific insights from a comparative analysis of open clinical data sources during a five-day hackathon. J Biomed Inform 100:103325. [Abstract]
- Fothergill BT, Knight W, Stahl BC, Ulnicane I. 2019. Responsible data governance of neuroscience big data. Front Neuroinform 13:28. [ Abstract]
- Gaffney SG, Ad O, Smaga S, Schepartz A, Townsend JP. 2019. GEM-NET: lessons in multi-institution teamwork using collaboration software. ACS Cent Sci 5(7):1159-1169. [ Abstract]
- Jansen P, van den Berg L, van Overveld P, Boiten JW. 2019. Research data stewardship for healthcare professionals. In: Fundamentals of Clinical Data Science (Kubben P, Dumontier M, Dekker A, eds.). Cham, Switzerland: Springer. [Full Text]
- Li R, Sim I. 2019. How clinical trial data sharing platforms can advance the study of biomarkers. J Law Med Ethics 47(3):369-373. [ Abstract]
- Li X, Fireman BH, Curtis JR, Arterburn DE, Fisher DP, Moyneur E, Gallagher M, Raebel MA, Nowell WB, Lagreid L, Toh S. 2019. Validity of privacy-protecting analytical methods that use only aggregate-level information to conduct multivariable-adjusted analysis in distributed data networks. Am J Epidemiol 188(4):709-723. [Abstract]
- Madduri R, Chard K, D'Arcy M, Jung SC, Rodriguez A, Sulakhe D, Deutsch E, Funk C, Heavner B, Richards M, Shannon P, Glusman G, Price N, Kesselman C, Foster I. 2019. Reproducible big data science: a case study in continuous FAIRness. PLoS One 14(4):e0213013. [ Abstract]
- Oliveira JL, Trifan A, Bastiao Silva LA. 2019. EMIF Catalogue: a collaborative platform for sharing and reusing biomedical data. Int J Med Inform 126:35-45. [ Abstract]
- Perez-Riverol Y, Zorin A, Dass G, Vu MT, Xu P, Glont M, Vizcaino JA, Jarnuczak AF, Petryszak R, Ping P, Hermjakob H. 2019. Quantifying the impact of public omics data. Nat Commun 10(1):3512. [ Abstract]
- Polanin JR, Terzian M. 2019. A data-sharing agreement helps to increase researchers' willingness to share primary data: results from a randomized controlled trial. J Clin Epidemiol 106:60-69. [ Abstract]
- Popkin G. 2019. Data sharing and how it can benefit your scientific career. Nature 569(7756):445-447. [ Abstract]
- Psaty BM, Rich SS, Boerwinkle E. 2019. Innovation in genomic data sharing at the NIH. N Engl J Med 380(23):2192-2195. [ Abstract]
- Resnik DB, Morales M, Landrum R, Shi M, Minnier J, Vasilevsky NA, Champieux RE. 2019. Effect of impact factor and discipline on journal data sharing policies. Account Res 26(3):139-156. [ Abstract]
- Ruhamyankaka E, Brunk BP, Dorsey G, Harb OS, Helb DA, Judkins J, Kissinger JC, Lindsay B, Roos DS, San EJ, Stoeckert CJ, Zheng J, Tomko SS. 2019. ClinEpiDB: an open-access clinical epidemiology database resource encouraging online exploration of complex studies. Gates Open Res 3:1661. [ Abstract]
- Sansone SA, McQuilton P, Rocca-Serra P, Gonzalez-Beltran A, Izzo M, Lister AL, Thurston M. 2019. FAIRsharing as a community approach to standards, repositories and policies. Nat Biotechnol 37(4):358-367. [Full Text]
- Spoor S, Cheng CH, Sanderson LA, Condon B, Almsaeed A, Chen M, Bretaudeau A, Rasche H, Jung S, Main D, Bett K, Staton M, Wegrzyn JL, Feltus FA, Ficklin SP. 2019. Tripal v3: an ontology-based toolkit for construction of FAIR biological community databases. Database (Oxford) 2019. [ Abstract]
- Staley J, Mazloom R, Lowe P, Newsum CT, Jaberi-Douraki M, Riviere J, Wyckoff GJ. 2019. Novel data sharing agreement to accelerate big data translational research projects in the one health sphere. Top Companion Anim Med 37:100367. [ Abstract]
- Vesteghem C, Brondum RF, Sonderkaer M, Sommer M, Schmitz A, Bodker JS, Dybkaer K, El-Galaly TC, Bogsted M. 2019. Implementing the FAIR Data Principles in precision oncology: review of supporting initiatives. Brief Bioinform; doi:10.1093/bib/bbz044 [Online 29 June 2019]. [ Abstract]
- Villanueva AG, Cook-Deegan R, Koenig BA, Deverka PA, Versalovic E, McGuire AL, Majumder MA. 2019. Characterizing the biomedical data-sharing landscape. J Law Med Ethics 47(1):21-30. [ Abstract]
- Xu H, Zhang N. 2019. Privacy in health disparity research. Med Care 57(Suppl 6 Suppl 2):S172–S175. [ Full Text]
- Boeckhout M, Zielhuis GA, Bredenoord AL. 2018. The FAIR guiding principles for data stewardship: fair enough? Eur J Hum Genet 26(7):931-936. [ Abstract]
- Brown AV, Campbell JD, Assefa T, Grant D, Nelson RT, Weeks NT, Cannon SB. 2018. Ten quick tips for sharing open genomic data. PLoS Comput Biol 14(12):e1006472. [ Abstract]
- Escribano N, Galicia D, Arino AH. 2018. The tragedy of the biodiversity data commons: a data impediment creeping nigher? Database (Oxford) 2018. [ Abstract]
- Gruning B, Chilton J, Koster J, Dale R, Soranzo N, van den Beek M, Goecks J, Backofen R, Nekrutenko A, Taylor J. 2018. Practical computational reproducibility in the life sciences. Cell Syst 6(6):631-635. [Abstract]
- Holub P, Kohlmayer F, Prasser F, Mayrhofer MT, Schlunder I, Martin GM, Casati S, Koumakis L, Wutte A, Kozera L, Strapagiel D, Anton G, Zanetti G, Sezerman OU, Mendy M, Valik D, Lavitrano M, Dagher G, Zatloukal K, van Ommen GB, Litton JE. 2018. Enhancing reuse of data and biological material in medical research: from FAIR to FAIR-Health. Biopreserv Biobank 16(2):97-105. [ Abstract]
- Kitzes J, Turek D, Deniz F, eds. 2018. The Practice of Reproducible Research: Case Studies and Lessons from the Data-Intensive Sciences. Oakland, CA: University of California Press. [ Full Text]
- National Academies of Sciences, Engineering, and Medicine. 2018. Open Science by Design: Realizing a Vision for 21st Century Research. Washington, DC: The National Academies Press. [ Full Text]
- Navale V, McAuliffe M. 2018. Long-term preservation of biomedical research data. F1000Res 7:1353. [ Abstract]
- Perkel JM. 2018. A toolkit for data transparency takes shape. Nature 560(7719):513-515. [ Abstract]
- Yaniv Z, Lowekamp BC, Johnson HJ, Beare R. 2018. SimpleITK image-analysis notebooks: a collaborative environment for education and reproducible research. J Digit Imaging 31(3):290-303. [ Abstract]
- Ascoli GA, Maraver P, Nanda S, Polavaram S, Armananzas R. 2017. Win-win data sharing in neuroscience. Nat Methods 14(2):112-116. [ Abstract]
- Boland MR, Karczewski KJ, Tatonetti NP. 2017. Ten simple rules to enable multi-site collaborations through data sharing. PLoS Comput Biol 13(1):e1005278. [ Abstract]
- Fleming L, Tempini N, Gordon-Brown H, Nichols GL, Sarran C, Vineis P, Leonardi G, Golding B, Haines A, Kessel A, Murray V, Depledge M, Leonelli S. 2017. Big data in environment and human health. Oxford Research Encyclopedia of Environmental Science. [Full Text]
- Hudson KL, Collins FS. 2017. The 21st Century Cures Act - a view from the NIH. N Engl J Med 376(2):111-113. [ Abstract]
- Jimenez RC, Kuzak M, Alhamdoosh M, Barker M, Batut B, Borg M, Capella-Gutierrez S, Chue Hong N, Cook M, Corpas M, Flannery M, Garcia L, Gelpi JL, Gladman S, Goble C, Gonzalez Ferreiro M, Gonzalez-Beltran A, Griffin PC, Gruning B, Hagberg J, Holub P, Hooft R, Ison J, Katz DS, Leskosek B, Lopez Gomez F, Oliveira LJ, Mellor D, Mosbergen R, Mulder N, Perez-Riverol Y, Pergl R, Pichler H, Pope B, Sanz F, Schneider MV, Stodden V, Suchecki R, Svobodova Varekova R, Talvik HA, Todorov I, Treloar A, Tyagi S, van Gompel M, Vaughan D, Via A, Wang X, Watson-Haigh NS, Crouch S. 2017. Four simple recommendations to encourage best practices in research software. F1000Res 6. [ Abstract]
- Majumder MA, Guerrini CJ, Bollinger JM, Cook-Deegan R, McGuire AL. 2017. Sharing data under the 21st Century Cures Act. Genet Med 19(12):1289-1294. [ Abstract]
- McIntosh LD, Juehne A, Vitale CRH, Liu X, Alcoser R, Lukas JC, Evanoff B. 2017. Repeat: a framework to assess empirical reproducibility in biomedical research. BMC Med Res Methodol 17(1):143. [ Abstract]
- Ohmann C, Banzi R, Canham S, Battaglia S, Matei M, Ariyo C, Becnel L, Bierer B, Bowers S, Clivio L, Dias M, Druml C, Faure H, Fenner M, Galvez J, Ghersi D, Gluud C, Groves T, Houston P, Karam G, Kalra D, Knowles RL, Krleza-Jeric K, Kubiak C, Kuchinke W, Kush R, Lukkarinen A, Marques PS, Newbigging A, O'Callaghan J, Ravaud P, Schlunder I, Shanahan D, Sitter H, Spalding D, Tudur-Smith C, van Reusel P, van Veen EB, Visser GR, Wilson J, Demotes-Mainard J. 2017. Sharing and reuse of individual participant data from clinical trials: principles and recommendations. BMJ Open 7(12):e018647. [ Abstract]
- Olfson M, Wall MM, Blanco C. 2017. Incentivizing data sharing and collaboration in medical research–the S-Index. JAMA Psychiatry 74(1):5-6. [ Abstract]
- Thelwall M, Kousha K. 2017. Do journal data sharing mandates work? Life sciences evidence from Dryad. ASLIB J Inform Manag 69(1):36-45. [ Abstract]
Data Repositories
-
Data Observation Network for Earth (DataONE)
DataONE is a community-driven project providing access to data across multiple member repositories, supporting enhanced search and discovery of Earth and environmental data. DataONE provides data usage and citation metrics for datasets. -
DataCite
Search the DataCite registry for datasets, software, images, and other research material. -
DataMed
DataMed is a prototype biomedical data search engine being developed for the NIH BD2K Data Discovery Index (DDI) by the bioCADDIE project team. It allows users to search and find data across different repositories. -
General Code Repositories
Use general code repositories such as Bitbucket, GitHub, and SourceForge to share code. The Comprehensive R Archive Network (CRAN) is an archive of R packages. Code Ocean is a research collaboration platform that lets users share computational environments and code in a Web browser. The Jupyter Notebook is an open-source web application for creating and sharing documents that contain live code, equations, visualizations, and text. -
NIH Data Sharing Repositories
This table lists NIH-supported data repositories that make data accessible for reuse. -
Omics Discovery Index (OmicsDI)
OmicsDI provides dataset discovery across a heterogeneous, distributed group of transcriptomics, genomics, proteomics, and metabolomics data resources spanning 16 repositories from six organizations on three continents, including both open- and controlled-access data resources. -
Registry of Research Data Repositories (re3data.org)
re3data.org is a global registry of research data repositories. It provides detailed information on more than 2,000 repositories to help researchers find the right one for their data. re3data.org is a service of DataCite, a global non-profit organization that provides DOIs for research data. -
SciCrunch
SciCrunch is a data sharing and display platform designed to help communities create their own portals to provide access to research resources, data, literature, and tools. Hosted at the University of California, San Diego, SciCrunch is home to the Drug Design Data Resource, the NIDDK Information Network (National Institute of Diabetes and Digestive and Kidney Diseases), the Neuroscience Information Framework, and the Research Resource Identifiers (RRID) Portal, which provides shared identifiers for citing research resources such as cell lines or antibodies in the literature. Users can search within and across these community portals. -
United States Geological Survey (USGS) Data Repository Webpage
This webpage provides useful information about preserving environmental data in data repositories.
Relevant Publications
- Hosseinpoor AR, Kirkby K, Bergen N, Schlotheuber A, Antiporta DA. 2023. WHO releases Health Inequality Data Repository. Lancet 401(10388):1565-1566. [Full Text]
- Felden J, Möller L, Schindler U, Huber R, Schumacher S, Koppe R, Diepenbroek M, Glöckner FO. 2023. PANGAEA – Data publisher for earth & environmental science. Sci Data 10(1):347. [Full Text]
- Bogue MA, Ball RL, Philip VM, Walton DO, Dunn MH, Kolishovski G, Lamoureux A, Gerring M, Liang H, Emerson J, Stearns T, He H, Mukherjee G, Bluis J, Desai S, Sundberg B, Kadakkuzha B, Kunde-Ramamoorthy G, Chesler EJ. 2023. Mouse Phenome Database: towards a more FAIR-compliant and TRUST-worthy data repository and tool suite for phenotypes and genotypes. Nucleic Acids Res 51(D1):D1067-D1074. [Full Text]
- Sollis E, Mosaku A, Abid A, Buniello A, Cerezo M, Gil L, Groza T, Güneş O, Hall P, Hayhurst J, Ibrahim A, Ji Y, John S, Lewis E, MacArthur JAL, McMahon A, Osumi-Sutherland D, Panoutsopoulou K, Pendlington Z, Ramachandran S, Stefancsik R, Stewart J, Whetzel P, Wilson R, Hindorff L, Cunningham F, Lambert SA, Inouye M, Parkinson H, Harris LW. 2023. The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Res 51(D1):D977-D985. [Full Text]
- Ahmed Z, Renart EG, Mishra D, Zeeshan S. 2021. JWES: a new pipeline for whole genome/exome sequence data processing, management, and gene-variant discovery, annotation, prediction, and genotyping. FEBS Open Bio 11(9):2441-2452. [Full Text]
- Bonello J, Cachia E, Alfino N. 2022. AutoFAIR – a portal for automating FAIR assessments for bioinformatics resources. Biochim Biophys Acta Gene Regul Mech 1865(1):194767. [Abstract]
- Chen M, Ma Y, Wu S, Zheng X, Kang H, Sang J, Xu X, Hao L, Li Z, Gong Z, Xiao J, Zhang Z, Zhao W, Bao Y. 2021. Genome Warehouse: a public repository housing genome-scale data. Genomics Proteomics Bioinformatics 19(4):584-589. [Full Text]
- Freedman ND, Brown L, Newman LM, Jones JM, Benoit TJ, Averhoff F, Bu X, Bayrak K, Lu A, Coffey B, Jackson L, Chanock SJ, Kerlavage AR. 2022. COVID-19 SeroHub, an online repository of SARS-CoV-2 seroprevalence studies in the United States. Sci Data 9(1):727. [Full Text]
- Grissa D, Junge A, Oprea TI, Jensen LJ. 2022. Diseases 2.0: a weekly updated database of disease-gene associations from text mining and data integration. Database (Oxford) 2022:baac019. [Full Text]
- Perez-Riverol Y. 2022. Proteomic repository data submission, dissemination, and reuse: key messages. Expert Rev Proteomics: 1-14. [Abstract]
- Altenhoff AM, Train CM, Gilbert KJ, Mediratta I, Mendes de Farias T, Moi D, Nevers Y, Radoykova HS, Rossier V, Warwick Vesztrocy A, Glover NM, Dessimoz C. 2021. OMA orthology in 2021: website overhaul, conserved isoforms, ancestral gene order and more. Nucleic Acids Res 49(D1):D373-D379. [ Full Text]
- Bastian FB, Roux J, Niknejad A, Comte A, Fonseca Costa SS, de Farias TM, Moretti S, Parmentier G, de Laval VR, Rosikiewicz M, Wollbrett J, Echchiki A, Escoriza A, Gharib WH, Gonzales-Porta M, Jarosz Y, Laurenczy B, Moret P, Person E, Roelli P, Sanjeev K, Seppey M, Robinson-Rechavi M. 2021. The Bgee suite: integrated curated expression atlas and comparative transcriptomics in animals. Nucleic Acids Res 49(D1):D831-D847. [Full Text]
- Blake JA, Baldarelli R, Kadin JA, Richardson JE, Smith CL, Bult CJ, Mouse Genome Database Group. 2021. Mouse genome database (MGD): knowledgebase for mouse-human comparative biology. Nucleic Acids Res 49(D1):D981-D987. [Full Text]
- Dorne JLCM, Richardson J, Livaniou A, Carnesecchi E, Ceriani L, Baldin R, Kovarich S, Pavan M, Saouter E, Biganzoli F, Pasinato L, Zare Jeddi M, Robinson TP, Kass GEN, Liem AKD, Toropov AA, Toropova AP, Yang C, Tarkhov A, Georgiadis N, Di Nicola MR, Mostrag A, Verhagen H, Roncaglioni A, Benfenati E, Bassan A. 2021. EFSA's OpenFoodTox: an open source toxicological database on chemicals in food and feed and its future developments. Environ Int 146:106293. [ Full Text]
- Gilbertson PK, Forrester S, Andrews L, McCann K, Rogers L, Park C, Moye J. 2021. The National Children's Study Archive model: A 3-tier framework for dissemination of data and specimens for general use and secondary analysis. Front Public Health 9:526286. [ Full Text]
- Kasmanas JC, Bartholomaus A, Correa FB, Tal T, Jehmlich N, Herberth G, von Bergen M, Stadler PF, Carvalho ACPLF, Nunes da Rocha U. 2021. HumanMetagenomeDB: a public repository of curated and standardized metadata for human metagenomes. Nucleic Acids Res 49(D1):D743-D750. [ Full Text]
- Lim N, Tesar S, Belmadani M, Poirier-Morency G, Mancarci BO, Sicherman J, Jacobson M, Leong J, Tan P, Pavlidis P. 2021. Curation of over 10,000 transcriptomic studies to enable data reuse. Database (Oxford) 2021:baab006. [ Full Text]
- Martens M, Ammar A, Riutta A, Waagmeester A, Slenter DN, Hanspers K, A Miller R, Digles D, Lopes EN, Ehrhart F, Dupuis LJ, Winckers LA, Coort SL, Willighagen EL, Evelo CT, Pico AR, Kutmon M. 2021. WikiPathways: connecting communities. Nucleic Acids Res 49(D1):D613-D621. [ Full Text]
- Moretti S, Tran VDT, Mehl F, Ibberson M, Pagni M. 2021. MetaNetX/MNXref: unified namespace for metabolites and biochemical reactions in the context of metabolic models. Nucleic Acids Res 49(D1):D570-D574. [Full Text]
- Ortolani C, D'Atri M, Zamai L, Canonico B, Del Zotto G, Papa S. 2021. ESCCABase project: a repository in progress. Cytometry A; doi:10.1002/cyto.a.24355 [Online 7 May 2021]. [ Full Text]
- Watanabe Y, Yoshizawa AC, Ishihama Y, Okuda S. 2021. The jPOST repository as a public data repository for shotgun proteomics. Methods Mol Biol 2259:309-322. [ Abstract]
- Zhang Z, Hernandez K, Savage J, Li S, Miller D, Agrawal S, Ortuno F, Staudt LM, Heath A, Grossman RL. 2021. Uniform genomic data analysis in the NCI Genomic Data Commons. Nat Commun 12(1):1226. [ Full Text]
- Bogue MA, Philip VM, Walton DO, Grubb SC, Dunn MH, Kolishovski G, Emerson J, Mukherjee G, Stearns T, He H, Sinha V, Kadakkuzha B, Kunde-Ramamoorthy G, Chesler EJ. 2020. Mouse phenome database: a data repository and analysis suite for curated primary mouse phenotype data. Nucleic Acids Res 48(D1):D716-D723. [ Full Text]
- Amid C, Pakseresht N, Silvester N, Jayathilaka S, Lund O, Dynovski LD, Pataki BA, Visontai D, Xavier BB, Alako BTF, Belka A, Cisneros JLB, Cotten M, Haringhuizen GB, Harrison PW, Hoper D, Holt S, Hundahl C, Hussein A, Kaas RS, Liu X, Leinonen R, Malhotra-Kumar S, Nieuwenhuijse DF, Rahman N, Dos S Ribeiro C, Skiby JE, Schmitz D, Steger J, Szalai-Gindl JM, Thomsen MCF, Caccio SM, Csabai I, Kroneman A, Koopmans M, Aarestrup F, Cochrane G. 2019. The COMPARE data hubs. Database (Oxford) 2019(2019):baz136. [Full Text]
- Banzi R, Canham S, Kuchinke W, Krleza-Jeric K, Demotes-Mainard J, Ohmann C. 2019. Evaluation of repositories for sharing individual-participant data from clinical studies. Trials 20(1):169. [ Abstract]
- Burley SK, Berman HM, Bhikadiya C, Bi C, Chen L, Di Costanzo L, Christie C, Dalenberg K, Duarte JM, Dutta S, Feng Z, Ghosh S, Goodsell DS, Green RK, Guranovic V, Guzenko D, Hudson BP, Kalro T, Liang Y, Lowe R, Namkoong H, Peisach E, Periskova I, Prlic A, Randle C, Rose A, Rose P, Sala R, Sekharan M, Shao C, Tan L, Tao YP, Valasatava Y, Voigt M, Westbrook J, Woo J, Yang H, Young J, Zhuravleva M, Zardecki C. 2019. RCSB protein data bank: biological macromolecular structures enabling research and education in fundamental biology, biomedicine, biotechnology and energy. Nucleic Acids Res 47(D1):D464-D474. [ Full Text]
- Grossman RL. 2019. Data lakes, clouds, and commons: a review of platforms for analyzing and sharing genomic data. Trends Genet 35(3):223-234. [ Abstract]
- Laulederkind SJF, Hayman GT, Wang SJ, Hoffman MJ, Smith JR, Bolton ER, De Pons J, Tutaj MA, Tutaj M, Thota J, Dwinell MR, Shimoyama M. 2019. Rat genome databases, repositories, and tools. Methods Mol Biol 2018:71-96. [Abstract]
- Ma J, Chen T, Wu S, Yang C, Bai M, Shu K, Li K, Zhang G, Jin Z, He F, Hermjakob H, Zhu Y. 2019. iProX: an integrated proteome resource. Nucleic Acids Res 47(D1):D1211-D1217. [ Full Text]
- UniProt Consortium. 2019. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res 47(D1):D506-D515. [ Full Text]
- Saldanha IJ, Smith BT, Ntzani E, Jap J, Balk EM, Lau J. 2019. The Systematic Review Data Repository (SRDR): descriptive characteristics of publicly available data and opportunities for research. Syst Rev 8(1):334. [Abstract]
- Chen X, Gururaj AE, Ozyurt B, Liu R, Soysal E, Cohen T, Tiryaki F, Li Y, Zong N, Jiang M, Rogith D, Salimi M, Kim HE, Rocca-Serra P, Gonzalez-Beltran A, Farcas C, Johnson T, Margolis R, Alter G, Sansone SA, Fore IM, Ohno-Machado L, Grethe JS, Xu H. 2018. DataMed - an open source discovery index for finding biomedical datasets. J Am Med Inform Assoc 25(3):300-308. [ Abstract]
- Kleywegt GJ, Velankar S, Patwardhan A. 2018. Structural biology data archiving - where we are and what lies ahead. FEBS Lett 592(12):2153-2167. [ Abstract]
- Ohno-Machado L, Sansone SA, Alter G, Fore I, Grethe J, Xu H, Gonzalez-Beltran A, Rocca-Serra P, Gururaj AE, Bell E, Soysal E, Zong N, Kim HE. 2017. Finding useful data across multiple biomedical data repositories using DataMed. Nat Genet 49(6):816-819. [ Abstract]
- Perez-Riverol Y, Bai M, da Veiga Leprevost F, Squizzato S, Park YM, Haug K, Carroll AJ, Spalding D, Paschall J, Wang M, Del-Toro N, Ternent T, Zhang P, Buso N, Bandeira N, Deutsch EW, Campbell DS, Beavis RC, Salek RM, Sarkans U, Petryszak R, Keays M, Fahy E, Sud M, Subramaniam S, Barbera A, Jimenez RC, Nesvizhskii AI, Sansone SA, Steinbeck C, Lopez R, Vizcaino JA, Ping P, Hermjakob H. 2017. Discovering and linking public omics data sets using the Omics Discovery Index. Nat Biotechnol 35(5):406-409. [ Abstract]
Data Citations
- Ball A, Duke M. 2015. How to Cite Datasets and Link to Publications. Edinburgh: Digital Curation Centre.
-
DataCite – Cite Your Data
DataCite is a leading global non-profit organization that provides persistent identifiers (DOIs) for research data. Their goal is to help the research community locate, identify, and cite research data with confidence. This page contains best practices for citing data. Properly citing data gives scholarly credit to data producers and facilitates discovery and reuse of the dataset. -
Joint Declaration of Data Citation Principles – Final
First released in November 2013 and finalized in February 2014, the declaration describes eight principles that emphasize the importance of data as evidence, the need to give credit to data contributors, the idea that cited data requires unique and persistent identifiers, and the belief that data citation should allow for human and machine access to the data and support verification and interoperability. -
Making Your Code Citable
This GitHub tutorial shows researchers how to make work shared on GitHub citable by archiving the repository and assigning a DOI with the data archiving tool Zenodo.
In addition, GitHub supports software citation based on the Citation File Format so researchers can easily be acknowledged for their contributions to software. When a repository includes a CITATION.cff file, GitHub adds a software citation widget to the repository sidebar.
Relevant Publications
- Buneman P, Christie G, Davies JA, Dimitrellou R, Harding SD, Pawson AJ, Sharman JL, Wu Y. 2020. Why data citation isn't working, and what to do about it. Database (Oxford) 2020(2020):baaa022. [ Full Text]
- Katz DS, Chue Hong NP, Clark T, Muench A, Stall S, Bouquin D, Cannon M, Edmunds S, Faez T, Feeney P, Fenner M, Friedman M, Grenier G, Harrison M, Heber J, Leary A, MacCallum C, Murray H, Pastrana E, Perry K, Schuster D, Stockhause M, Yeston J. 2020. Recognizing the value of software: a software citation guide. F1000Res. 19(9):1257. [ Full Text]
- Fenner M, Crosas M, Grethe JS, Kennedy D, Hermjakob H, Rocca-Serra P, Durand G, Berjon R, Karcher S, Martone M, Clark T. 2019. A data citation roadmap for scholarly data repositories. Sci Data 6(1):28. [Abstract]
- Honor LB, Haselgrove C, Frazier JA, Kennedy DN. 2016. Data citation in neuroimaging: proposed best practices for data identification and attribution. Front Neuroinform 10:34. [ Abstract]
Research Data Management
-
Data Management Skillbuilding Hub
The Data Management Skillbuilding Hub is an open resource in GitHub created and managed by DataONE to help users better manage their data. In addition, DataONE has a Best Practices Primer for those new to data management. -
DMPTool
Build a data management plan using this open-access data management plan tool. -
Research Data Management Handbook
The Research Data Management Handbook is a primer on managing research data. It was created by OpenAIRE, a European organization working to shift scholarly communication toward openness and transparency. -
UK Data Service - Data Management Checklist
The UK Data Service is a great resource on good data management practices and includes a checklist for identifying data management and data sharing best practices.
Relevant Publications
- Gill IS, Griffiths EJ, Dooley D, Cameron R, Savić Kallesøe S, John NS, Sehar A, Gosal G, Alexander D, Chapel M, Croxen MA, Delisle B, Di Tullio R, Gaston D, Duggan A, Guthrie JL, Horsman M, Joshi E, Kearny L, Knox N, Lau L, LeBlanc JJ, Li V, Lyons P, MacKenzie K, McArthur AG, Panousis EM, Palmer J, Prystajecky N, Smith KN, Tanner J, Townend C, Tyler A, Van Domselaar G, Hsiao WWL. 2023. The DataHarmonizer: a tool for faster data harmonization, validation, aggregation and analysis of pathogen genomics contextual information. Microb Genom 9(1):000908. [Full Text]
- Zulfiqar M, Gadelha L, Steinbeck C, Sorokina M, Peters K. 2023. MAW: the reproducible Metabolome Annotation Workflow for untargeted tandem mass spectrometry. J Cheminform 15(1):32. [Full Text]
- Lauterbach S, Dienhart H, Range J, Malzacher S, Spöring JD, Rother D, Pinto MF, Martins P, Lagerman CE, Bommarius AS, Høst AV, Woodley JM, Ngubane S, Kudanga T, Bergmann FT, Rohwer JM, Iglezakis D, Weidemann A, Wittig U, Kettner C, Swainston N, Schnell S, Pleiss J. 2023. EnzymeML: seamless data flow and modeling of enzymatic data. Nat Methods 20(3):400-402. [Full Text]
- Barker M, Chue Hong NP, Katz DS, Lamprecht AL, Martinez-Ortiz C, Psomopoulos F, Harrow J, Castro LJ, Gruenpeter M, Martinez PA, Honeyman T. 2022. Introducing the FAIR Principles for research software. Sci Data 9(1):622. [Full Text]
- Child AW, Hinds J, Sheneman L, Buerki S. 2022. Centralized project-specific metadata platforms: toolkit provides new perspectives on open data management within multi-institution and multidisciplinary research projects. BMC Res Notes 15(1):106.
- Lätti ST, Niinivehmas S, Pentikäinen OT. 2021. Sdfconf: a novel, flexible, and robust molecular data management tool. J Chem Inf Model 62(1):9-15.
- Lamer A, Al Massati S, Saint-Dizier C, Fares E, Chazard E, Fruchart M. 2022. Data management for health data reuse: proposal of a standard workflow and a R tutorial with jupyter notebook. Stud Health Technol Inform 298:82-86. [Abstract]
- Pineda-Pampliega J, Bernhard A, Hannisdal R, Ørnsrud R, Mathisen GH, Solstad G, Rasinger JD. 2022. Developing a framework for open and FAIR data management practices for next generation risk- and benefit assessment of fish and seafood. EFSA J 20(Suppl 2):e200917. [Full Text]
- Roder T, Oberhänsli S, Shani N, Bruggmann R. 2022. OpenGenomeBrowser: a versatile, dataset-independent and scalable web platform for genome data management and comparative genomics. BMC Genomics 23(1):855. [Full Text]
- Sarramia D, Claude A, Ogereau F, Mezhoud J, Mailhot G. 2022. CEBA: a data lake for data sharing and environmental monitoring. Sensors (Basel) 22(7):2733.
- Van Bulck L, Wampers M, Moons P. 2022. Research Electronic Data Capture (REDCap): tackling data collection, management, storage, and privacy challenges. Eur J Cardiovasc Nurs 21(1):85-91.
- Yadav PK, Birla S, Baliga V, Liedl R, Chahar BR, Werth CJ. 2022. Contamination Assessment and Site-management Tool (CAST): a browser-based tool for site assessment. Ground Water 60(2):275-283.
- Bai J, Bandla C, Guo J, Vera Alvarez R, Bai M, Vizcaíno JA, Moreno P, Grüning B, Sallou O, Perez-Riverol Y. 2021. BioContainers Registry: searching bioinformatics and proteomics tools, packages, and containers. J Proteome Res 20(4):2056-2061. [Abstract]
- Coarfa C, Grimm SL, Rajapakshe K, Perera D, Lu HY, Wang X, Christensen KR, Mo Q, Edwards DP, Huang S. 2021. Reverse-phase protein array: technology, application, data processing, and integration. J Biomol Tech; doi:10.7171/jbt.2021-3202-001 [Online 15 January 2021]. [ Full Text]
- Paul-Gilloteaux P, Tosi S, Hériché JK, Gaignard A, Ménager H, Marée R, Baecker V, Klemm A, Kalaš M, Zhang C, Miura K, Colombelli J. 2021. Bioimage analysis workflows: community resources to navigate through a complex ecosystem. F1000Res 10:320. [ Full Text]
- Perez-Riverol Y, Moreno P. 2020. Scalable data analysis in proteomics and metabolomics using BioContainers and workflows engines. Proteomics 20(9):e1900147. [Abstract]
- Miksa T, Simms S, Mietchen D, Jones S. 2019. Ten principles for machine-actionable data management plans. PLoS Comput Biol 15(3):e1006750. [ Abstract]
- Gruening B, Sallou O, Moreno P, da Veiga Leprevost F, Ménager H, Søndergaard D, Röst H, Sachsenberg T, O'Connor B, Madeira F, Dominguez Del Angel V, Crusoe MR, Varma S, Blankenberg D, Jimenez RC; BioContainers Community, Perez-Riverol Y. 2018. Recommendations for the packaging and containerizing of bioinformatics software. F1000Res 7:ISCB Comm J-742. [ Full Text]
- Schiermeier Q. 2018. Data management made simple. Nature 555(7696):403-405. [ Abstract]
- Michener WK. 2015. Ten simple rules for creating a good data management plan. PLoS Comput Biol 11(10):e1004525. [ Abstract]
Metadata Standards/Ontologies/Data Integration
-
Disciplinary Metadata
The Digital Curation Centre allows one to search for metadata standards, extensions, tools, and use cases by discipline (biology, earth science, general research data, physical science, and social science and humanities). -
FAIRsharing Standards Database
The standards in FAIRsharing are manually curated from a variety of sources, including BioPortal, MIBBI, and the Equator Network. -
NCBO BioPortal
BioPortal is a comprehensive repository of biomedical ontologies developed by the National Center for Biomedical Ontology, an international consortium providing ontological resources for the biomedical research community. BioPortal allows the user to browse or search ontologies, get ontology recommendations, explore mappings between ontology terms, and annotate textual biomedical data with ontology terms. The NIEHS Children’s Health Exposure Analysis Resource Ontology can be found in BioPortal. -
NIH Common Data Elements (CDE) Repository
The NIH CDE Repository provides access to structured human- and machine-readable definitions of data elements that have been recommended or required by NIH Institutes and Centers and other organizations for use in research and other purposes. -
Open Biological and Biomedical Ontology (OBO) Foundry
The OBO Foundry is a collective of ontology developers with the mission to develop a family of interoperable science-based ontologies for shared use across different biological and medical domains. They have published a set of normative principles for OBO Foundry ontologies. -
PhenXToolkit (consensus measures for Phenotypes and eXposures)
The PhenX Toolkit is a Web-based catalog of recommended, standard measures for phenotypes and exposures for use in biomedical research. Using protocols from the PhenX Toolkit allows investigators who are studying different diseases and conditions to collect data using the same methodologies, thus facilitating cross-study analysis.
Relevant Publications
- Wishart DS, Girod S, Peters H, Oler E, Jovel J, Budinski Z, Milford R, Lui VW, Sayeeda Z, Mah R, Wei W, Badran H, Lo E, Yamamoto M, Djoumbou-Feunang Y, Karu N, Gautam V. 2023. ChemFOnt: the chemical functional ontology resource. Nucleic Acids Res 51(D1):D1220-D1229. [Full Text]
- Crystal-Ornelas R, Varadharajan C, O'Ryan D, Beilsmith K, Bond-Lamberty B, Boye K, Burrus M, Cholia S, Christianson DS, Crow M, Damerow J, Ely KS, Goldman AE, Heinz SL, Hendrix VC, Kakalia Z, Mathes K, O'Brien F, Pennington SC, Robles E, Rogers A, Simmonds M, Velliquette T, Weisenhorn P, Welch JN, Whitenack K, Agarwal DA. 2022. Enabling FAIR data in Earth and environmental science with community-centric (meta)data reporting formats. Sci Data 9(1):700. [Full Text]
- Feric Z, Bohm Agostini N, Beene D, Signes-Pastor AJ, Halchenko Y, Watkins D, MacKenzie D, Karagas M, Manjourides J, Alshawabkeh A, Kaeli D. 2021. A secure and reusable software architecture for supporting online data harmonization. Proc IEEE Int Conf Big Data 2801-2812. [Full Text]
- Hu J, Zhong Y, Shang X. 2022. A versatile and scalable single-cell data integration algorithm based on domain-adversarial and variational approximation. Brief Bioinform 23(1):bbab400. [Abstract]
- Ives C, Pan H, Edwards SW, Nelms M, Covert H, Lichtveld MY, Harville EW, Wickliffe JK, Zijlmans W, Hamilton CM. 2022. Linking complex disease and exposure data-insights from an environmental and occupational health study. J Expo Sci Environ Epidemiol; doi: 10.1038/s41370-022-00428-7 [Online 28 March 2022]. [Full Text]
- Kang M, Ko E, Mersha TB. 2022. A roadmap for multi-omics data integration using deep learning. Brief Bioinform 23(1):bbab454. [Abstract]
- Lu L, Welch JD. 2022. PyLiger: scalable single-cell multi-omic data integration in Python. Bioinformatics; doi: 10.1093/bioinformatics/btac190 [Online 31 March 2022]. [Abstract]
- Martens M, Evelo CT, Willighagen EL. 2022. Providing adverse outcome pathways from the AOP-Wiki in a semantic web format to increase usability and accessibility of the content. Appl In Vitro Toxicol 8(1):2-13. [Full Text]
- Nguyen T, Walczak N, Sumorok D, Weston M, Beal J. 2022. Intent Parser: a tool for codification and sharing of experimental design. ACS Synth Biol 11(1):502-507. [Abstract]
- Pallotta S, Cascianelli S, Masseroli M. 2022. RGMQL: scalable and interoperable computing of heterogeneous omics big data and metadata in R/Bioconductor. BMC Bioinformatics 23(1):123. [Full Text]
- Sabot F. 2022. On the importance of metadata when sharing and opening data. BMC Genom Data 23(1):79. [Full Text]
- Shao C, Feng Z, Westbrook JD, Peisach E, Berrisford J, Ikegawa Y, Kurisu G, Velankar S, Burley SK, Young JY. 2021. Modernized uniform representation of carbohydrate molecules in the Protein Data Bank. Glycobiology 31(9):1204-1218. [Full Text]
- Vahabi N, Michailidis G. 2022. Unsupervised multi-omics data integration methods: a comprehensive review. Front Genet 13:854752. [Full Text]
- Yang Y, Tian S, Qiu Y, Zhao P, Zou Q. 2022. MDICC: novel method for multi-omics data integration and cancer subtype identification. Brief Bioinform; doi: 10.1093/bib/bbac132 [Online 18 April 2022]. [Abstract]
- Abrams MB, Bjaalie JG, Das S, Egan GF, Ghosh SS, Goscinski WJ, Grethe JS, Kotaleski JH, Ho ETW, Kennedy DN, Lanyon LJ, Leergaard TB, Mayberg HS, Milanesi L, Moucek R, Poline JB, Roy PK, Strother SC, Tang TB, Tiesinga P, Wachtler T, Wojcik DK, Martone ME. 2021. A standards organization for open and FAIR neuroscience: the international neuroinformatics coordinating facility. Neuroinformatics; doi:10.1007/s12021-020-09509-0 [Online 27 January 2021]. [Abstract]
- Beaulieu-Jones B, Darabos C, Kim D, Verma A, Kobren SN. 2021. Innovative methodological approaches for data integration to derive patterns across diverse, large-scale biomedical datasets. Pac Symp Biocomput 26:256-260. [Full Text]
- Blask K, Gerhards L, Jalynskij M. 2021. PsyCuraDat: designing a user-oriented curation standard for behavioral psychological research data. Front Psychol 11:579397. [Full Text]
- Boughton AP, Welch RP, Flickinger M, VandeHaar P, Taliun D, Abecasis GR, Boehnke M. 2021. LocusZoom.js: Interactive and embeddable visualization of genetic association study results. Bioinformatics; doi:10.1093/bioinformatics/btab186 [Online 17 March 2021]. [ Full Text]
- Cantini L, Zakeri P, Hernandez C, Naldi A, Thieffry D, Remy E, Baudot A. 2021. Benchmarking joint multi-omics dimensionality reduction approaches for the study of cancer. Nat Commun 12(1):124. [ Full Text]
- Chan L, Vasilevsky N, Thessen A, McMurry J, Haendel M. 2021. The landscape of nutri-informatics: a review of current resources and challenges for integrative nutrition research. Database (Oxford) 2021:baab003. [Full Text]
- Chiu W, Schmid MF, Pintilie G, Lawson CL. 2021. Evolution of standardization and dissemination of cryo-EM structures and data jointly by the community, PDB and EMDB. J Biol Chem; doi:10.1016/j.jbc.2021.100560 [Online 17 March 2021]. [Full Text]
- Dimitrova M, Meyer R, Buttigieg PL, Georgiev T, Zhelezov G, Demirov S, Smith V, Penev L. 2021. A streamlined workflow for conversion, peer review, and publication of genomics metadata as omics data papers. Gigascience 10(5):giab034. [ Full Text]
- Halford JJ, Clunie DA, Brinkmann BH, Krefting D, Remi J, Rosenow F, Husain A, Furbass F, Andrew Ehrenberg J, Winkler S. 2021. Standardization of neurophysiology signal data into the DICOM® standard. Clin Neurophysiol; doi:10.1016/j.clinph.2021.01.019 [Online 20 February 2021]. [ Abstract]
- Hedin F, Konstantinou M, Cosma A. 2021. Data integration and visualization techniques for post-cytometric analysis of complex datasets. Cytometry A; doi:10.1002/cyto.a.24359 [Online 6 May 2021]. [Abstract]
- Hong S, Liow CH, Yuk JM, Byon HR, Yang Y, Cho E, Yeom J, Park G, Kang H, Kim S, Shim Y, Na M, Jeong C, Hwang G, Kim H, Kim H, Eom S, Cho S, Jun H, Lee Y, Baucour A, Bang K, Kim M, Yun S, Ryu J, Han Y, Jetybayeva A, Choi PP, Agar JC, Kalinin SV, Voorhees PW, Littlewood P, Lee HM. 2021. Reducing time to discovery: materials and molecular modeling, imaging, informatics, and integration. ACS Nano [Online 12 February 2021]. [Abstract]
- Huguet J, Falcon C, Fuste D, Girona S, Vicente D, Molinuevo JL, Gispert JD, Operto G, ALFA Study. 2021. Management and quality control of large neuroimaging datasets: developments from the Barcelonaβeta Brain Research Center. Front Neurosci 15:633438. [ Full Text]
- Ison J, Ienasescu H, Rydza E, Chmura P, Rapacki K, Gaignard A, Schwammle V, van Helden J, Kalas M, Menager H. 2021. biotoolsSchema: a formalized schema for bioinformatics software description. Gigascience 10(1):giaa157. [ Full Text]
- Kamdar MR, Musen MA. 2021. An empirical meta-analysis of the life sciences linked open data on the web. Sci Data 8(1):24. [Full Text]
- Liang X, Akers K, Keenum I, Wind L, Gupta S, Chen C, Aldaihani R, Pruden A, Zhang L, Knowlton KF, Xia K, Heath LS. 2021. AgroSeek: a system for computational analysis of environmental metagenomic data and associated metadata. BMC Bioinformatics 22(1):117. [ Full Text]
- Loffler F, Wesp V, Konig-Ries B, Klan F. 2021. Dataset search in biodiversity research: Do metadata in data repositories reflect scholarly information needs? PLoS One 16(3):e0246099. [ Full Text]
- Lung PY, Zhong D, Pang X, Li Y, Zhang J. 2020. Maximizing the reusability of gene expression data by predicting missing metadata. PLoS Comput Biol 16(11):e1007450. [Full Text]
- Marcon Y, Bishop T, Avraam D, Escriba-Montagut X, Ryser-Welch P, Wheater S, Burton P, Gonzalez JR. 2021. Orchestrating privacy-protected big data analyses of data from different resources with R and DataSHIELD. PLoS Comput Biol 17(3):e1008880. [ Full Text]
- Martinez-Costa C, Abad-Navarro F. 2021. Towards a semantic data harmonization federated infrastructure. Stud Health Technol Inform 281:38-42. [ Full Text]
- Mate S, Seuchter SA, Ehrenberg K, Deppenwiese N, Zierk J, Prokosch HU, Kraska D, Kapsner LA. 2021. A multi-user terminology mapping toolbox. Stud Health Technol Inform 278:217-223. [ Full Text]
- Palafox MF, Desai HS, Arboleda VA, Backus KM. 2021. From chemoproteomic-detected amino acids to genomic coordinates: insights into precise multi-omic data integration. Mol Syst Biol 17(2):e9840. [Full Text]
- Palmer RHC, Johnson EC, Won H, Polimanti R, Kapoor M, Chitre A, Bogue MA, Benca-Bachman CE, Parker CC, Ursu O, Verma A, Reynolds T, Ernst J, Bray M, Kwon SB, Lai D, Quach BC, Gaddis NC, Saba L, Chen H, Hawrylycz M, Zhang S, Zhou Y, Mahaffey S, Fischer C, Sanchez-Roige S, Bandrowski A, Qing L, Shen L, Philip V, Gelernter J, Bierut LJ, Hancock DB, Edenberg HJ, Johnson EO, Nestler EJ, Barr PB, Prins P, Smith DJ, Akbarian S, Thorgeirsson T, Walton D, Baker E, Jacobson D, Palmer AA, Miles M, Chesler EJ, Emerson J, Agrawal A, Martone M, Williams RW. 2021. Integration of evidence across human and model organism studies: a meeting report. Genes Brain Behav; doi:10.1111/gbb.12738 [Online 23 April 2021]. [ Full Text]
- Planell N, Lagani V, Sebastian-Leon P, van der Kloet F, Ewing E, Karathanasis N, Urdangarin A, Arozarena I, Jagodic M, Tsamardinos I, Tarazona S, Conesa A, Tegner J, Gomez-Cabrero D. 2021. STATegra: Multi-omics data integration - a conceptual scheme with a bioinformatics pipeline. Front Genet 12:620453. [Full Text]
- Race AM, Sutton D, Hamm G, Maglennon G, Morton JP, Strittmatter N, Campbell A, Sansom OJ, Wang Y, Barry ST, Takáts Z, Goodwin RJA, Bunch J. 2021. Deep learning-based annotation transfer between molecular imaging modalities: an automated workflow for multimodal data integration. Anal Chem 93(6):3061-3071. [ Abstract]
- Reich NG, Cornell M, Ray EL, House K, Le K. 2021. The Zoltar forecast archive, a tool to standardize and store interdisciplinary prediction research. Sci Data 8(1):59. [ Full Text]
- Savoska S, Fdez-Arroyabe P, Cifra M, Kourtidis K, Rozanov E, Nicoll K, Dragovic S, Mir LM. 2021. Toward the creation of an ontology for the coupling of atmospheric electricity with biological systems. Int J Biometeorol 65(1):31-44. [ Abstract]
- Stilp AM, Emery LS, Broome JG, Buth EJ, Khan AT, Laurie CA, Wang FF, Wong Q, Chen D, D'Augustine CM, Heard-Costa NL, Hohensee CR, Johnson WC, Juarez LD, Liu J, Mutalik KM, Raffield LM, Wiggins KL, de Vries PS, Kelly TN, Kooperberg C, Natarajan P, Peloso GM, Peyser PA, Reiner AP, Arnett DK, Aslibekyan S, Barnes KC, Bielak LF, Bis JC, Cade BE, Chen MH, Correa A, Cupples LA, de Andrade M, Ellinor PT, Fornage M, Franceschini N, Gan W, Ganesh SK, Graffelman J, Grove ML, Guo X, Hawley NL, Hsu WL, Jackson RD, Jaquish CE, Johnson AD, Kardia SLR, Kelly S, Lee J, Mathias RA, McGarvey ST, Mitchell BD, Montasser ME, Morrison AC, North KE, Nouraie SM, Oelsner EC, Pankratz N, Rich SS, Rotter JI, Smith JA, Taylor KD, Vasan RS, Weeks DE, Weiss ST, Wilson CG, Yanek LR, Psaty BM, Heckbert SR, Laurie CC. 2021. A system for phenotype harmonization in the NHLBI Trans-Omics for Precision Medicine (TOPMed) program. Am J Epidemiol; doi:10.1093/aje/kwab115 [Online 16 April 2021]. [ Full Text]
- Vos RA, Katayama T, Mishima H, Kawano S, Kawashima S, Kim JD, Moriya Y, Tokimatsu T, Yamaguchi A, Yamamoto Y, Wu H, Amstutz P, Antezana E, Aoki NP, Arakawa K, Bolleman JT, Bolton E, Bonnal RJP, Bono H, Burger K, Chiba H, Cohen KB, Deutsch EW, Fernández-Breis JT, Fu G, Fujisawa T, Fukushima A, García A, Goto N, Groza T, Hercus C, Hoehndorf R, Itaya K, Juty N, Kawashima T, Kim JH, Kinjo AR, Kotera M, Kozaki K, Kumagai S, Kushida T, Lütteke T, Matsubara M, Miyamoto J, Mohsen A, Mori H, Naito Y, Nakazato T, Nguyen-Xuan J, Nishida K, Nishida N, Nishide H, Ogishima S, Ohta T, Okuda S, Paten B, Perret JL, Prathipati P, Prins P, Queralt-Rosinach N, Shinmachi D, Suzuki S, Tabata T, Takatsuki T, Taylor K, Thompson M, Uchiyama I, Vieira B, Wei CH, Wilkinson M, Yamada I, Yamanaka R, Yoshitake K, Yoshizawa AC, Dumontier M, Kosaki K, Takagi T. 2020. BioHackathon 2015: semantics of data for life sciences and reproducible research. F1000Res 9:136. [ Full Text]
- Votava JA, Parks BW. 2021. Cross-species data integration to prioritize causal genes in lipid metabolism. Curr Opin Lipidol 32(2):141-146. [ Abstract]
- Wen Y, Song X, Yan B, Yang X, Wu L, Leng D, He S, Bo X. 2021. Multi-dimensional data integration algorithm based on random walk with restart. BMC Bioinformatics 22(1):97. [ Full Text]
- Zanfardino M, Castaldo R, Pane K, Affinito O, Aiello M, Salvatore M, Franzese M. 2021. MuSA: a graphical user interface for multi-OMICs data integration in radiogenomic studies. Sci Rep 11(1):1550. [Full Text]
- Alter G, Gonzalez-Beltran A, Ohno-Machado L, Rocca-Serra P. 2020. The Data Tags Suite (DATS) model for discovering data access and use requirements. Gigascience 9(2). [ Abstract]
- Bernasconi A, Canakoglu A, Masseroli M, Ceri S. 2020. The road towards data integration in human genomics: players, steps and interactions. Brief Bioinform; doi:10.1093/bib/bbaa080 [Online 4 June 2020]. [Abstract]
- Bernstein MN, Gladstein A, Latt KZ, Clough E, Busby B, Dillman A. 2020. Jupyter notebook-based tools for building structured datasets from the Sequence Read Archive. F1000Res 9:376. [ Full Text]
- Canzler S, Schor J, Busch W, Schubert K, Rolle-Kampczyk UE, Seitz H, Kamp H, von Bergen M, Buesen R, Hackermuller J. 2020. Prospects and challenges of multi-omics data integration in toxicology. Arch Toxicol; doi:10.1007/s00204-020-02656-y [Online 8 February 2020]. [Abstract]
- Christley S, Aguiar A, Blanck G, Breden F, Bukhari SAC, Busse CE, Jaglale J, Harikrishnan SL, Laserson U, Peters B, Rocha A, Schramm CA, Taylor S, Vander Heiden JA, Zimonja B, Watson CT, Corrie B, Cowell LG. 2020. The ADC API: a web API for the programmatic query of the AIRR Data Commons. Front Big Data 3:22. [ Full Text]
- Elghafari A, Finkelstein J. 2020. Introducing an ontology-driven pipeline for the identification of common data elements. Stud Health Technol Inform 272:379-382. [ Full Text]
- Graw S, Chappell K, Washam CL, Gies A, Bird J, Robeson MS 2nd, Byrum SD. 2020. Multi-omics data integration considerations and study design for biological systems and disease. Mol Omics; doi:10.1039/d0mo00041h [Online 21 December 2020]. [Full Text]
- Hollmann S, Kremer A, Baebler S, Trefois C, Gruden K, Rudnicki WR, Tong W, Gruca A, Bongcam-Rudloff E, Evelo CT, Nechyporenko A, Frohme M, Safranek D, Regierer B, D'Elia D. 2020. The need for standardisation in life science research - an approach to excellence and trust. F1000Res 9:1398. [ Full Text]
- Konopka T, Smedley D. 2020. Incremental data integration for tracking genotype-disease associations. PLoS Comput Biol 16(1):e1007586. [ Abstract]
- Li J, Yin Y, Zhang M, Cui J, Zhang Z, Zhang Z, Sun D. 2020. GsmPlot: a web server to visualize epigenome data in NCBI. BMC Bioinformatics 21(1):55. [ Abstract]
- Meyer DE, Bailin SC, Vallero D, Egeghy PP, Liu SV, Cohen Hubal EA. 2020. Enhancing life cycle chemical exposure assessment through ontology modeling. Sci Total Environ 712:136263. [ Abstract]
- Odenkirk MT, Zin PPK, Ash JR, Reif DM, Fourches D, Baker ES. 2020. Structural-based connectivity and omic phenotype evaluations (SCOPE): a cheminformatics toolbox for investigating lipidomic changes in complex systems. Analyst 145(22):7197-7209. [ Full Text]
- Reid RW, Ferrier JW, Jay JJ. 2020. Automated gene data integration with Databio. BMC Res Notes 13(1):195. [ Full Text]
- Subramanian I, Verma S, Kumar S, Jere A, Anamika K. 2020. Multi-omics data integration, interpretation, and its application. Bioinform Biol Insights 14:1177932219899051. [ Full Text]
- Thessen AE, Grondin CJ, Kulkarni RD, Brander S, Truong L, Vasilevsky NA, Callahan TJ, Chan LE, Westra B, Willis M, Rothenberg SE, Jarabek AM, Burgoon L, Korrick SA, Haendel MA. 2020. Community approaches for integrating environmental exposures into human models of disease. Environ Health Perspect 128(12):125002. [ Full Text]
- Waltemath D, Golebiewski M, Blinov ML, Gleeson P, Hermjakob H, Hucka M, Inau ET, Keating SM, Konig M, Krebs O, Malik-Sheriff RS, Nickerson D, Oberortner E, Sauro HM, Schreiber F, Smith L, Stefan MI, Wittig U, Myers CJ. 2020. The first 10 years of the international coordination network for standards in systems and synthetic biology (COMBINE). J Integr Bioinform 17(2-3):20200005. [ Full Text]
- Brown J, Phillips AR, Lewis DA, Mans MA, Chang Y, Tanguay RL, Peterson ES, Waters KM, Tilton SC. 2019. Bioinformatics Resource Manager: a systems biology web tool for microRNA and omics data integration. BMC Bioinformatics 20(1):255. [ Abstract]
- Bucher E, Claunch CJ, Hee D, Smith RL, Devlin K, Thompson W, Korkola JE, Heiser LM. 2019. Annot: a Django-based sample, reagent, and experiment metadata tracking system. BMC Bioinformatics 20(1):542. [Abstract]
- Buendia P, Bradley RM, Taylor TJ, Schymanski EL, Patti GJ, Kabuka MR. 2019. Ontology-based metabolomics data integration with quality control. Bioanalysis 11(12):1139-1155. [ Abstract]
- Cooper DJ, Schurer S. 2019. Improving the utility of the Tox21 dataset by deep metadata annotations and constructing reusable benchmarked chemical reference signatures. Molecules 24(8):1604. [ Abstract]
- Dorea FC, Vial F, Hammar K, Lindberg A, Lambrix P, Blomqvist E, Revie CW. 2019. Drivers for the development of an Animal Health Surveillance Ontology (AHSO). Prev Vet Med 166:39-48. [ Abstract]
- Falster DS, FitzJohn RG, Pennell MW, Cornwell WK. 2019. Datastorr: a workflow and package for delivering successive versions of 'evolving data' directly into R. Gigascience 8(5). [ Abstract]
- Fillinger S, de la Garza L, Peltzer A, Kohlbacher O, Nahnsen S. 2019. Challenges of big data integration in the life sciences. Anal Bioanal Chem 411(26):6791-6800. [ Abstract]
- Kourou KD, Pezoulas VC, Georga EI, Exarchos TP, Tsanakas P, Tsiknakis M, Varvarigou T, De Vita S, Tzioufas A, Fotiadis DI. 2019. Cohort harmonization and integrative analysis from a biomedical engineering perspective. IEEE Rev Biomed Eng 12:303-318. [ Abstract]
- Macklin P. 2019. Key challenges facing data-driven multicellular systems biology. Gigascience 8(10). [ Abstract]
- Mirza B, Wang W, Wang J, Choi H, Chung NC, Ping P. 2019. Machine learning and integrative analysis of biomedical big data. Genes (Basel) 10(2):E87. [ Abstract]
- Pala D, Pagan J, Parimbelli E, Rocca MT, Bellazzi R, Casella V. 2019. Spatial enablement to support environmental, demographic, socioeconomics, and health data integration and analysis for big cities: a case study with asthma hospitalizations in New York City. Front Med (Lausanne) 6:84. [Abstract]
- Peng C, Goswami P. 2019. Meaningful integration of data from heterogeneous health services and home environment based on ontology. Sensors (Basel) 19(8):E1747. [ Abstract]
- Schymanski EL, Baker NC, Williams AJ, Singh RR, Trezzi JP, Wilmes P, Kolber PL, Kruger R, Paczia N, Linster CL, Balling R. 2019. Connecting environmental exposure and neurodegeneration using cheminformatics and high resolution mass spectrometry: potential and challenges. Environ Sci Process Impacts 21(9):1426-1445. [ Abstract]
- Siegele DA, LaBonte SA, Wu PI, Chibucos MC, Nandendla S, Giglio MG, Hu JC. 2019. Phenotype annotation with the Ontology of Microbial Phenotypes (OMP). J Biomed Semantics 10(1):13. [ Abstract]
- Sima AC, Stockinger K, de Farias TM, Gil M. 2019. Semantic integration and enrichment of heterogeneous biological databases. In: Evolutionary Genomics (Anisimova M, ed.). New York, NY: Humana. [ Full Text]
- Tang YA, Pichler K, Fullgrabe A, Lomax J, Malone J, Munoz-Torres MC, Vasant DV, Williams E, Haendel M. 2019. Ten quick tips for biocuration. PLoS Comput Biol 15(5):e1006906. [ Abstract]
- T'Joen V, Vaneeckhaute L, Priem S, Van Woensel S, Bekaert S, Berneel E, Van Der Straeten C. 2019. Rationalized development of a campus-wide cell line dataset for implementation in the Biobank LIMS system at Bioresource Center Ghent. Front Med (Lausanne) 6:137. [ Abstract]
- Wang RL, Edwards S, Ives C. 2019. Ontology-based semantic mapping of chemical toxicities. Toxicology 412:89-100. [ Abstract]
- Abburu S. 2018. Ontology driven cross-linked domain data integration and spatial semantic multi criteria query system for geospatial public health. Int J Semant Web Inf Syst 14(3):1-30. [ Abstract]
- Baker N, Boobis A, Burgoon L, Carney E, Currie R, Fritsche E, Knudsen T, Laffont M, Piersma AH, Poole A, Schneider S, Daston G. 2018. Building a developmental toxicity ontology. Birth Defects Res 110(6):502-518. [Abstract] [ ECETOC Open Access Report]
- Cooper L, Meier A, Laporte MA, Elser JL, Mungall C, Sinn BT, Cavaliere D, Carbon S, Dunn NA, Smith B, Qu B, Preece J, Zhang E, Todorovic S, Gkoutos G, Doonan JH, Stevenson DW, Arnaud E, Jaiswal P. 2018. The Planteome database: an integrated resource for reference ontologies, plant genomics and phenomics. Nucleic Acids Res 46(D1):D1168-D1180. [ Abstract]
- Fairchild G, Tasseff B, Khalsa H, Generous N, Daughton AR, Velappan N, Priedhorsky R, Deshpande A. 2018. Epidemiological data challenges: planning for a more robust future through data standards. Front Public Health 6:336. [Abstract]
- Haynes D, Jokela A, Manson S. 2018. IPUMS-Terra: integrated big heterogeneous spatio-temporal data analysis system. J Geogr Syst 20(4):343-361. [ Abstract]
- He Y, Xiang Z, Zheng J, Lin Y, Overton JA, Ong E. 2018. The eXtensible Ontology Development (XOD) principles and tool implementation to support ontology interoperability. J Biomed Semantics 9(1):3. [Abstract]
- Huser V, Amos L. 2018. Analyzing real-world use of research common data elements. AMIA Annu Symp Proc 2018:602-608. [ Abstract]
- National Academies of Sciences, Engineering, and Medicine. 2018. Informing Environmental Health Decisions Through Data Integration: Proceedings of a Workshop—in Brief. Washington, DC: The National Academies Press. [Full Text]
- Alshahrani M, Khan MA, Maddouri O, Kinjo AR, Queralt-Rosinach N, Hoehndorf R. 2017. Neuro-symbolic representation learning on biological knowledge graphs. Bioinformatics 33(17):2723-2730. [ Abstract]
- Davis AP, Grondin CJ, Johnson RJ, Sciaky D, King BL, McMorran R, Wiegers J, Wiegers TC, Mattingly CJ. 2017. The Comparative Toxicogenomics Database: update 2017. Nucleic Acids Res 45(D1):D972-D978. [Abstract]
- Malinowski AK, Ananth CV, Catalano P, Hines EP, Kirby RS, Klebanoff MA, Mulvihill JJ, Simhan H, Hamilton CM, Hendershot TP, Phillips MJ, Kilpatrick LA, Maiese DR, Ramos EM, Wright RJ, Dolan SM; PhenX Pregnancy Working Group. 2017. Research standardization tools: pregnancy measures in the PhenX Toolkit. Am J Obstet Gynecol 217(3):249-262. [ Abstract]
Data Science Training
- Big Data to Knowledge (BD2K)
  The BD2K Centers produced training and educational resources, including workshops, courses, webinars, lecture series, summer internships, and training programs.
- ERUDITE (Educational Resource Discovery Index)
  Use ERUDITE to find educational resources (e.g., free modules, MOOCs, curricula, and webinars) and in-person or minimal-cost training opportunities (e.g., short courses) for data science.
- Edison Data Science Framework
  The Edison Data Science Framework is a collection of documents that define the data science profession and competencies.
- FOSTER
  The FOSTER portal is an e-learning platform providing training resources for those who need to know more about Open Science, or need to develop strategies and skills for implementing Open Science practices in their daily workflows. Available courses on Open Science include managing and sharing research data, best practices in open research, Open Science Software and workflows, data protection and ethics, and open licensing.
- The Carpentries
  The Carpentries teach foundational coding, data science and computational skills to researchers worldwide. The Carpentries develop and teach in-person, interactive, two-day workshops using open-source lessons available on GitHub.
Relevant Publications
- Bittremieux W, Bouyssie D, Dorfer V, Locard-Paulet M, Perez-Riverol Y, Schwammle V, Uszkoreit J, Van Den Bossche T. 2021. The European Bioinformatics Community for Mass Spectrometry (EuBIC-MS): An open community for bioinformatics training and research. Rapid Commun Mass Spectrom; doi:10.1002/rcm.9087 [Online 16 April 2021]. [Abstract]
- Dill-McFarland KA, Konig SG, Mazel F, Oliver DC, McEwen LM, Hong KY, Hallam SJ. 2021. An integrated, modular approach to data science education in microbiology. PLoS Comput Biol 17(2):e1008661. [ Full Text]
- Attwood TK, Blackford S, Brazas MD, Davies A, Schneider MV. 2019. A global perspective on evolving bioinformatics and data science training needs. Brief Bioinform 20(2):398-404. [ Abstract]
- Grabowski P, Rappsilber J. 2019. A primer on data analytics in functional genomics: how to move from data to insight? Trends Biochem Sci 44(1):21-32. [ Abstract]
- Mendez KM, Pritchard L, Reinke SN, Broadhurst DI. 2019. Toward collaborative open data science in metabolomics using Jupyter Notebooks and cloud computing. Metabolomics 15(10):125. [ Full Text]
- Carey MA, Papin JA. 2018. Ten simple rules for biologists learning to program. PLoS Comput Biol 14(1):e1005871. [ Abstract]
- Huppenkothen D, Arendt A, Hogg DW, Ram K, VanderPlas JT, Rokem A. 2018. Hack weeks as a model for data science education and collaboration. Proc Natl Acad Sci U S A 115(36):8872-8877. [ Full Text]
- National Academies of Sciences, Engineering, and Medicine. 2018. Data Science for Undergraduates: Opportunities and Options. Washington, DC: The National Academies Press. [ Full Text]
- Toelch U, Ostwald D. 2018. Digital open science–teaching digital tools for reproducible and transparent research. PLoS Biol 16(7):e2006022. [ Abstract]
- Van Horn JD, Fierro L, Kamdar J, Gordon J, Stewart C, Bhattrai A, Abe S, Lei X, O'Driscoll C, Sinha A, Jain P, Burns G, Lerman K, Ambite JL. 2018. Democratizing data science through data science training. Pac Symp Biocomput 23:292-303. [ Abstract]
- Dunn MC, Bourne PE. 2017. Building the biomedical data science workforce. PLoS Biol 15(7):e2003082. [ Abstract]