Publication Detail

Title: PMLB v1.0: an open-source dataset collection for benchmarking machine learning methods.

Authors: Romano, Joseph D; Le, Trang T; La Cava, William; Gregg, John T; Goldberg, Daniel J; Chakraborty, Praneel; Ray, Natasha L; Himmelstein, Daniel; Fu, Weixuan; Moore, Jason H

Published In: Bioinformatics (2022 Jan 12)

Abstract: MOTIVATION: Novel machine learning and statistical modeling studies rely on standardized comparisons to existing methods using well-studied benchmark datasets. Few tools exist that provide rapid access to many of these datasets through a standardized, user-friendly interface that integrates well with popular data science workflows. RESULTS: This release of PMLB provides the largest collection of diverse, public benchmark datasets for evaluating new machine learning and data science methods aggregated in one location. v1.0 introduces a number of critical improvements developed following discussions with the open-source community. AVAILABILITY: PMLB is available at https://github.com/EpistasisLab/pmlb. Python and R interfaces for PMLB can be installed through the Python Package Index and Comprehensive R Archive Network, respectively.

PubMed ID: 34677586

MeSH Terms: Benchmarking*; Machine Learning; Models, Statistical; Software*

