Title: Bioactivity Comparison across Multiple Machine Learning Algorithms Using over 5000 Datasets for Drug Discovery.
Authors: Lane, Thomas R; Foil, Daniel H; Minerali, Eni; Urbina, Fabio; Zorn, Kimberley M; Ekins, Sean
Published In: Mol Pharm (2021-01-04)
Abstract: Machine learning methods are attracting considerable attention from the pharmaceutical industry for use in drug discovery and applications beyond. In recent studies, we and others have applied multiple machine learning algorithms and modeling metrics and, in some cases, compared molecular descriptors to build models for individual targets or properties on a relatively small scale. Several research groups have used large numbers of datasets from public databases such as ChEMBL in order to evaluate machine learning methods of interest to them. The largest of these types of studies used on the order of 1400 datasets. We have now extracted well over 5000 datasets from ChEMBL for use with the ECFP6 fingerprint and in comparison of our proprietary software Assay Central with random forest, k-nearest neighbors, support vector classification, naïve Bayesian, AdaBoosted decision trees, and deep neural networks (three layers). Model performance was assessed using an array of fivefold cross-validation metrics including area-under-the-curve, F1 score, Cohen's kappa, and Matthews correlation coefficient. Based on ranked normalized scores for the metrics or datasets, all methods appeared comparable, while the distance from the top indicated that Assay Central and support vector classification were comparable. Unlike prior studies which have placed considerable emphasis on deep neural networks (deep learning), no advantage was seen in this case. If anything, Assay Central may have been at a slight advantage as the activity cutoff for each of the over 5000 datasets representing over 570,000 unique compounds was based on Assay Central performance, although support vector classification seems to be a strong competitor. We also applied Assay Central to perform prospective predictions for the toxicity targets PXR and hERG to further validate these models. This work appears to be the largest scale comparison of these machine learning algorithms to date. Future studies will likely evaluate additional databases, descriptors, and machine learning algorithms and further refine the methods for evaluating and comparing such models.
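The four metrics named in the abstract (area-under-the-curve, F1 score, Cohen's kappa, and Matthews correlation coefficient) can be illustrated with a minimal sketch for a single binary active/inactive prediction set. This is not the authors' code or the Assay Central implementation; the function names, the toy labels, and the toy scores below are all hypothetical, and the formulas are the standard textbook definitions. In a fivefold cross-validation setting, such values would typically be computed on each held-out fold and then averaged.

```python
from math import sqrt

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = active, 0 = inactive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def f1_score(tp, fp, fn, tn):
    """Harmonic mean of precision and recall; 0.0 if undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def cohen_kappa(tp, fp, fn, tn):
    """Agreement between predictions and labels, corrected for chance."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1 - expected) if expected != 1 else 0.0

def matthews_corrcoef(tp, fp, fn, tn):
    """Correlation between predicted and true classes; 0.0 if undefined."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def roc_auc(y_true, scores):
    """Probability that a random active outscores a random inactive (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: 4 actives and 6 inactives from one held-out fold.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

counts = confusion_counts(y_true, y_pred)
metrics = {
    "auc": roc_auc(y_true, scores),
    "f1": f1_score(*counts),
    "kappa": cohen_kappa(*counts),
    "mcc": matthews_corrcoef(*counts),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUC is computed from the continuous scores, while F1, kappa, and MCC depend on the chosen decision threshold, which is one reason a study of this kind reports several metrics side by side.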
PubMed ID: 33325717
MeSH Terms: No MeSH terms associated with this publication