Principal Investigator: Ekins, Sean
Institute Receiving Award: Collaborations Pharmaceuticals, Inc.
Location: Fuquay Varina, NC
Grant Number: R44ES031038
Funding Organization: National Institute of Environmental Health Sciences
Award Funding Period: 01 Sep 2019 to 31 Jul 2025
DESCRIPTION (provided by applicant): Project Summary: Computational toxicology aims to use rules, models and algorithms based on prior data for specific endpoints to predict whether a new molecule will possess similar liabilities. In some cases, the computational models are derived from discrete molecular endpoints (e.g. estrogen receptor agonism), while in others they are quite broad in scope (e.g. drug-induced liver injury, DILI). Considerable progress has been made in computational toxicology over the past decade, both in model development and availability, such that the latest generation of larger-scale machine learning (ML) models will further focus in vitro and in vivo testing on verification of select predictions. Pharmaceutical, consumer products, agrochemical and other chemistry-focused companies possess structure-activity data generated over many decades of screening that are not in the public domain, and these data are primarily accessible only to the cheminformatics experts in each company. Outside of these companies, small pharmaceutical and biotech companies and academics must rely on data from public databases, commercial databases and their own data. Integrating such data from diverse sources and processing them with algorithms to build ML models that can help enable predictions for new compounds is a vast undertaking. During Phase I of this project, to develop the prototype for MegaTox®, we curated toxicity datasets, then generated and tested well over 200 ML models, initially focused on the Bayesian approach. We also developed approaches to understand training and test set applicability and ultimately performed prospective predictions against several toxicity targets. Having completed these aims, we also collaborated with numerous academic laboratories and performed fee-for-service work with five commercial companies.
We currently have several pharmaceutical, agrochemical and consumer product companies evaluating our computational toxicity models prior to licensing. Discussions with these potential customers have shaped this Phase II proposal, which includes the following aims:
1. Compare and integrate novel graph-based models such as GraphSAGE with our suite of 15 different ML regression and classification algorithms for modeling toxicology datasets such as those generated in Phase I.
2. Integrate read-across and adverse outcome pathway methods with our computational models for DILI and other toxicity models as needed.
3. Generate validated ML models from in vivo data for non-mammalian species (initially using zebrafish), which will enable in vitro and in vivo correlations and can be validated relatively cost-effectively.
In this proposal, over 2 years we expect to develop models with 15 different algorithms for at least 100 in vitro and in vivo datasets, leading to more than 1500 toxicity ML models. We are not aware of any other company pursuing such an approach: generating new high-value datasets and models, testing its own models and creating a wide array of toxicity ML models. MegaTox® will be a product available for licensing by pharmaceutical, consumer product, agrochemical and regulatory groups, and will also be used in fee-for-service consulting.
Science Code(s)/Area of Science(s) Primary: 75 - Computational Biology/Computational Methods for Exposure Assessment
Secondary: 03 - Carcinogenesis/Cell Transformation
Publications: See publications associated with this Grant.
Program Officer: Lingamanaidu Ravichandran