Publication Detail

Title: Applications of Machine Learning to In Silico Quantification of Chemicals without Analytical Standards.

Authors: Panagopoulos Abrahamsson, Dimitri; Park, June-Soo; Singh, Randolph R; Sirota, Marina; Woodruff, Tracey J

Published In: J Chem Inf Model (2020-06-22)

Abstract: Non-targeted analysis provides a comprehensive approach to analyze environmental and biological samples for nearly all chemicals present. One of the main shortcomings of current analytical methods and workflows is that they are unable to provide any quantitative information, which is an important obstacle to understanding environmental fate and human exposure. Herein, we present an in silico quantification method using machine learning for chemicals analyzed using electrospray ionization (ESI). We considered three data sets from different instrumental setups: (i) capillary electrophoresis electrospray ionization-mass spectrometry (CE-MS) in positive ionization mode (ESI+), (ii) liquid chromatography quadrupole time-of-flight mass spectrometry (LC-QTOF/MS) in ESI+, and (iii) LC-QTOF/MS in negative ionization mode (ESI-). We developed and applied two different machine-learning algorithms, a random forest (RF) and an artificial neural network (ANN), to predict the relative response factors (RRFs) of different chemicals based on their physicochemical properties. Chemical concentrations can then be calculated by dividing the measured abundance of a chemical, as peak area or peak height, by its corresponding RRF. We evaluated our models and tested their predictive power using 5-fold cross-validation (CV) and y-randomization. Both the RF and the ANN models showed great promise in predicting RRFs. However, the accuracy of the predictions was dependent on the data set composition and the experimental setup. For the CE-MS ESI+ data set, the best model predicted measured RRFs with a mean absolute error (MAE) of 0.19 log units and a cross-validation coefficient of determination (Q2) of 0.84 for the testing set. For the LC-QTOF/MS ESI+ data set, the best model predicted measured RRFs with an MAE of 0.32 and a Q2 of 0.40. For the LC-QTOF/MS ESI- data set, the best model predicted measured RRFs with an MAE of 0.50 and a Q2 of 0.20. Our findings suggest that machine-learning algorithms can be used for predicting concentrations of non-targeted chemicals with reasonable uncertainties, especially in ESI+, while the application to ESI- remains a more challenging problem.
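
The quantification step and the error metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that RRFs are predicted on a log10 scale (the abstract reports MAE in "log units" without naming the base) and that Q2 follows the common definition 1 - PRESS/SS_total computed on held-out predictions. All function names are hypothetical.

```python
def concentration_from_abundance(abundance, log10_rrf):
    """Estimate a concentration as abundance / RRF.

    Assumption: the model outputs log10(RRF), so it is converted back
    to a linear RRF before dividing.
    """
    rrf = 10.0 ** log10_rrf
    return abundance / rrf


def mean_absolute_error(measured, predicted):
    """Mean absolute difference between measured and predicted values."""
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


def q2(measured, predicted):
    """Cross-validated coefficient of determination.

    Assumption: Q2 = 1 - PRESS / SS_total, where `predicted` holds the
    predictions made for each point while it was in a held-out fold.
    """
    mean_measured = sum(measured) / len(measured)
    press = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_total = sum((m - mean_measured) ** 2 for m in measured)
    return 1.0 - press / ss_total


def fold_error(mae_log10):
    """Convert an MAE in log10 units to a multiplicative error factor."""
    return 10.0 ** mae_log10
```

Under the log10 assumption, an MAE of 0.19 log units corresponds to a typical error of roughly a factor of 1.5 in the predicted RRF, and therefore in the estimated concentration, while an MAE of 0.50 corresponds to roughly a factor of 3.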

PubMed ID: 32379974

MeSH Terms: No MeSH terms associated with this publication
