Superfund Research Program
Machine Learning Predicts Efficiency of Micropollutant Removal
Release Date: 02/19/2025
By Isaac Conrad
Highlights:
- New machine learning models can predict the effectiveness of treating micropollutant-contaminated water with granular activated carbon.
- The approach proved more accurate than previous models and identified key factors that affect cleanup success.
- One model, freely available online, can be used to design site-specific pilot treatment systems and estimate the cost of remediating various contaminants.
Research Summary
Scientists at the NIEHS-funded North Carolina State University Superfund Research Program (SRP) Center created machine learning models that can help predict how well granular activated carbon (GAC) can clean up contaminated water. With his student Yoko Koyama, Detlef Knappe, Ph.D., developed models that consider three kinds of inputs: properties of the micropollutants, such as PFAS and volatile organic compounds; specific characteristics of the water being treated; and features of different GAC types.
“GAC is the best available technology for treating water contaminated with organic micropollutants, but scaling up from the lab to full-scale treatment systems is costly and time-consuming,” said Knappe. “Machine learning models can help overcome these challenges by predicting how successful a proposed GAC treatment system will be based on the contaminant present and characteristics of the water. So, we set out to do just that.”
Building the Models
First, to train and test the models, the researchers created a database of 413 breakthrough values. Each value represents the volume of water that can be treated before 10% of a micropollutant passes through the granular activated carbon untreated. The database included 43 different micropollutants, 16 types of GAC, and 38 water samples from diverse environments. They then selected 17 key factors describing micropollutant properties, water quality, GAC characteristics, and treatment scale and design.
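To make the idea of a 10% breakthrough value concrete, the sketch below reads one off a measured breakthrough curve by linear interpolation. It is only an illustration: the function name, the sample numbers, and the choice of linear interpolation are assumptions, not details taken from the study.

```python
def breakthrough_volume(volumes, ratios, threshold=0.10):
    """Return the treated volume at which the effluent/influent
    concentration ratio first reaches `threshold`.

    volumes: increasing treated-water volumes (any consistent unit).
    ratios:  effluent concentration divided by influent concentration.
    Returns None if the threshold is never reached.
    """
    if len(volumes) != len(ratios):
        raise ValueError("volumes and ratios must have the same length")
    if not volumes:
        return None
    if ratios[0] >= threshold:
        return volumes[0]
    for i in range(1, len(volumes)):
        lo, hi = ratios[i - 1], ratios[i]
        if lo < threshold <= hi:
            # Interpolate linearly between the two bracketing samples.
            frac = (threshold - lo) / (hi - lo)
            return volumes[i - 1] + frac * (volumes[i] - volumes[i - 1])
    return None


# Hypothetical breakthrough curve (made-up numbers for illustration).
volumes = [0, 5000, 10000, 15000, 20000]
ratios = [0.00, 0.02, 0.06, 0.14, 0.30]
bv10 = breakthrough_volume(volumes, ratios)
print(bv10)
```

In this made-up curve the ratio crosses 0.10 halfway between the samples at 10,000 and 15,000 volume units, so the interpolated breakthrough value falls midway between them.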
The team built and compared three different types of models. The first used multiple linear regression to analyze how breakthrough values relate to the 17 selected factors. The other two, a random forest model and a gradient boosting machine model, used decision tree algorithms to model different treatment scenarios and their outcomes.
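For readers unfamiliar with gradient boosting, the following is a minimal sketch of the general technique, not the researchers' implementation. It assumes squared-error loss and uses one-split trees ("stumps") instead of full decision trees; the toy data and all function names are hypothetical. Each round fits a stump to the current residuals and adds a scaled copy of it to the running prediction.

```python
def fit_stump(X, residuals):
    """Find the one-feature threshold split that minimizes squared error."""
    best = None  # (sse, feature, threshold, left_mean, right_mean)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # every feature is constant: predict the mean
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]


def stump_predict(stump, row):
    j, t, left_mean, right_mean = stump
    return left_mean if row[j] <= t else right_mean


def fit_gbm(X, y, n_rounds=50, learning_rate=0.1):
    """Boost stumps on the residuals of the running prediction."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row)
                 for p, row in zip(preds, X)]
    return base, learning_rate, stumps


def gbm_predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)


# Toy data (made up): the target depends only on the first feature.
X = [[1, 0.5], [2, 0.1], [3, 0.9], [4, 0.3], [5, 0.7], [6, 0.2]]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = fit_gbm(X, y)
print(gbm_predict(model, [1, 0.5]), gbm_predict(model, [6, 0.2]))
```

A random forest differs mainly in that it averages many trees trained independently on resampled data, rather than adding trees one after another to correct earlier errors. In practice, an established machine learning library would normally be used instead of hand-written code like this.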
Next, they used 371 data points to train the models and the remaining 42 data points to test their performance.
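The split described above holds out roughly 10% of the 413 values for testing. A minimal sketch of such a split follows. It assumes a simple seeded random shuffle; the brief does not say how the 42 test points were actually chosen, and the function name is hypothetical.

```python
import random


def split_indices(n_total, n_test, seed=0):
    """Randomly assign row indices to a training set and a test set."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    indices = list(range(n_total))
    rng.shuffle(indices)
    test = sorted(indices[:n_test])
    train = sorted(indices[n_test:])
    return train, test


train_idx, test_idx = split_indices(413, 42)
print(len(train_idx), len(test_idx))
```

Keeping the test rows completely separate from training is what allows the test error to indicate how the models might behave on new data.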

Making Predictions
To evaluate the performance of the three models, the scientists compared their accuracy in predicting GAC system performance.
Compared to earlier multiple linear regression models, the multiple linear regression model in this study was more widely applicable to different treatment scenarios, but its accuracy suffered for systems with low breakthrough values. The decision tree models outperformed it across the full range of breakthrough values and water types.
The team determined that all three models performed well and could be used on new datasets, with the gradient boosting machine model having the lowest error rate, followed by the random forest and multiple linear regression models.
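A ranking like this comes from scoring each model's predictions on the held-out test points. The sketch below shows one way to do that, assuming root mean squared error as the metric; the brief does not name the error measure the team used. The model names and every number are placeholders, not results from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Placeholder test-set values and predictions (made up for illustration).
actual = [1.0, 2.0, 3.0, 4.0]
predictions = {
    "model_a": [1.1, 1.9, 3.2, 3.8],
    "model_b": [1.5, 2.5, 2.5, 4.5],
    "model_c": [2.0, 3.0, 4.0, 5.0],
}

# Rank models from lowest to highest test error.
ranking = sorted(predictions, key=lambda name: rmse(actual, predictions[name]))
for name in ranking:
    print(name, round(rmse(actual, predictions[name]), 3))
```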
Influential Inputs
Due to its low error and ability to accurately predict performance under the widest range of scenarios, the team used the gradient boosting machine model to identify key factors that influence GAC performance.
They found that two micropollutant properties and one water quality characteristic were key to predicting breakthrough values.
The most influential variable was a micropollutant property that describes its ability to interact with its surroundings through short-range attractive forces with other molecules. The second most predictive variable was the amount of dissolved organic carbon in the contaminated water. The third most important variable was the hydrogen bond acidity of the micropollutant, which describes how readily it donates a hydrogen atom to form a bond with another molecule.
Taken together, this information can help researchers understand how well GAC treatment strategies will work in different scenarios and how much they might cost.
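The brief does not say how the team measured each variable's influence. One common, model-agnostic option is permutation importance: shuffle one input column at a time and record how much the prediction error grows. The sketch below illustrates that idea only. The stand-in model, the data, and the function names are all hypothetical.

```python
import random


def mse(model, X, y):
    """Mean squared error of `model` (a callable on one row) over a dataset."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Average increase in MSE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the dataset with only column j scrambled.
            shuffled = [row[:j] + [v] + row[j + 1:]
                        for row, v in zip(X, column)]
            increases.append(mse(model, shuffled, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    # Stand-in model: strong dependence on feature 0, weak on feature 1,
    # none on feature 2.
    return 3.0 * row[0] + 0.5 * row[1]


X = [[float(i), float((i * 7) % 10), float((i * 3) % 10)] for i in range(10)]
y = [toy_model(row) for row in X]
scores = permutation_importance(toy_model, X, y)
print(scores)
```

A feature the model ignores scores zero because shuffling it leaves every prediction unchanged, while features the model relies on heavily produce the largest error increases.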
Impact Statement
“Our model is a valuable tool for designing optimized pilot treatment systems and estimating costs associated with GAC to treat emerging micropollutants,” said Knappe. “We’ve also made the gradient boosting machine model freely available for researchers, policy makers, and water treatment professionals to use online.”
For More Information Contact:
Detlef Knappe
North Carolina State University
Mann Hall 319E, Box 7908
Raleigh, North Carolina 27695
Phone: 919-515-8791
Email: knappe@ncsu.edu
To learn more about this research, please refer to the following sources:
- Koyama Y, Fasaee MA, Berglund EZ, Knappe D. 2024. Machine Learning Models to Predict Early Breakthrough of Recalcitrant Organic Micropollutants in Granular Activated Carbon Adsorbers. Environ Sci Technol 58(38):17114-17124. PMID:39271478
To receive monthly mailings of the Research Briefs, send your email address to srpinfo@niehs.nih.gov.