The strategic deployment of biorefineries in the United States demands a nuanced approach because environmental, social, and economic constraints must be balanced against one another [1]. In particular, communities already burdened by elevated air pollution [2], such as high concentrations of particulate matter (PM2.5), ozone (O₃), and nitrogen dioxide (NO₂), coupled with socioeconomic vulnerabilities, face disproportionate adverse impacts from industrial siting decisions. Addressing these concerns requires informed siting decisions that minimize environmental burdens and safeguard vulnerable populations.
This study introduces a machine learning-driven geospatial analytical framework designed to identify optimal locations for biorefinery deployment across U.S. counties. Using environmental and socioeconomic datasets, the analysis focused on adherence to U.S. Environmental Protection Agency (EPA)-defined air quality thresholds, specifically PM2.5 (< 9 µg/m³), O₃ (< 70 ppb), and NO₂ (< 7.47 ppb). Socioeconomic vulnerability indicators, including low-income percentages, unemployment rates, and disability prevalence, were also integrated based on EPA-defined percentiles.
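The threshold logic described above can be illustrated with a minimal Python sketch. It is not the authors' Mathematica implementation. The record layout, the function name `label_suitability`, and the 80th-percentile vulnerability cutoff are all hypothetical, since the abstract does not state the percentile values used. The sketch also assumes that a county counts as suitable only when every air quality limit and every vulnerability cutoff is satisfied.

```python
from dataclasses import dataclass

# Air quality limits quoted in the abstract.
PM25_LIMIT_UG_M3 = 9.0
O3_LIMIT_PPB = 70.0
NO2_LIMIT_PPB = 7.47

# Hypothetical cutoff: the abstract does not give the percentile actually used.
VULNERABILITY_PERCENTILE_CUTOFF = 80.0


@dataclass
class CountyRecord:
    """Illustrative record layout (an assumption, not the study's schema)."""
    pm25: float              # annual PM2.5, µg/m³
    o3: float                # ozone, ppb
    no2: float               # nitrogen dioxide, ppb
    low_income_pct: float    # percentile, 0-100
    unemployment_pct: float  # percentile, 0-100
    disability_pct: float    # percentile, 0-100


def label_suitability(rec, cutoff=VULNERABILITY_PERCENTILE_CUTOFF):
    """Return 1 (suitable) or 0 (non-suitable) for one county record."""
    # All three pollutants must be strictly below their limits.
    air_ok = (
        rec.pm25 < PM25_LIMIT_UG_M3
        and rec.o3 < O3_LIMIT_PPB
        and rec.no2 < NO2_LIMIT_PPB
    )
    # The most vulnerable indicator decides the socioeconomic check.
    social_ok = max(
        rec.low_income_pct, rec.unemployment_pct, rec.disability_pct
    ) < cutoff
    return 1 if (air_ok and social_ok) else 0
```

Under these assumptions, a single exceeded limit is enough to label a county non-suitable, which is the hard-threshold behavior the binary labels imply.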
A cleaned and normalized dataset of 239,781 data points covering U.S. counties nationwide was used to train and validate a Random Forest classification model in Wolfram Mathematica. Each location was assigned a binary label (1 = suitable, 0 = non-suitable) derived from threshold-based logic. The resulting models achieved test accuracies of 0.9 and R-squared scores above 0.91, indicating high reliability and good generalization. Cross-validation further supported these results, reducing the risk of overfitting and showing stable performance across diverse subsets of the data.
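The cross-validation step can be sketched in plain Python as follows. This is an illustration of a generic k-fold loop, not the study's Mathematica workflow. The helper names, the choice of five folds, and the majority-class placeholder model are assumptions; the placeholder stands in for the Random Forest only so that the loop is self-contained.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def cross_validate(features, labels, fit, predict, k=5, seed=0):
    """Return one held-out accuracy score per fold."""
    scores = []
    for held_out in k_fold_indices(len(labels), k, seed):
        held_out_set = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held_out_set]
        model = fit([features[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        preds = predict(model, [features[i] for i in held_out])
        scores.append(accuracy([labels[i] for i in held_out], preds))
    return scores


# Placeholder model (NOT a Random Forest): always predicts the majority class.
def fit_majority(train_x, train_y):
    return 1 if 2 * sum(train_y) >= len(train_y) else 0


def predict_majority(model, test_x):
    return [model] * len(test_x)
```

Any classifier exposing comparable `fit` and `predict` callables could be substituted for the placeholder. Fold scores that stay close to one another are the kind of stability the cross-validation results refer to.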
The identified locations were mapped with Mathematica's GeoGraphics and GeoPosition functions, distinguishing between suitable (green hotspots) and non-suitable (red hotspots) counties. These visual outputs serve as decision-support tools that can guide stakeholders and policymakers toward sustainable and equitable biorefinery siting decisions. Future work should explore replacing fixed threshold-based binarization with more flexible, data-driven approaches, such as probabilistic or fuzzy classification, to better capture the gradients of environmental and socioeconomic risk across regions.
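As a rough illustration of what a fuzzy alternative to hard thresholds might look like, the Python sketch below replaces the binary cutoff with a linear ramp. It is one possible design, not a method proposed or evaluated in this study. The function names and the `softness` parameter, which sets where full suitability ends as a fraction of each limit, are hypothetical.

```python
def ramp_membership(value, full_at, zero_at):
    """Membership of 1.0 at or below full_at, 0.0 at or above zero_at, linear between."""
    if value <= full_at:
        return 1.0
    if value >= zero_at:
        return 0.0
    return (zero_at - value) / (zero_at - full_at)


def fuzzy_air_suitability(pm25, o3, no2, softness=0.5):
    """Graded air quality suitability in [0, 1].

    Each pollutant is fully suitable below softness * limit (an assumed
    parameter) and fully unsuitable at the limit quoted in the abstract.
    The pollutants are combined with min, the standard fuzzy AND.
    """
    readings_and_limits = ((pm25, 9.0), (o3, 70.0), (no2, 7.47))
    return min(
        ramp_membership(value, softness * limit, limit)
        for value, limit in readings_and_limits
    )
```

With the default `softness`, a county with PM2.5 at 6.75 µg/m³ and low ozone and NO₂ scores 0.5 rather than a flat 1, so counties close to a limit are distinguished from counties far below it.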
The developed framework effectively integrates environmental and public health considerations, thereby supporting the advancement of sustainable chemical engineering practices while proactively addressing environmental justice concerns.
References
[1] Prioux N, Ouaret R, Belaud J-P. Machine Learning Based Framework for Biorefinery Environmental Assessment. Chem Eng Trans 2022;96:517–22. https://doi.org/10.3303/CET2296087.
[2] Nunez Y, Benavides J, Shearston JA, Krieger EM, Daouda M, Henneman LRF, et al. An environmental justice analysis of air pollution emissions in the United States from 1970 to 2010. Nat Commun 2024;15:268. https://doi.org/10.1038/s41467-023-43492-9.