2025 AIChE Annual Meeting

(510e) Predictive Modeling of Hydrogen Yield from Biomass-Derived Bio-Oils Using Machine Learning

Authors

Diwakar Z. Shende, Visvesvaraya National Institute of Technology Nagpur, formerly known as Visvesvaraya Regional College of
Kishor M. Bhurchandi, Visvesvaraya National Institute of Technology, Nagpur, India
Abstract
With growing interest in sustainable hydrogen production, accurate prediction of hydrogen yield
from biomass-derived bio-oils is essential for process optimization. This study develops machine
learning models to estimate hydrogen yield from bio-oil composition, operating conditions, and
catalyst type. Features drawn from cataloged experiments include C and H elemental
composition, alkane content, temperature, pressure, catalyst type, liquid feed concentration,
contact time, and conversion rate. The dataset comprises over 100 entries. Feature preprocessing
steps included normalization and encoding of categorical features, as well as multicollinearity
checks. Several regression algorithms were tested, including Random Forest, Support Vector
Regression, and Artificial Neural Networks. For model evaluation, R², MAE, and RMSE were
calculated. Preliminary results showed high predictive accuracy, with Random Forest performing
best at capturing non-linear dependencies. This work demonstrates the value of data-driven
methods for decision support in biohydrogen production systems, especially for
optimizing catalyst/feedstock configurations for improved yield.

1. INTRODUCTION
Hydrogen derived from biomass offers a renewable alternative to fossil fuel-based production,
with potential environmental benefits (Balat, 2010). Various thermochemical and biological
pathways exist for biomass-to-hydrogen conversion, including pyrolysis, gasification, and
fermentation (Buffi et al., 2022). Thermochemical routes, such as steam reforming of bio-oils, are
more mature and efficient, while biological methods are still developing but offer decentralized
production options (Copa Rey et al., 2024). A two-stage process involving fast pyrolysis followed
by catalytic steam reforming of bio-oil fractions has been proposed as an economically viable
approach (Czernik et al., 2001). Some biomass-to-hydrogen technologies, like anaerobic digestion
and conventional gasification, show promise in terms of cost-effectiveness, with production costs
potentially comparable to natural gas steam reforming (Copa Rey et al., 2024). Overall, hydrogen
from biomass presents a sustainable option to complement electrolysis-based production and
contribute to decarbonization efforts (Buffi et al., 2022).
Machine learning (ML) has emerged as a powerful tool for optimizing biohydrogen production
from biomass and agricultural waste. ML algorithms can effectively model complex relationships
between operational parameters and process performance, predict outcomes, and analyze
microbial population dynamics (Pandey et al., 2022; Sharma et al., 2022). These techniques have
been applied to both biochemical and thermochemical conversion processes, demonstrating the
ability to handle large datasets and adapt to changing conditions (Alagumalai et al., 2023). ML
approaches have shown success in categorizing and predicting biohydrogen production data, with
various algorithms exhibiting high accuracy (Alagumalai et al., 2023). Additionally, ML has been
used to predict bio-oil yield and hydrogen content based on biomass composition and pyrolysis
conditions (Tang et al., 2020). Despite its potential, challenges remain in implementing ML for
large-scale biohydrogen production, including techno-economic barriers and the need for further
research to develop reliable process control tools (Pandey et al., 2022; Sharma et al., 2022).
This study aims to apply ML techniques to predict the hydrogen yield (expressed as mol/mol C)
from bio-oil gasification and reforming experiments. The analysis is based on a structured dataset
with parameters related to feed composition, catalyst type, reaction temperature and pressure, and
process metrics such as conversion and selectivity.
2. MATERIALS AND METHODS
2.1 Dataset
The dataset contains 100+ records of experimental data involving hydrogen production from
various bio-oil samples. Each entry includes:
• Feedstock properties: Carbon and hydrogen content, alkane fraction
• Process conditions: Temperature (°C), pressure (atm), contact time (min)
• Catalyst information: Catalyst composition (e.g., 3% Pt/Al₂O₃ variants)
• Performance metrics: Conversion rate (%), CH₄ yield, H₂ selectivity, and hydrogen yield (mol/mol C)
2.2 Data Preprocessing
• Missing values and anomalies were checked and addressed.
• Catalyst types and liquid feed compositions were encoded using one-hot encoding.
• Numerical features were standardized for uniform model input.
• Correlation analysis was performed to reduce multicollinearity.
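
The preprocessing steps listed above can be illustrated with a minimal standard-library Python sketch. It is not the authors' pipeline: the records, column names, catalyst labels, and the 0.9 correlation cutoff are all hypothetical, chosen only to show one-hot encoding, standardization, and a pairwise correlation check.

```python
import math
from statistics import mean, pstdev

# Hypothetical toy records; column names and values are illustrative only.
records = [
    {"temperature_C": 225.0, "pressure_atm": 29.0, "catalyst": "Pt/Al2O3-A"},
    {"temperature_C": 250.0, "pressure_atm": 35.0, "catalyst": "Pt/Al2O3-B"},
    {"temperature_C": 265.0, "pressure_atm": 56.0, "catalyst": "Pt/Al2O3-A"},
    {"temperature_C": 240.0, "pressure_atm": 40.0, "catalyst": "Pt/Al2O3-B"},
]

def one_hot(values):
    """Map each category to a 0/1 indicator vector (columns in sorted order)."""
    categories = sorted(set(values))
    return categories, [[1.0 if v == c else 0.0 for c in categories] for v in values]

def standardize(values):
    """Rescale a numeric column to zero mean and unit (population) variance."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def pearson(x, y):
    """Pearson correlation coefficient between two non-constant columns."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

temperature = [r["temperature_C"] for r in records]
pressure = [r["pressure_atm"] for r in records]

categories, catalyst_onehot = one_hot([r["catalyst"] for r in records])
temperature_z = standardize(temperature)
pressure_z = standardize(pressure)

# Flag highly correlated numeric pairs; the 0.9 cutoff is an assumed threshold.
r_tp = pearson(temperature, pressure)
collinear = abs(r_tp) > 0.9
```

In a sketch like this, a flagged pair would prompt dropping or combining one of the two features before model training; how the study itself handled correlated features is not specified.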
2.3 Modeling
Several regression models were developed and compared:
• Random Forest Regressor
• Support Vector Regressor
• Multi-Layer Perceptron (Neural Network)
Model training and validation were conducted using k-fold cross-validation (k=5). Performance
metrics used include R², Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE).
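
The validation procedure can be sketched as follows, using only the Python standard library. This is an illustration under stated assumptions, not the study's code: the function names are hypothetical, the data are synthetic, and the least-squares line is merely a placeholder learner standing in for the Random Forest, SVR, or MLP models (in practice, estimators such as scikit-learn's `RandomForestRegressor`, `SVR`, or `MLPRegressor` would take its place).

```python
import math
import random

def kfold_indices(n_samples, k=5, seed=0):
    """Shuffle indices once and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def regression_metrics(y_true, y_pred):
    """Return R², MAE, and RMSE for paired true and predicted values."""
    n = len(y_true)
    y_bar = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"r2": r2, "mae": mae, "rmse": rmse}

def cross_validate(X, y, fit, predict, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, and average the metrics."""
    folds = kfold_indices(len(y), k, seed)
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        y_pred = [predict(model, X[i]) for i in held_out]
        scores.append(regression_metrics([y[i] for i in held_out], y_pred))
    return {m: sum(s[m] for s in scores) / k for m in scores[0]}

# Placeholder learner: least-squares line on the first feature only.
def fit_line(X, y):
    xs = [row[0] for row in X]
    x_bar, y_bar = sum(xs) / len(xs), sum(y) / len(y)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (t - y_bar) for x, t in zip(xs, y)) / sxx
    return slope, y_bar - slope * x_bar

def predict_line(model, row):
    slope, intercept = model
    return slope * row[0] + intercept

# Synthetic data (not from the study): y is an exact linear function of x.
X = [[200.0 + 5.0 * i] for i in range(20)]
y = [0.01 * row[0] + 0.5 for row in X]
cv_scores = cross_validate(X, y, fit_line, predict_line, k=5)
```

Averaging the per-fold metrics, as done here, is one common convention; pooling all held-out predictions before computing R², MAE, and RMSE is an equally plausible alternative, and the source does not say which was used.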
3. RESULTS AND DISCUSSION
Among the models tested, the Random Forest Regressor achieved the highest prediction accuracy,
followed by the MLP. Non-linear models outperformed linear regression approaches due to the
complex relationships between reaction variables and hydrogen yield.
Feature importance analysis indicated that temperature, pressure, and catalyst type significantly
influence hydrogen production. Notably, catalyst variants with different Pt/Al₂O₃ formulations
showed strong effects on both conversion and selectivity, which were effectively captured by the
trained models.
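
The text does not state which feature importance method was used. As one model-agnostic possibility, the sketch below implements permutation importance in standard-library Python: each feature column is shuffled in turn, and the resulting increase in MAE is taken as that feature's importance. The function names, the stand-in `toy_predict` model, and the synthetic inputs are all assumptions made for illustration.

```python
import random

def mean_abs_error(y_true, y_pred):
    """Mean absolute error between paired true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average increase in MAE when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mean_abs_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced by its shuffled value.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            error = mean_abs_error(y, [predict(row) for row in X_perm])
            increases.append(error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical stand-in for a trained model: depends on feature 0 only.
def toy_predict(row):
    return 0.01 * row[0]

# Synthetic inputs (not from the study): [temperature-like, irrelevant] columns.
X = [[200.0 + 10.0 * i, float(i % 3)] for i in range(12)]
y = [toy_predict(row) for row in X]
importances = permutation_importance(toy_predict, X, y)
```

In this toy setup the first feature receives a positive importance and the second receives exactly zero, because the stand-in model ignores it. With a tree ensemble, impurity-based importances are a common alternative, though they can favor high-cardinality features.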
The findings support the use of ML as a complementary approach to experimental research,
providing fast and interpretable insights into biohydrogen systems. The models can also be adapted
for screening new feedstocks and catalyst systems.
4. CONCLUSIONS
Machine learning can effectively model the hydrogen yield from biomass-derived bio-oils by
learning from prior experimental data. These predictive models offer a practical tool for screening
optimal reaction conditions and catalyst formulations without the need for exhaustive experimental
trials. Future work will focus on expanding the dataset and integrating mechanistic constraints to
improve extrapolation capabilities.

References:
1. Alagumalai, A., Devarajan, B., Song, H., Wongwises, S., Ledesma-Amaro, R., Mahian, O.,
Sheremet, M., & Lichtfouse, E. (2023). Machine learning in biohydrogen production: a review.
Biofuel Research Journal, 10(2), 1844–1858. DOI: 10.18331/brj2023.10.2.4
2. Balat, M. (2010). Thermochemical routes for biomass-based hydrogen production. Energy Sources,
Part A: Recovery, Utilization, and Environmental Effects, 32(15), 1388–1398. DOI:
10.1080/15567030802706796
3. Buffi, M., Prussi, M., & Scarlat, N. (2022). Energy and environmental assessment of hydrogen
from biomass sources: Challenges and perspectives. Biomass and Bioenergy, 165, 106556. DOI:
10.1016/j.biombioe.2022.106556
4. Czernik, S., French, R., Feik, C., & Chornet, E. (2001). Production of Hydrogen from
Biomass-Derived Liquids. Progress in Thermochemical Biomass Conversion, 1577–1585. DOI:
10.1002/9780470694954.ch130
5. Pandey, A. K., Park, J., Ko, J., Joo, H., Raj, T., Singh, L. K., Singh, N., & Kim, S. (2022). Machine
learning in fermentative biohydrogen production: Advantages, challenges, and applications.
Bioresource Technology, 370, 128502. DOI: 10.1016/j.biortech.2022.128502
6. Copa Rey, J. R., Mateos-Pedrero, C., Longo, A., Rijo, B., Brito, P., Ferreira, P., & Nobre, C. (2024).
Renewable Hydrogen from Biomass: Technological Pathways and Economic Perspectives.
Energies, 17(14), 3530. DOI: 10.3390/en17143530
7. Sharma, A. K., Ghodke, P. K., Goyal, N., Nethaji, S., & Chen, W. (2022). Machine learning
technology in biohydrogen production from agriculture waste: Recent advances and future
perspectives. Bioresource Technology, 364, 128076. DOI: 10.1016/j.biortech.2022.128076
8. Tang, Q., Chen, Y., Yang, H., Liu, M., Xiao, H., Wu, Z., Chen, H., & Naqvi, S. R. (2020).
Prediction of bio-oil yield and hydrogen contents based on machine learning method: effect of
biomass compositions and pyrolysis conditions. Energy & Fuels, 34(9), 11050–11060. DOI:
10.1021/acs.energyfuels.0c01893