2024 AIChE Annual Meeting
(196b) A Novel Object-Oriented Integrated Environment for Deep Learning Networks Utilizing Staggered Training Procedures and New Theoretical Concepts
Balancing computational efficiency with model performance is a delicate trade-off, highlighting the need for innovative approaches that streamline the training process while maintaining high-quality results. In line with this objective, this contribution explores both established and new training methodologies for deep learning networks (DLNs), alongside their theoretical foundations.
As a novel approach, the block coordinate descent (BCD) method (Wright, 2015) is employed for the training/fitting of DLNs to reduce the required computational effort by focusing the optimization on individual layers sequentially, adjusting the parameters of one layer at a time. Various strategies are explored for selecting the next layer once the parameters (weights and biases) of the current layer have been adjusted, ranging from random selection to layer sensitivity measures as proposed by Zhang et al. (2023).
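As an illustration of the layer-wise scheme, the following is a minimal sketch of BCD applied to a small feed-forward network, written in PyTorch for concreteness. The network size, step sizes, and the gradient-norm proxy used to rank layer sensitivity are illustrative assumptions and do not reproduce the specific sensitivity measures of Zhang et al. (2023).

```python
# Minimal sketch of layer-wise block coordinate descent (BCD) on a small
# feed-forward network. Architecture, step sizes and the gradient-norm
# "sensitivity" proxy are illustrative assumptions only.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 16), nn.Tanh(),
                      nn.Linear(16, 16), nn.Tanh(),
                      nn.Linear(16, 1))
layers = [m for m in model if isinstance(m, nn.Linear)]   # the "blocks"
X, y = torch.randn(256, 8), torch.randn(256, 1)           # synthetic data
loss_fn = nn.MSELoss()                                     # least-squares objective

for outer in range(50):
    # Measure a sensitivity proxy for every layer: the norm of the loss
    # gradient with respect to that layer's weights.
    for p in model.parameters():
        p.requires_grad_(True)
    model.zero_grad()
    loss_fn(model(X), y).backward()
    sens = [layer.weight.grad.norm().item() for layer in layers]

    # Select the most sensitive layer as the next block to adjust.
    block = layers[max(range(len(layers)), key=lambda i: sens[i])]

    # Freeze all other parameters and take a few steps on this block only.
    for p in model.parameters():
        p.requires_grad_(False)
    for p in block.parameters():
        p.requires_grad_(True)
    opt = torch.optim.SGD(block.parameters(), lr=1e-2)
    for _ in range(5):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
```

Replacing the argmax selection with a random draw over the layer indices recovers the random-selection variant mentioned above.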
Additionally, the proposed training strategy integrates this sequential layer adjustment within a traditional batch-based training framework for DLNs (You et al., 2019). Here, batches are formed by randomly selecting subsets of sample points from the dataset.
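A brief sketch of how the layer-wise updates could sit inside such a mini-batch loop is given below; it reuses the names from the previous sketch, and the batch size is an arbitrary illustrative choice.

```python
# Sketch of embedding the layer-wise BCD update inside a conventional
# mini-batch loop: each batch is a random subset of sample points, and the
# selected block is updated on that batch only. Reuses X, y from above.
batch_size = 32
for epoch in range(10):
    perm = torch.randperm(X.shape[0])                # random sample order
    for start in range(0, X.shape[0], batch_size):
        idx = perm[start:start + batch_size]
        Xb, yb = X[idx], y[idx]                      # current random batch
        # ... select a block and update it on (Xb, yb) exactly as in the
        # BCD sketch above, with (X, y) replaced by the current batch ...
```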
In both the BCD and the batch-based training approaches, the choice of the underlying optimization method for tuning the DLN parameters is crucial. This study utilizes the Iterated Control Random Search (ICRS) method (Li et al., 2016) to determine good starting values for the training process. Beyond conventional gradient-based methods such as ADAM (Barakat and Bianchi, 2021), it also explores broader unconstrained optimization techniques such as the L-BFGS method, albeit with a severely restricted number of iterations (no more than 3) within each major iteration of the training schemes.
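The cap on inner iterations can be expressed directly with PyTorch's torch.optim.LBFGS, whose max_iter argument limits the number of L-BFGS iterations performed per call to step(). The sketch below, which reuses the model and data from the earlier sketches, only illustrates that cap; the learning rate and number of major iterations are assumptions, and the ICRS-based choice of starting values (Li et al., 2016) is not reproduced here.

```python
# Sketch of capping L-BFGS at no more than 3 inner iterations per major
# training iteration via torch.optim.LBFGS(max_iter=3). Starting values
# from ICRS are assumed to have been set beforehand (not shown).
for p in model.parameters():
    p.requires_grad_(True)                 # train all parameters in this sketch
lbfgs = torch.optim.LBFGS(model.parameters(), lr=0.5, max_iter=3)

def closure():
    lbfgs.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    return loss

for major_iter in range(20):
    lbfgs.step(closure)                    # at most 3 L-BFGS iterations per call
```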
Furthermore, stochastic gradient calculation (SGC) is implemented as a standard option for any of the optimization schemes utilized. This approach evaluates the least-squares training objective using a single randomly chosen sample point from the dataset. Nonetheless, it should be noted that SGC is not recommended for quasi-Newton type methods, such as L-BFGS.
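A minimal sketch of this single-sample option, again reusing the earlier names and paired here with plain SGD rather than a quasi-Newton method, is as follows; the learning rate and step count are illustrative assumptions.

```python
# Sketch of the single-sample stochastic gradient option: the least-squares
# objective is evaluated on one randomly drawn sample point per step. Paired
# here with plain SGD; as noted above, this pairing is not advisable with
# quasi-Newton methods such as L-BFGS.
sgd = torch.optim.SGD(model.parameters(), lr=1e-3)
for step in range(1000):
    i = torch.randint(0, X.shape[0], (1,))          # one random sample index
    sgd.zero_grad()
    loss = loss_fn(model(X[i]), y[i])               # single-sample objective
    loss.backward()
    sgd.step()
```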
Overall, this study contributes novel theoretical and algorithmic advancements to the field of Machine Learning/Artificial Intelligence. Computational validations demonstrate that the proposed strategies can significantly enhance the efficiency of DLN training, depending on the application context.
References
Barakat, A., Bianchi, P., 2021, Convergence and dynamical behavior of the ADAM algorithm for stochastic optimization, SIAM Journal on Optimization 31 (1), https://doi.org/10.1137/19M1263443.
Dargan, S., Kumar, M., Ayyagari, M.R., Kumar, G., 2020, A survey of deep learning and its applications: A new paradigm to machine learning, Archives of Computational Methods in Engineering 27, 1071-1092, https://doi.org/10.1007/s11831-019-09344-w.
Frontistis, Z., Lykogiannis, G., Sarmpanis, A., 2023, Artificial neural networks in membrane bioreactors: A comprehensive review – Overcoming challenges and future perspectives, Sci 5 (3), 31, https://doi.org/10.3390/sci5030031.
Li, B., Nguyen, V.H., Ng, C.L., Del Rio-Chanona, E.A., Vassiliadis, V.S., Arellano-Garcia, H., 2016, ICRS-Filter: A randomized direct search algorithm for constrained nonconvex optimization problems, Chemical Engineering Research and Design 106, 178-190, https://doi.org/10.1016/j.cherd.2015.12.001.
Lin, W.H., Wang, P., Chao, K.M., Lin, H.C., Yang, Z.Y., Lai, Y.H., 2022, Deep-learning model selection and parameter estimation from a wind power farm in Taiwan, Applied Sciences 12 (14), 7067, https://doi.org/10.3390/app12147067.
Roodschild, M., Sardinas, J.G., Will, A., 2020, A new approach for the vanishing gradient problem on sigmoid activation, Progress in Artificial Intelligence 9, 351-360, https://doi.org/10.1007/s13748-020-00218-y.
Wright, S.J., 2015, Coordinate descent algorithms, Mathematical Programming 151, 3-34, https://doi.org/10.1007/s10107-015-0892-3.
You, Y., Hseu, J., Ying, C., Demmel, J., Keutzer, K., Hsieh, C.J., 2019, Large-batch training for LSTM and beyond, SC'19: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1-16, https://doi.org/10.1145/3295500.3356137.
Zhang, S., Vassiliadis, V.S., Dorneanu, B., Arellano-Garcia, H., 2023, Hierarchical multi-scale optimization of deep neural networks, Applied Intelligence 53 (21), 24963-24990, https://doi.org/10.1007/s10489-023-04745-8.