2024 AIChE Annual Meeting
(196b) A Novel Object-Oriented Integrated Environment for Deep Learning Networks Utilizing Staggered Training Procedures and New Theoretical Concepts
Balancing computational efficiency with model performance is a delicate trade-off, highlighting the need for innovative approaches that streamline the training process while maintaining high-quality results. In line with this objective, this contribution explores both established and new training methodologies for deep learning networks (DLNs), alongside their theoretical foundations.
As a novel approach, the block coordinate descent (BCD) method (Wright, 2015) is employed for the training/fitting of DLNs to reduce the required computational effort by focusing the optimization on individual layers sequentially, adjusting the parameters of one layer at a time. Various strategies are explored for selecting the next layer once the parameters (weights and biases) of the current layer have been adjusted, ranging from random selection to layer sensitivity measures as proposed by Zhang et al. (2023).
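As an illustration of the layer-wise scheme, the following is a minimal sketch of BCD applied to a small feed-forward network, written in PyTorch for concreteness. The network size, step sizes, and the gradient-norm proxy used to rank layer sensitivity are illustrative assumptions and do not reproduce the specific sensitivity measures of Zhang et al. (2023).

```python
# Minimal sketch of layer-wise block coordinate descent (BCD) on a small
# feed-forward network. Architecture, step sizes and the gradient-norm
# "sensitivity" proxy are illustrative assumptions only.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 16), nn.Tanh(),
                      nn.Linear(16, 16), nn.Tanh(),
                      nn.Linear(16, 1))
layers = [m for m in model if isinstance(m, nn.Linear)]   # the "blocks"
X, y = torch.randn(256, 8), torch.randn(256, 1)           # synthetic data
loss_fn = nn.MSELoss()                                     # least-squares objective

for outer in range(50):
    # Measure a sensitivity proxy for every layer: the norm of the loss
    # gradient with respect to that layer's weights.
    for p in model.parameters():
        p.requires_grad_(True)
    model.zero_grad()
    loss_fn(model(X), y).backward()
    sens = [layer.weight.grad.norm().item() for layer in layers]

    # Select the most sensitive layer as the next block to adjust.
    block = layers[max(range(len(layers)), key=lambda i: sens[i])]

    # Freeze all other parameters and take a few steps on this block only.
    for p in model.parameters():
        p.requires_grad_(False)
    for p in block.parameters():
        p.requires_grad_(True)
    opt = torch.optim.SGD(block.parameters(), lr=1e-2)
    for _ in range(5):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
```

Replacing the argmax selection with a random draw over the layer indices recovers the random-selection variant mentioned above.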
Additionally, the proposed training strategy integrates this sequential layer adjustment within a traditional batch-based training framework for DLNs (You et al., 2019). Here, batches are formed by randomly selecting subsets of sample points from the dataset.
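A brief sketch of how the layer-wise updates could sit inside such a mini-batch loop is given below; it reuses the names from the previous sketch, and the batch size is an arbitrary illustrative choice.

```python
# Sketch of embedding the layer-wise BCD update inside a conventional
# mini-batch loop: each batch is a random subset of sample points, and the
# selected block is updated on that batch only. Reuses X, y from above.
batch_size = 32
for epoch in range(10):
    perm = torch.randperm(X.shape[0])                # random sample order
    for start in range(0, X.shape[0], batch_size):
        idx = perm[start:start + batch_size]
        Xb, yb = X[idx], y[idx]                      # current random batch
        # ... select a block and update it on (Xb, yb) exactly as in the
        # BCD sketch above, with (X, y) replaced by the current batch ...
```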
In both the BCD and the batch-based training approaches, the choice of the underlying optimization method for tuning the DLN parameters is crucial. This study utilizes the Iterated Control Random Search (ICRS) method (Li et al., 2016) to determine good starting values for the training process. Beyond conventional gradient-based methods such as ADAM (Barakat and Bianchi, 2021), it also explores broader unconstrained optimization techniques such as the L-BFGS method, albeit with a severely restricted number of iterations (no more than 3) within each major iteration of the training schemes.
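The cap on inner iterations can be expressed directly with PyTorch's torch.optim.LBFGS, whose max_iter argument limits the number of L-BFGS iterations performed per call to step(). The sketch below, which reuses the model and data from the earlier sketches, only illustrates that cap; the learning rate and number of major iterations are assumptions, and the ICRS-based choice of starting values (Li et al., 2016) is not reproduced here.

```python
# Sketch of capping L-BFGS at no more than 3 inner iterations per major
# training iteration via torch.optim.LBFGS(max_iter=3). Starting values
# from ICRS are assumed to have been set beforehand (not shown).
for p in model.parameters():
    p.requires_grad_(True)                 # train all parameters in this sketch
lbfgs = torch.optim.LBFGS(model.parameters(), lr=0.5, max_iter=3)

def closure():
    lbfgs.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    return loss

for major_iter in range(20):
    lbfgs.step(closure)                    # at most 3 L-BFGS iterations per call
```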
Furthermore, stochastic gradient calculation (SGC) is implemented as a standard option for any of the optimization schemes utilized. This approach evaluates the least-squares training objective using a single randomly chosen sample point from the dataset. Nonetheless, it should be noted that SGC is not recommended for quasi-Newton type methods, such as L-BFGS.
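A minimal sketch of this single-sample option, again reusing the earlier names and paired here with plain SGD rather than a quasi-Newton method, is as follows; the learning rate and step count are illustrative assumptions.

```python
# Sketch of the single-sample stochastic gradient option: the least-squares
# objective is evaluated on one randomly drawn sample point per step. Paired
# here with plain SGD; as noted above, this pairing is not advisable with
# quasi-Newton methods such as L-BFGS.
sgd = torch.optim.SGD(model.parameters(), lr=1e-3)
for step in range(1000):
    i = torch.randint(0, X.shape[0], (1,))          # one random sample index
    sgd.zero_grad()
    loss = loss_fn(model(X[i]), y[i])               # single-sample objective
    loss.backward()
    sgd.step()
```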
Overall, this study contributes novel theoretical and algorithmic advancements to the field of Machine Learning/Artificial Intelligence. Computational validations demonstrate that the proposed strategies can significantly enhance the efficiency of DLN training, depending on the application context.
References
Barakat, A., Bianchi, P., 2021, Convergence and dynamical behavior of the ADAM algorithm for stochastic optimization, SIAM Journal on Optimization 31 (1), https://doi.org/10.1137/19M1263443.
Dargan, S., Kumar, M., Ayyagari, M.R., Kumar, G., 2020, A survey of deep learning and its applications: A new paradigm to machine learning, Archives of Computational Methods in Engineering 27, 1071-1092, https://doi.org/10.1007/s11831-019-09344-w.
Frontistis, Z., Lykogiannis, G., Sarmpanis, A., 2023, Artificial neural networks in membrane bioreactors: A comprehensive review – Overcoming challenges and future perspectives, Sci 5 (3), 31, https://doi.org/10.3390/sci5030031.
Li, B., Nguyen, V.H., Ng, C.L., Del Rio-Chanona, E.A., Vassiliadis, V.S., Arellano-Garcia, H., 2016, ICRS-Filter: A randomized direct search algorithm for constrained nonconvex optimization problems, Chemical Engineering Research and Design 106, 178-190, https://doi.org/10.1016/j.cherd.2015.12.001.
Lin, W.H., Wang, P., Chao, K.M., Lin, H.C., Yang, Z.Y., Lai, Y.H., 2022, Deep-learning model selection and parameter estimation from a wind power farm in Taiwan, Applied Sciences 12 (14), 7067, https://doi.org/10.3390/app12147067.
Roodschild, M., Sardinas, J.G., Will, A., 2020, A new approach for the vanishing gradient problem on sigmoid activation, Progress in Artificial Intelligence 9, 351-360, https://doi.org/10.1007/s13748-020-00218-y.
Wright, S.J., 2015, Coordinate descent algorithms, Mathematical Programming 151, 3-34, https://doi.org/10.1007/s10107-015-0892-3.
You, Y., Hseu, J., Ying, C., Demmel, J., Keutzer, K., Hsieh, C.J., 2019, Large-batch training for LSTM and beyond, SC'19: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1-16, https://doi.org/10.1145/3295500.3356137.
Zhang, S., Vassiliadis, V.S., Dorneanu, B., Arellano-Garcia, H., 2023, Hierarchical multi-scale optimization of deep neural networks, Applied Intelligence 53 (21), 24963-24990, https://doi.org/10.1007/s10489-023-04745-8.