2025 AIChE Annual Meeting
(259c) Automated Discovery of Diverse Disturbance Scenarios Via Combinatorial Multi-Arm Bandits and Time-Series Diffusion Models: Application to Building Control Systems
To address this gap, we propose a new framework that formulates the discovery of diverse disturbance scenarios as a combinatorial multi-armed bandit (CMAB) problem [3], an extension of the classical multi-armed bandit problem [4]. In this formulation, each base arm is a single disturbance scenario drawn from historical or synthetic data, and each “super arm” (i.e., a subset of base arms) is a set of such scenarios. The reward for a super arm measures diversity: it is computed from the pairwise Dynamic Time Warping (DTW) distances [5] between the simulation trajectories that the selected scenarios produce. The Combinatorial Upper Confidence Bound (CUCB) algorithm is then used to maximize this reward over repeated rounds of simulation. The result is an automatic procedure for selecting scenario sets that induce maximally diverse building control responses.
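The selection loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: the callback `simulate(i)` is a hypothetical stand-in for running the building simulation on scenario `i`, the set reward is taken to be the mean pairwise DTW distance, each base arm's per-round outcome is taken to be its mean DTW distance to the other selected trajectories (divided by a user-chosen `scale` and clipped to 1), and the oracle simply picks the `k` arms with the largest upper confidence bounds. The exploration bonus follows the CUCB form in [3].

```python
import math
import random


def dtw_distance(a, b):
    """Classic dynamic-programming DTW with absolute-difference local cost."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)          # row 0 of the cost table
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = abs(x - y) + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]


def diversity_reward(trajectories):
    """Assumed reward: mean pairwise DTW distance over the selected set."""
    n = len(trajectories)
    if n < 2:
        raise ValueError("need at least two trajectories")
    total = sum(
        dtw_distance(trajectories[i], trajectories[j])
        for i in range(n) for j in range(i + 1, n)
    )
    return total / (n * (n - 1) / 2)


def cucb_select(n_arms, k, simulate, n_rounds, scale, seed=0):
    """CUCB-style loop; returns the best set seen and its diversity reward."""
    if not 2 <= k <= n_arms:
        raise ValueError("k must satisfy 2 <= k <= n_arms")
    rng = random.Random(seed)
    counts = [0] * n_arms                  # times each base arm was played
    means = [0.0] * n_arms                 # running mean of base-arm outcomes
    best_set, best_reward = None, float("-inf")
    for t in range(1, n_rounds + 1):
        # Unplayed arms get an infinite index so every arm is tried early.
        ucb = [
            means[i] + math.sqrt(1.5 * math.log(t) / counts[i])
            if counts[i] else float("inf")
            for i in range(n_arms)
        ]
        # Assumed oracle: top-k arms by index, ties broken at random.
        order = sorted(range(n_arms), key=lambda i: (-ucb[i], rng.random()))
        super_arm = order[:k]
        trajs = [simulate(i) for i in super_arm]
        reward = diversity_reward(trajs)
        if reward > best_reward:
            best_set, best_reward = sorted(super_arm), reward
        # Assumed base-arm outcome: scaled mean distance to the other picks.
        for pos, arm in enumerate(super_arm):
            others = [
                dtw_distance(trajs[pos], trajs[q]) for q in range(k) if q != pos
            ]
            outcome = min(1.0, sum(others) / len(others) / scale)
            counts[arm] += 1
            means[arm] += (outcome - means[arm]) / counts[arm]
    return best_set, best_reward
```

For example, with four toy scenarios of which only one differs from the rest, `cucb_select(4, 2, lambda i: toy[i], n_rounds=20, scale=15.0)` returns a pair that contains the outlier. The top-k oracle is a heuristic here, because a pairwise diversity reward is not a simple sum of per-arm terms; the actual oracle and base-arm outcome definition are not given in this abstract.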
To enhance diversity in scenarios beyond what is available in measured datasets, we integrate a time-series diffusion model, Diffusion-TS [6], trained on multi-year disturbance data from a multi-zone office building in Japan. The generative model enables sampling of realistic yet novel disturbance trajectories, thereby expanding the search space for the CUCB algorithm. Our computational experiments show that the proposed framework identifies high-diversity disturbance sets more effectively than greedy baselines, and that the inclusion of synthetic data further improves performance. This hybrid CMAB-diffusion approach provides a scalable and flexible way to generate robust training scenarios for advanced building control strategies.
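The abstract does not specify the greedy baselines used in the comparison. One plausible form, shown here purely as an assumption, is a greedy selection that seeds the set with the two most distant scenarios and then repeatedly adds the candidate with the largest total distance to those already chosen. The function name `greedy_diverse_subset` and the precomputed symmetric distance matrix `dist` (for instance, pairwise DTW distances over the combined measured and synthetic pool) are hypothetical.

```python
def greedy_diverse_subset(dist, k):
    """Greedily pick k indices from a symmetric distance matrix (a sketch)."""
    n = len(dist)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= len(dist)")
    # Seed with the most distant pair of candidates.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    chosen = [first, second]
    while len(chosen) < k:
        remaining = [c for c in range(n) if c not in chosen]
        # Add the candidate with the largest summed distance to the chosen set.
        nxt = max(remaining, key=lambda c: sum(dist[c][s] for s in chosen))
        chosen.append(nxt)
    return sorted(chosen)
```

Unlike a bandit approach, this kind of baseline needs the full distance matrix up front, which means every candidate scenario has to be simulated before selection begins.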
References:
[1] Pippia, T., Lago, J., De Coninck, R., & De Schutter, B. (2021). Scenario-based nonlinear model predictive control for building heating systems. Energy and Buildings, 247, 111108.
[2] Gao, Y., Miyata, S., & Akashi, Y. (2023). Energy saving and indoor temperature control for an office building using tube-based robust model predictive control. Applied Energy, 341, 121106.
[3] Chen, W., Wang, Y., & Yuan, Y. (2013). Combinatorial multi-armed bandit: General framework and applications. In International Conference on Machine Learning (pp. 151-159). PMLR.
[4] Lattimore, T., & Szepesvári, C. (2020). Bandit algorithms. Cambridge University Press.
[5] Müller, M. (2007). Dynamic time warping. In Information Retrieval for Music and Motion (pp. 69-84). Springer.
[6] Yuan, X., & Qiao, Y. (2024). Diffusion-TS: Interpretable diffusion for general time series generation. arXiv preprint arXiv:2403.01742.