Stochastic and robust model predictive control (SMPC and RMPC) [1,2] have shown strong potential for controlling grid-interactive efficient buildings because they can account for uncertain, time-varying exogenous disturbances such as ambient temperature, internal heat gains from occupancy, and plug loads. Designing SMPC/RMPC systems that remain robust under rare or extreme disturbances requires access to diverse and representative disturbance scenarios. However, current practice often relies on handcrafted scenarios or simple generative models (e.g., Gaussian processes and measurement noise), which fail to capture the complexity and diversity of real-world building operations and can therefore lead to overly conservative or unrepresentative control strategies.
To address this gap, we propose a new framework that formulates the discovery of diverse disturbance scenarios as a combinatorial multi-armed bandit (CMAB) problem [3], an extension of the classical multi-armed bandit problem [4]. In this formulation, each “super arm” (i.e., a subset of base arms) corresponds to a set of disturbance scenarios selected from historical or synthetic data. A diversity-based reward is defined using pairwise Dynamic Time Warping (DTW) distances [5] between simulation trajectories, and the Combinatorial Upper Confidence Bound (CUCB) algorithm is used to maximize this reward over repeated rounds of simulation. This enables the automatic selection of scenario sets that induce maximally diverse building control responses.
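The selection loop described above can be illustrated with a minimal sketch. It is not the implementation used in this work, and several choices are assumptions made only for illustration: the reward is taken to be the mean pairwise DTW distance within the selected set, each base arm's feedback is its mean DTW distance to the other selected trajectories, the oracle simply keeps the k arms with the largest confidence-adjusted estimates, and the feedback is not normalized to [0, 1]. The names `cucb_select`, `simulate`, and `toy_simulate` are hypothetical; `simulate` stands in for the building simulation under one disturbance scenario.

```python
import math
import random


def dtw_distance(a, b):
    """Classic dynamic-programming DTW with |x - y| as the local cost."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)  # row for the empty prefix of a
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = abs(x - y) + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]


def pairwise_dtw(trajectories):
    """Symmetric matrix of DTW distances between all trajectories."""
    k = len(trajectories)
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            dist[i][j] = dist[j][i] = dtw_distance(trajectories[i], trajectories[j])
    return dist


def diversity_reward(dist):
    """Mean DTW distance over all unordered pairs (assumed reward form)."""
    k = len(dist)
    total = sum(dist[i][j] for i in range(k) for j in range(i + 1, k))
    return total / (k * (k - 1) / 2)


def cucb_select(simulate, n_arms, k, rounds, seed=0):
    """CUCB-style loop: pick k scenarios per round, track the most diverse set."""
    rng = random.Random(seed)
    counts = [0] * n_arms   # number of times each base arm was played
    means = [0.0] * n_arms  # running mean feedback per base arm
    best_set, best_reward = None, float("-inf")
    for t in range(1, rounds + 1):
        # Confidence-adjusted estimates; unplayed arms get +inf so they are tried first.
        index = [
            means[i] + math.sqrt(1.5 * math.log(t) / counts[i]) if counts[i] else float("inf")
            for i in range(n_arms)
        ]
        # Assumed oracle: top-k arms by adjusted estimate, ties broken at random.
        order = sorted(range(n_arms), key=lambda i: (index[i], rng.random()), reverse=True)
        chosen = order[:k]
        dist = pairwise_dtw([simulate(arm) for arm in chosen])
        for pos, arm in enumerate(chosen):
            feedback = sum(dist[pos]) / (k - 1)  # mean distance to the other chosen arms
            counts[arm] += 1
            means[arm] += (feedback - means[arm]) / counts[arm]
        reward = diversity_reward(dist)
        if reward > best_reward:
            best_set, best_reward = sorted(chosen), reward
    return best_set, best_reward


def toy_simulate(arm):
    """Hypothetical stand-in simulator: arms 4 and 5 give outlying responses."""
    levels = {4: 5.0, 5: -5.0}
    return [levels.get(arm, 0.0)] * 4


best_set, best_reward = cucb_select(toy_simulate, n_arms=6, k=2, rounds=30)
print(best_set, best_reward)
```

In this toy setting, four of the six scenarios produce identical flat responses, so any set that scores well must contain at least one of the two outlying scenarios. A realistic setup would replace `toy_simulate` with a closed-loop building simulation and would likely rescale the DTW feedback before applying the confidence bonus.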
To enhance diversity in scenarios beyond what is available in measured datasets, we integrate a time-series diffusion model, Diffusion-TS [6], trained on multi-year disturbance data from a multi-zone office building in Japan. The generative model enables sampling of realistic yet novel disturbance trajectories, thereby expanding the search space for the CUCB algorithm. Our computational experiments show that the proposed framework identifies high-diversity disturbance sets more effectively than greedy baselines, and that the inclusion of synthetic data further improves performance. This hybrid CMAB-diffusion approach provides a scalable and flexible way to generate robust training scenarios for advanced building control strategies.
References:
[1] Pippia, T., Lago, J., De Coninck, R., & De Schutter, B. (2021). Scenario-based nonlinear model predictive control for building heating systems. Energy and Buildings, 247, 111108.
[2] Gao, Y., Miyata, S., & Akashi, Y. (2023). Energy saving and indoor temperature control for an office building using tube-based robust model predictive control. Applied Energy, 341, 121106.
[3] Chen, W., Wang, Y., & Yuan, Y. (2013, February). Combinatorial multi-armed bandit: General framework and applications. In International Conference on Machine Learning (pp. 151-159). PMLR.
[4] Lattimore, T., & Szepesvári, C. (2020). Bandit algorithms. Cambridge University Press.
[5] Müller, M. (2007). Dynamic time warping. Information retrieval for music and motion, 69-84.
[6] Yuan, X., & Qiao, Y. (2024). Diffusion-TS: Interpretable diffusion for general time series generation. arXiv preprint arXiv:2403.01742.