An increasing number of chemical and process engineering applications rely on machine learning models as surrogates or for data-driven insights [1-2]. Modern machine learning models are vulnerable to poisoning attacks, where adversarial manipulation of even a small fraction of training data can lead to catastrophic failures [3]. Data poisoning occurs when an adversary strategically modifies training data to compromise model performance or introduce specific vulnerabilities. These attacks manifest in multiple forms: untargeted attacks (which may even arise unintentionally, e.g., as measurement noise) aim to degrade overall model performance, causing denial-of-service [4-5]; targeted attacks compromise performance on specific input types [6-7]; and backdoor attacks maintain normal model behavior while introducing trigger patterns that cause errors during deployment [8-9]. Despite the severity of these threats, existing defenses are typically attack-specific and provide no guarantees against new attack methods, creating an ineffective "arms race" between attackers and defenders.
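To make the threat concrete, the sketch below shows one of the simplest untargeted poisoning attacks: flipping the labels of a small fraction of a binary classification training set. This is an illustrative example of our own (not taken from the cited attack papers), and the function and parameter names are assumptions.

```python
import numpy as np

# Illustrative untargeted poisoning attack: flip the labels of a random
# `budget` fraction of a binary (0/1) training set. A hypothetical example,
# not the specific attacks of [4-9].
def flip_labels(y, budget=0.03, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    y_poisoned = y.copy()
    n_poison = int(budget * len(y))
    idx = rng.choice(len(y), size=n_poison, replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]  # flip 0 <-> 1
    return y_poisoned, idx
```

Even a budget of a few percent can measurably degrade a model trained on the poisoned labels, which is what motivates certified defenses rather than attack-specific heuristics.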
Our recent work introduces Abstract Gradient Training (AGT), a framework that computes sound certificates of robustness against general poisoning adversaries for models trained with first-order optimization methods [10-11]. AGT models poisoning attacks as constraints on an adversary's perturbation budget and leverages convex relaxations to bound the impact of various attack types. Unlike prior approaches, AGT requires no modification to the model or training algorithm, so it can be applied directly to existing training pipelines. However, achieving both high performance and robustness remains challenging due to the complex relationship between hyperparameter choices and a model's vulnerability to adversarial attacks.
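The following minimal sketch conveys the underlying idea of propagating sound parameter bounds through training; it is not the AGT algorithm of [10]. It assumes a linear regression model trained by full-batch gradient descent and a simplified adversary that may shift every training label by at most `delta`; AGT itself handles richer poisoning models and general first-order training.

```python
import numpy as np

def interval_matvec(A, x_lo, x_hi):
    """Sound elementwise bounds on A @ x for any x in the box [x_lo, x_hi]."""
    A_pos, A_neg = np.clip(A, 0, None), np.clip(A, None, 0)
    return A_pos @ x_lo + A_neg @ x_hi, A_pos @ x_hi + A_neg @ x_lo

def certified_gradient_descent(X, y, delta, lr=0.01, steps=200):
    """Interval bounds on linear-regression parameters under bounded label poisoning.

    Simplified illustration of certified training bounds, not the AGT method itself.
    """
    n, d = X.shape
    theta_lo, theta_hi = np.zeros(d), np.zeros(d)
    for _ in range(steps):
        # Residual bounds for X @ theta - y_poisoned with |y_poisoned - y| <= delta.
        r_lo, r_hi = interval_matvec(X, theta_lo, theta_hi)
        r_lo, r_hi = r_lo - y - delta, r_hi - y + delta
        # Gradient bounds for the mean-squared-error loss: (1/n) X^T r.
        g_lo, g_hi = interval_matvec(X.T / n, r_lo, r_hi)
        # A descent step with lr > 0 maps the interval monotonically.
        theta_lo, theta_hi = theta_lo - lr * g_hi, theta_hi - lr * g_lo
    return theta_lo, theta_hi  # every poisoned training run ends inside this box
```

Bounds of this kind are what allow a certified error bound to be evaluated alongside the usual validation error, which is the quantity our tuning framework optimizes below.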
This work advances the design of robust machine learning pipelines by introducing a novel multi-objective Bayesian Optimization methodology that leverages AGT to tune hyperparameters for both performance and certified robustness. Bayesian Optimization has previously been applied to hyperparameter tuning in machine learning [12], though typically with the sole objective of optimizing model performance. In contrast, our approach formulates the hyperparameter tuning problem with dual objectives: minimizing validation error and minimizing the certified error bound computed using AGT. We demonstrate how our framework identifies Pareto-optimal solutions that balance model performance and robustness. Experimental results show that configurations optimized solely for performance exhibit greater vulnerability to actual data poisoning attacks, while our robust solutions effectively resist adversarial perturbations.
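A hedged sketch of the core tuning loop is given below; it illustrates the dual-objective idea rather than our exact implementation. Two Gaussian-process surrogates model (i) validation error and (ii) the AGT-certified error bound as functions of the hyperparameters, candidates are scored with a randomly scalarized lower-confidence-bound acquisition, and a Pareto filter extracts the non-dominated configurations. The names `X_obs`, `Y_obs`, and `candidates` are illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def pareto_mask(costs):
    """Boolean mask of non-dominated rows of `costs` (all objectives minimized)."""
    mask = np.ones(len(costs), dtype=bool)
    for i in range(len(costs)):
        if not mask[i]:
            continue
        dominated = np.all(costs >= costs[i], axis=1) & np.any(costs > costs[i], axis=1)
        mask[dominated] = False
    return mask

def propose_next(X_obs, Y_obs, candidates, kappa=2.0, rng=None):
    """Pick the next hyperparameter configuration to evaluate.

    X_obs: evaluated hyperparameters, Y_obs: columns = (validation error,
    certified error bound), candidates: pool of unevaluated configurations.
    """
    rng = np.random.default_rng() if rng is None else rng
    weights = rng.dirichlet(np.ones(Y_obs.shape[1]))  # random scalarization
    scores = np.zeros(len(candidates))
    for j, w in enumerate(weights):
        gp = GaussianProcessRegressor(normalize_y=True).fit(X_obs, Y_obs[:, j])
        mu, sigma = gp.predict(candidates, return_std=True)
        scores += w * (mu - kappa * sigma)  # lower confidence bound
    return candidates[np.argmin(scores)]
```

In use, each proposed configuration would be evaluated by training the model, measuring validation error, and running AGT to obtain the certified error bound; the two values are appended to `Y_obs`, and the rows selected by `pareto_mask` form the reported Pareto front.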
References:
[1] Bhosekar, A., & Ierapetritou, M. (2018). Advances in surrogate based modeling, feasibility analysis, and optimization: A review. Computers & Chemical Engineering, 108, 250-267.
[2] Thebelt, A., Wiebe, J., Kronqvist, J., Tsay, C., & Misener, R. (2022). Maximizing information from chemical engineering data sets: Applications to machine learning. Chemical Engineering Science, 252, 117469.
[3] Carlini, N., Jagielski, M., Choquette-Choo, C. A., Paleka, D., Pearce, W., Anderson, H., ... & Tramèr, F. (2024, May). Poisoning web-scale training datasets is practical. In 2024 IEEE Symposium on Security and Privacy (SP) (pp. 407-425). IEEE.
[4] Muñoz-González, L., Biggio, B., Demontis, A., Paudice, A., Wongrassamee, V., Lupu, E. C., & Roli, F. (2017, November). Towards poisoning of deep learning algorithms with back-gradient optimization. In Proceedings of the 10th ACM workshop on artificial intelligence and security (pp. 27-38).
[5] Newell, A., Potharaju, R., Xiang, L., & Nita-Rotaru, C. (2014, November). On the practicality of integrity attacks on document-level sentiment analysis. In Proceedings of the 2014 Workshop on Artificial Intelligent and Security Workshop (pp. 83-93).
[6] Shafahi, A., Huang, W. R., Najibi, M., Suciu, O., Studer, C., Dumitras, T., & Goldstein, T. (2018). Poison frogs! targeted clean-label poisoning attacks on neural networks. Advances in neural information processing systems, 31.
[7] Zhu, C., Huang, W. R., Li, H., Taylor, G., Studer, C., & Goldstein, T. (2019, May). Transferable clean-label poisoning attacks on deep neural nets. In International conference on machine learning (pp. 7614-7623). PMLR.
[8] Chen, X., Liu, C., Li, B., Lu, K., & Song, D. (2017). Targeted backdoor attacks on deep learning systems using data poisoning. arXiv preprint arXiv:1712.05526.
[9] Saha, A., Subramanya, A., & Pirsiavash, H. (2020, April). Hidden trigger backdoor attacks. In Proceedings of the AAAI conference on artificial intelligence (Vol. 34, No. 07, pp. 11957-11965).
[10] Sosnin, P., Müller, M. N., Baader, M., Tsay, C., & Wicker, M. (2024). Certified robustness to data poisoning in gradient-based training. arXiv preprint arXiv:2406.05670.
[11] Wicker, M., Sosnin, P., Shilov, I., Janik, A., Müller, M. N., de Montjoye, Y. A., ... & Tsay, C. (2024). Certification for differentially private prediction in gradient-based training. arXiv preprint arXiv:2406.13433.
[12] Chen, Y., Huang, A., Wang, Z., Antonoglou, I., Schrittwieser, J., Silver, D., & de Freitas, N. (2018). Bayesian optimization in AlphaGo. arXiv preprint arXiv:1812.06855.