2022 Annual Meeting
(416c) Projecting the Effectiveness of Deep Ensembles
Here we introduce a Bayesian inference method to characterize the distribution of potential model predictions from which we sample during ensembling. The approach is applied entirely after training and requires no alteration of the original model architecture. By characterizing this distribution, we separate the portion of model prediction error due to model variance from the portion that would remain even with an infinite number of ensemble submodels. This separation lets us evaluate which model regimes are subject to errors correctable by ensembling and which are not. Further, we use this distribution to estimate, with good accuracy, the expected performance of a model with a larger number of ensemble submodels. We demonstrate the robustness of these projections on common benchmark datasets as well as on an artificially constructed dataset in which the level of data-noise error can be controlled. We also evaluate the cases in which the variance of ensemble predictions is useful as an uncertainty metric, and show its limitations, whether calibrated or uncalibrated, for predicting nonvariance errors.
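The core idea of projecting ensemble performance can be illustrated with a simple variance decomposition. The sketch below is not the authors' Bayesian method; it is a minimal toy example, with all data and names hypothetical, showing how the squared error of a k-submodel ensemble mean splits into an irreducible ("nonvariance") term plus a variance term that shrinks as 1/k, which allows extrapolation to larger ensembles:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: simulate predictions from M ensemble submodels on
# N test points. Each prediction is the true target plus a shared error
# (which survives infinite ensembling) and an independent per-model
# deviation (which averaging removes).
M, N = 8, 500
targets = rng.normal(size=N)
shared_error = 0.3 * rng.normal(size=N)      # nonvariance error
per_model = 0.8 * rng.normal(size=(M, N))    # variance error
preds = targets + shared_error + per_model   # shape (M, N)

mu = preds.mean(axis=0)                # per-point mean over submodels
var = np.mean(preds.var(axis=0, ddof=1))  # mean variance across submodels

# E[(mean_M - t)^2] = bias^2 + var/M, so debias to estimate the error
# an infinite ensemble would still make:
bias2 = np.mean((mu - targets) ** 2) - var / M

def projected_mse(k):
    """Projected test MSE of an ensemble of k submodels."""
    return bias2 + var / k

# By construction, the projection at k = M recovers the empirical MSE
# of the M-submodel ensemble mean.
empirical = np.mean((mu - targets) ** 2)
print(projected_mse(M), empirical, bias2)
```

Here `projected_mse(k)` for k > M estimates the payoff of training more submodels, while `bias2` estimates the error floor that no amount of ensembling can remove.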