An Optimization-centric View on Bayes' Rule: Reviewing and Generalizing Variational Inference

Abstract

We advocate an optimization-centric view of Bayesian inference. Our inspiration is the representation of Bayes' rule as infinite-dimensional optimization (Csiszár, 1975; Donsker and Varadhan, 1975; Zellner, 1988). Equipped with this perspective, we study Bayesian inference when one does not have access to (1) well-specified priors, (2) well-specified likelihoods, and (3) infinite computing power. While these three assumptions underlie the standard Bayesian paradigm, they are typically inappropriate for modern Machine Learning applications. We propose addressing this through an optimization-centric generalization of Bayesian posteriors that we call the Rule of Three (RoT). The RoT can be justified axiomatically and recovers Bayesian, PAC-Bayesian and VI posteriors as special cases. While the RoT is primarily a conceptual and theoretical device, it also encompasses a novel sub-class of tractable posteriors which we call Generalized Variational Inference (GVI) posteriors. Like the RoT, GVI posteriors are specified by three arguments: a loss, a divergence and a variational family. They also possess a number of desirable properties, including modularity, Frequentist consistency and an interpretation as an approximate ELBO. We explore applications of GVI posteriors and show that they can be used to improve robustness and posterior marginals on Bayesian Neural Networks and Deep Gaussian Processes.
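For context, the optimization-centric representation of Bayes' rule referenced above, and the three-argument generalization it motivates, can be sketched as follows. The notation here is an illustrative reconstruction from the cited references (Csiszár, 1975; Donsker and Varadhan, 1975; Zellner, 1988) rather than taken verbatim from the paper.

```latex
% Variational (optimization-centric) form of Bayes' rule:
% the exact posterior minimizes, over all probability measures on \Theta,
% the expected negative log-likelihood plus a KL penalty to the prior \pi.
q^{*}_{\mathrm{Bayes}}
  = \operatorname*{arg\,min}_{q \in \mathcal{P}(\Theta)}
    \left\{ \mathbb{E}_{q(\theta)}\!\left[ -\sum_{i=1}^{n} \log p(x_i \mid \theta) \right]
            + \mathrm{KL}\!\left( q \,\|\, \pi \right) \right\}

% The Rule of Three generalizes each of the three ingredients:
% a loss \ell replaces the negative log-likelihood, a divergence D
% replaces KL, and a feasible set \Pi \subseteq \mathcal{P}(\Theta)
% replaces the unconstrained space of measures.
q^{*}
  = \operatorname*{arg\,min}_{q \in \Pi}
    \left\{ \mathbb{E}_{q(\theta)}\!\left[ \sum_{i=1}^{n} \ell(\theta, x_i) \right]
            + D\!\left( q \,\|\, \pi \right) \right\}
```

Read this way, taking the negative log-likelihood for the loss, KL for the divergence and the unconstrained space for the feasible set recovers the exact Bayesian posterior; restricting the feasible set to a parametric variational family while keeping the other two arguments recovers standard VI, and GVI posteriors arise as the tractable sub-class obtained by additionally varying the loss and the divergence.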

Publication
Journal of Machine Learning Research
Jeremias Knoblauch
Associate Professor and EPSRC Fellow in Machine Learning & Statistics

My research interests include robust Bayesian methods, generalised and post-Bayesian methodology, variational methods, and simulators.