This tutorial gives a quick introduction to Variational Bayes (VB), also called Variational Inference or Variational Approximation.
Denote:
$y$: data.
$p(y|\theta)$: likelihood function based on a postulated model.
$\theta$: vector of model parameters to be estimated.
$p(\theta)$: prior.
The notation $a := b$ means that $a$ is defined by $b$.
For any random variable or random vector $X$ and any function $g$, we denote by $\mathbb{E}_f(g(X))$ (or $\mathbb{E}_X(g(X))$, or simply $\mathbb{E}(g(X))$) the expectation of $g(X)$, where $X$ follows a probability distribution with density function $f$.
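Bayesian inference encodes all the available information about the model parameter $\theta$ in its posterior distribution with density
$$p(\theta|y) = \frac{p(\theta)\,p(y|\theta)}{p(y)} \propto p(\theta)\,p(y|\theta),$$
where $p(y) = \int p(\theta)\,p(y|\theta)\,d\theta$ is called the marginal likelihood or evidence.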
Here, the notation $\propto$ means proportional up to a normalizing constant that is independent of the parameter $\theta$. In most Bayesian derivations, such a constant can be safely ignored. Bayesian inference typically requires computing expectations with respect to the posterior distribution. For example, the posterior mean, which is often used for point estimation, is the expectation of $\theta$ with respect to the posterior distribution $p(\theta|y)$.
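As a small illustration of working with a posterior known only up to its normalizing constant, the sketch below computes a posterior mean by numerical integration. The Beta-Bernoulli model, the data counts, and the prior hyperparameters are all assumptions chosen for illustration because the exact answer is available by conjugacy; they do not come from the tutorial itself.

```python
import numpy as np

# Hypothetical toy data: n Bernoulli trials with s successes.
n, s = 10, 7
# Beta(a, b) prior on the success probability theta (assumed values).
a, b = 2.0, 2.0

# Midpoint grid on (0, 1) for numerical integration.
m = 20000
theta = (np.arange(m) + 0.5) / m
dtheta = 1.0 / m

# Unnormalized posterior: prior kernel times likelihood, with every
# factor that does not depend on theta dropped.
unnorm = theta ** (a - 1 + s) * (1 - theta) ** (b - 1 + n - s)

# The normalizing constant is recovered by integrating over theta.
z = np.sum(unnorm) * dtheta

# Posterior mean: expectation of theta under the normalized posterior.
post_mean_grid = np.sum(theta * unnorm) * dtheta / z

# Conjugacy gives the exact posterior Beta(a + s, b + n - s).
post_mean_exact = (a + s) / (a + b + n)

print(post_mean_grid, post_mean_exact)
```

Grid integration like this is only practical when $\theta$ has very few dimensions, which is why sampling or approximation methods are needed in general.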
However, it is often difficult to compute such expectations, partly because the density $p(\theta|y)$ itself is intractable, as the normalizing constant $p(y)$ is often unknown. For many applications, Bayesian inference is performed using Markov chain Monte Carlo (MCMC), which estimates expectations with respect to $p(\theta|y)$ by sampling from it. For other applications where $\theta$ is high dimensional or fast computation is of primary interest, VB is an attractive alternative to MCMC. VB approximates the posterior distribution by a probability distribution with density $q(\theta)$ belonging to some tractable family of distributions $\mathcal{Q}$, such as Gaussians.
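The best VB approximation $q^* \in \mathcal{Q}$ is found by minimizing the Kullback-Leibler (KL) divergence from $q(\theta)$ to $p(\theta|y)$:
$$q^* = \arg\min_{q \in \mathcal{Q}} \mathrm{KL}(q \,\|\, p(\cdot|y)), \qquad \mathrm{KL}(q \,\|\, p(\cdot|y)) := \int q(\theta) \log\frac{q(\theta)}{p(\theta|y)}\,d\theta.$$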
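Then, Bayesian inference is performed with the intractable posterior $p(\theta|y)$ replaced by the tractable VB approximation $q^*(\theta)$. Substituting $p(\theta|y) = p(\theta)\,p(y|\theta)/p(y)$ into the definition of the KL divergence, it is easy to see that
$$\mathrm{KL}(q \,\|\, p(\cdot|y)) = -\int q(\theta) \log\frac{p(\theta)\,p(y|\theta)}{q(\theta)}\,d\theta + \log p(y).$$
Since $\log p(y)$ does not depend on $q$ and the KL divergence is non-negative, minimizing KL is equivalent to maximizing the lower bound on $\log p(y)$:
$$\mathrm{LB}(q) := \int q(\theta) \log\frac{p(\theta)\,p(y|\theta)}{q(\theta)}\,d\theta = \mathbb{E}_q\left(\log\frac{p(\theta)\,p(y|\theta)}{q(\theta)}\right).$$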
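The relationship between the KL divergence, the lower bound, and $\log p(y)$ can be checked numerically. The sketch below is a minimal illustration under assumed settings, not part of the original tutorial: it uses a normal model with known variance and a normal prior (so the posterior and evidence are available in closed form) and a Gaussian $q$ with mean `m` and variance `v`. All names and numeric values are hypothetical.

```python
import numpy as np

def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * np.log(2 * np.pi * var) - (x - mean) ** 2 / (2 * var)

# Assumed toy setup: y_i ~ N(theta, sigma2), prior theta ~ N(mu0, tau02).
rng = np.random.default_rng(0)
sigma2 = 1.0
mu0, tau02 = 0.0, 4.0
y = rng.normal(1.5, np.sqrt(sigma2), size=20)
n = y.size

# Exact posterior N(mun, taun2) by conjugacy.
taun2 = 1.0 / (1.0 / tau02 + n / sigma2)
mun = taun2 * (mu0 / tau02 + y.sum() / sigma2)

# log p(y) from Bayes' rule evaluated at an arbitrary point theta0:
# log p(y) = log p(theta0) + log p(y|theta0) - log p(theta0|y).
theta0 = 0.0
log_evidence = (log_normal_pdf(theta0, mu0, tau02)
                + log_normal_pdf(y, theta0, sigma2).sum()
                - log_normal_pdf(theta0, mun, taun2))

def lower_bound(m, v):
    """LB(q) for q = N(m, v): E_q[log prior] + E_q[log lik] + entropy of q."""
    e_log_prior = -0.5 * np.log(2 * np.pi * tau02) - ((m - mu0) ** 2 + v) / (2 * tau02)
    e_log_lik = (-0.5 * n * np.log(2 * np.pi * sigma2)
                 - (np.sum((y - m) ** 2) + n * v) / (2 * sigma2))
    entropy = 0.5 * np.log(2 * np.pi * np.e * v)
    return e_log_prior + e_log_lik + entropy

def kl_to_posterior(m, v):
    """KL divergence from q = N(m, v) to the exact posterior N(mun, taun2)."""
    return 0.5 * np.log(taun2 / v) + (v + (m - mun) ** 2) / (2 * taun2) - 0.5

# For any q, KL + LB should equal log p(y), and LB should not exceed it.
for m, v in [(0.0, 1.0), (2.0, 0.3), (mun, taun2)]:
    gap = lower_bound(m, v) + kl_to_posterior(m, v) - log_evidence
    print(f"m={m:.3f} v={v:.3f} LB={lower_bound(m, v):.4f} "
          f"KL={kl_to_posterior(m, v):.4f} gap={gap:.2e}")
```

In this conjugate setting the bound is tight when $q$ equals the exact posterior; in realistic models the posterior is not available and the lower bound is the quantity that is optimized.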
Without any constraint on the family $\mathcal{Q}$, the solution to this minimization problem is $q^*(\theta) = p(\theta|y)$; of course, this solution is useless as it is itself intractable. Depending on the constraint imposed on the class $\mathcal{Q}$, VB algorithms can be categorized into two classes: