When we have a reasonable amount of data with which to calculate the likelihood function, the posterior distribution tends to come out looking approximately Normally distributed. In this section we will examine why that is, and provide a shorthand method to determine the approximating Normal distribution directly without needing to go through a complete Bayesian analysis.

Our best estimate *θ*_{0} of the value of a parameter *θ* is the value for which the posterior distribution *f*(*θ*) is at its maximum. Mathematically, this equates to the condition:

\frac{df(\theta)}{d\theta}\Big|_{\theta_{0}}=0

(1)

That is to say, *θ*_{0} occurs where the gradient of *f*(*θ*) is zero. Strictly speaking, we also require that the gradient of *f*(*θ*) is going from positive to negative for *θ*_{0} to be a maximum, i.e.:

\frac{d^{2}f(\theta)}{d\theta^{2}}\Big|_{\theta_{0}}<0

The second condition is only of any importance if the posterior distribution has two or more peaks, for which a Normal approximation to the posterior distribution would be inappropriate anyway. Taking the first and second derivatives of *f*(*θ*) assumes that *θ* is a continuous variable, but the principle applies equally to discrete variables, in which case we are just looking for that value of *θ* for which the posterior distribution has the highest value.

The Taylor series expansion allows one to produce a polynomial approximation to a function *f*(*x*) about some value *x*_{0}, and the approximation usually has a much simpler form than the original function. The Taylor series expansion says:

f(x)=\displaystyle\sum_{m=0}^{\infty}\frac{f^{(m)}(x_{0})}{m!}(x-x_{0})^{m}

where *f*^{(*m*)}(*x*) represents the *m*th derivative of *f*(*x*) with respect to *x*.
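As a small illustration of how the truncated series behaves, the sketch below builds the Taylor polynomial of log(*x*) about *x*_{0} = 1 and evaluates it at *x* = 1.2. The choice of log(*x*), the expansion point, and the helper name `taylor_log` are assumptions made purely for this example; they are not part of the derivation.

```python
import math

def taylor_log(x, x0, order):
    """Taylor polynomial of log(x) about x0, truncated after the given order."""
    total = math.log(x0)  # m = 0 term
    for m in range(1, order + 1):
        # The m-th derivative of log(x) at x0 is (-1)^(m-1) * (m-1)! / x0^m
        deriv = (-1) ** (m - 1) * math.factorial(m - 1) / x0 ** m
        total += deriv / math.factorial(m) * (x - x0) ** m
    return total

x0, x = 1.0, 1.2
orders = (1, 2, 3, 4)
# Absolute error of each truncated series against the exact value
errors = [abs(taylor_log(x, x0, n) - math.log(x)) for n in orders]
for n, err in zip(orders, errors):
    print(f"order {n}: abs error = {err:.6f}")
```

Close to the expansion point, each extra term reduces the error, which is why a low-order polynomial can stand in for the original function there.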

To make the next calculation a little easier to manage, we first define the log of the posterior distribution *L*(*θ*) = log_{e}[*f*(*θ*)]. Since *L*(*θ*) increases with *f*(*θ*), the maximum of *L*(*θ*) occurs at the same value of *θ* as the maximum of *f*(*θ*). We now write out the first three terms of the Taylor series expansion of *L*(*θ*) about *θ*_{0}:

L(\theta)=L(\theta_{0})+\frac{dL(\theta)}{d\theta}\Big|_{\theta_{0}}(\theta-\theta_{0})+\frac{1}{2}\frac{d^{2}L(\theta)}{d\theta^{2}}\Big|_{\theta_{0}}(\theta-\theta_{0})^{2}+\dots

The first term in this expansion is just a constant value (*k*), and tells us nothing about the shape of *L*(*θ*). The second term equals zero from Equation 1, because d*L*/d*θ* = (1/*f*) d*f*/d*θ* and *f*(*θ*_{0}) > 0. We are therefore left with the simplified form:

L(\theta)=k+\frac{1}{2}\frac{d^{2}L(\theta)}{d\theta^{2}}\Big|_{\theta_{0}}(\theta-\theta_{0})^{2}+\dots

This approximation will be good provided the higher order terms (*m* = 3, 4, etc.) are much smaller in magnitude than the *m* = 2 term over the range of *θ* where the posterior has appreciable density.
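To get a feel for when the higher order terms are negligible, the following sketch compares the *m* = 3 term with the *m* = 2 term for one hypothetical log posterior. It assumes *L*(*θ*) = *s* log(*θ*) + (*n* − *s*) log(1 − *θ*) + constant, which is what *s* successes in *n* binomial trials with a flat prior would give; that model, the 30% success rate, and the function name are illustrative assumptions only. The terms are compared at a distance [−*L*''(*θ*_{0})]^{−1/2} from the mode, a natural length scale for the curve.

```python
import math

def cubic_to_quadratic_ratio(s, n):
    """Size of the m = 3 Taylor term of L relative to the m = 2 term.

    Assumes L(theta) = s*log(theta) + (n - s)*log(1 - theta) + const
    (an illustrative binomial log posterior with a flat prior).
    """
    theta0 = s / n                                          # mode, where dL/dtheta = 0
    d2 = -s / theta0**2 - (n - s) / (1 - theta0)**2         # second derivative at the mode
    d3 = 2 * s / theta0**3 - 2 * (n - s) / (1 - theta0)**3  # third derivative at the mode
    scale = (-d2) ** -0.5                                   # distance from the mode used for the comparison
    quadratic = abs(0.5 * d2 * scale**2)
    cubic = abs(d3 / 6 * scale**3)
    return cubic / quadratic

# Same observed proportion (30% successes), increasing amounts of data
ratios = {n: cubic_to_quadratic_ratio(0.3 * n, n) for n in (10, 100, 1000)}
for n, r in ratios.items():
    print(f"n = {n:5d}: |m=3 term| / |m=2 term| = {r:.4f}")
```

Under these assumptions the ratio shrinks roughly in proportion to 1/√*n*, which is consistent with the quadratic approximation improving as more data are collected.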

We can now take the exponential of *L*(*θ*) to get back to *f*(*θ*):

f(\theta)\approx K\exp\Big(\frac{1}{2}\frac{d^{2}L(\theta)}{d\theta^{2}}\Big|_{\theta_{0}}(\theta-\theta_{0})^{2}\Big)

where *K* is a normalizing constant. Now, the Normal(*μ*, *σ*) distribution has probability density function *f*(*x*) given by:

f(x)=\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\Big(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\Big)

Comparing the above two equations, we can see that *f*(*θ*) has the same functional form as a Normal distribution where:

\mu=\theta_{0}\quad\text{and}\quad\sigma=\Big[-\frac{d^{2}L(\theta)}{d\theta^{2}}\Big|_{\theta_{0}}\Big]^{-\frac{1}{2}}

and we can thus often approximate the Bayesian posterior distribution by the following Normal distribution:

\theta=\text{Normal}\Bigg(\theta_{0},\Big[-\frac{d^{2}L(\theta)}{d\theta^{2}}\Big|_{\theta_{0}}\Big]^{-\frac{1}{2}}\Bigg)
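The recipe can be sketched numerically: locate the mode, estimate the second derivative of the log posterior there, and read off the mean and standard deviation. The example below assumes a hypothetical Beta(31, 71) posterior, and the function names and finite-difference step are choices made for this illustration rather than a prescribed implementation.

```python
import math

A, B = 31.0, 71.0  # hypothetical Beta(a, b) posterior parameters

def log_posterior(theta, a=A, b=B):
    """Log of a Beta(a, b) density, up to an additive constant."""
    return (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)

def normal_approximation(log_f, theta0, h=1e-4):
    """Return (mu, sigma) from the curvature of log_f at its mode theta0."""
    # Central finite difference estimate of the second derivative at theta0
    d2 = (log_f(theta0 + h) - 2 * log_f(theta0) + log_f(theta0 - h)) / h**2
    if d2 >= 0:
        raise ValueError("theta0 is not a maximum of the log posterior")
    return theta0, (-d2) ** -0.5

theta0 = (A - 1) / (A + B - 2)  # analytic mode of a Beta(a, b) with a, b > 1
mu, sigma = normal_approximation(log_posterior, theta0)

# Exact standard deviation of the Beta distribution, for comparison
beta_sd = math.sqrt(A * B / ((A + B) ** 2 * (A + B + 1)))
print(f"mu = {mu:.4f}, sigma = {sigma:.4f}, exact Beta sd = {beta_sd:.4f}")
```

For these assumed parameters the approximating standard deviation comes out at roughly 0.046, close to the exact Beta standard deviation of about 0.045. The finite difference is used only so the same function works for any smooth log posterior; where the second derivative is available analytically, that is the more direct route.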

We illustrate this Normal (or quadratic) approximation with two simple examples:

Normal approximation to the Beta posterior distribution

Bayesian estimate of the mean of a Normal distribution with unknown standard deviation