We have seen that the Beta(*s*+1, *n*−*s*+1) distribution provides an estimate of the binomial probability *θ* when we have observed *s* successes in *n* independent trials, assuming a prior Uniform(0,1) distribution. The posterior density has the functional form:

f(\theta) \propto \theta^{s}(1-\theta)^{n-s}

Taking logs gives:

L(\theta) = K + s\,\log_{e}[\theta] + (n-s)\log_{e}[1-\theta]

and

\frac{dL(\theta)}{d\theta} = \frac{s}{\theta} - \frac{n-s}{1-\theta}, \qquad \frac{d^{2}L(\theta)}{d\theta^{2}} = -\frac{s}{\theta^{2}} - \frac{n-s}{(1-\theta)^{2}}

We first find our best estimate *θ*₀ of *θ*:

\frac{dL(\theta)}{d\theta}\Big|_{\theta_{0}} = \frac{s}{\theta_{0}} - \frac{n-s}{1-\theta_{0}} = 0

which gives the intuitively encouraging answer:

\theta_{0} = s/n

i.e. our best guess for the binomial probability is the proportion of trials that were successes. Next we find the standard deviation *σ* for the Normal approximation to this Beta distribution:

\frac{d^{2}L(\theta)}{d\theta^{2}}\Big|_{\theta_{0}} = -\frac{s}{\theta_{0}^{2}} - \frac{n-s}{(1-\theta_{0})^{2}} = -\frac{n}{\theta_{0}(1-\theta_{0})}

which gives:

\sigma = \Big[-\frac{d^{2}L(\theta)}{d\theta^{2}}\Big|_{\theta_{0}}\Big]^{-\frac{1}{2}} = \Big[\frac{\theta_{0}(1-\theta_{0})}{n}\Big]^{\frac{1}{2}}

(1)

and so we get the approximation:

\theta \approx \text{Normal}\Big(\theta_{0},\ \Big[\frac{\theta_{0}(1-\theta_{0})}{n}\Big]^{\frac{1}{2}}\Big) = \text{Normal}\Big(\frac{s}{n},\ \Big[\frac{s(n-s)}{n^{3}}\Big]^{\frac{1}{2}}\Big)

(2)
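As a quick numerical sketch of how good this approximation is, we can compare the exact Beta posterior with the Normal of Equation 2 for a hypothetical data set of *s* = 30 successes in *n* = 100 trials (the specific numbers and the use of `scipy.stats` are illustrative choices, not from the text):

```python
# Compare the exact Beta(s+1, n-s+1) posterior with the Normal
# approximation of Equation 2 for hypothetical data s = 30, n = 100.
from scipy import stats

s, n = 30, 100
theta0 = s / n                        # best estimate: theta_0 = s/n
sigma = (s * (n - s) / n**3) ** 0.5   # sigma from Equation 1

beta = stats.beta(s + 1, n - s + 1)   # exact posterior
normal = stats.norm(theta0, sigma)    # Normal approximation

# With s and n-s both reasonably large, the 95% intervals nearly coincide.
print("Beta 95% interval:  ", beta.interval(0.95))
print("Normal 95% interval:", normal.interval(0.95))
```

Rerunning with small *s* or small *n* − *s* shows the two intervals drifting apart, anticipating the fit condition discussed below.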

Equation 1 for *σ* allows us some useful insight into the behavior of the Beta distribution. We can see in the numerator that the spread of the Beta distribution, and therefore our measure of uncertainty about *θ*'s true value, is a function of our best estimate for *θ*. The function [*θ*₀(1 − *θ*₀)] is at its maximum when *θ*₀ = ½, so for a given number of trials *n*, we will be more uncertain about the true value of *θ* if it is close to ½ than if it is closer to 0 or 1. Looking at the denominator we see that the degree of uncertainty, represented by *σ*, is proportional to 1/√*n*. We will see time and again that the level of uncertainty of some parameter is inversely proportional to the square root of the amount of data available. Note also that Equation 2 is exactly the same as the classical statistics result of making the Normal approximation to the Binomial.
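Both observations can be verified with a few lines of arithmetic (the sample values of *θ*₀ and *n* below are hypothetical):

```python
# sigma = sqrt(theta0*(1-theta0)/n), from Equation 1.
# Two properties: for fixed n it peaks at theta0 = 1/2,
# and it shrinks in proportion to 1/sqrt(n).
import math

def sigma(theta0, n):
    return math.sqrt(theta0 * (1 - theta0) / n)

# For fixed n = 100, uncertainty is largest near theta0 = 1/2
# and symmetric about it:
print([round(sigma(t, 100), 4) for t in (0.1, 0.3, 0.5, 0.7, 0.9)])

# Quadrupling n halves sigma, since sigma is proportional to 1/sqrt(n):
print(sigma(0.5, 100) / sigma(0.5, 400))  # ratio of 2
```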

But when is this quadratic approximation to *L*(*θ*), i.e. the Normal approximation to *f*(*θ*), a reasonably good fit? The mean *μ* and variance *V* of a Beta(*s*+1, *n*−*s*+1) distribution are as follows:

\mu = \frac{s+1}{n+2}, \qquad V = \frac{(s+1)(n-s+1)}{(n+2)^{2}(n+3)}

Comparing these identities with Equation 2, we can see that the Normal approximation works when *s* and (*n-s*) are both sufficiently large that adding 1 to *s* and adding 3 to *n* proportionally have little effect, i.e. when:

\frac{s+1}{s} \approx 1 \quad \text{and} \quad \frac{n+3}{n} \approx 1
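The condition is easy to check numerically by comparing the exact Beta moments with the mean and variance implied by Equation 2; the two data sets below are hypothetical examples of "small" and "large" samples:

```python
# Compare the exact Beta(s+1, n-s+1) moments with the moments of the
# Normal approximation in Equation 2, for a small and a large sample.
def moments(s, n):
    mu = (s + 1) / (n + 2)                               # exact Beta mean
    V = (s + 1) * (n - s + 1) / ((n + 2)**2 * (n + 3))   # exact Beta variance
    mu_approx = s / n                                    # mean from Equation 2
    V_approx = s * (n - s) / n**3                        # variance from Equation 2
    return mu, mu_approx, V, V_approx

print(moments(2, 10))      # small sample: noticeable disagreement
print(moments(200, 1000))  # large sample: close agreement
```

With *s* = 2, *n* = 10 the exact mean is 0.25 against an approximate 0.2; with *s* = 200, *n* = 1000 the means agree to three decimal places, as the fit condition predicts.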