We compare the classical statistics estimate for p with the Bayesian estimate using three versions of an uninformed prior:
Beta(0,0), which is not a proper distribution because the Beta's shape parameters must be positive
Beta(0.5,0.5), which has peaks at zero and one
Beta(1,1) which is a Uniform(0,1) distribution
Note the plots below show the distributions above with an additional third parameter set to 1. This is equivalent to the more standard Beta parameterization with only two shape parameters.
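As a minimal sketch of how each prior combines with the data, the code below applies the standard conjugate update: a Beta(α,β) prior with s successes in n trials gives a Beta(α+s, β+n-s) posterior. The function and variable names are illustrative and are not taken from any particular library.

```python
# Conjugate Beta-Binomial update for the three uninformed priors.
# All names here (posterior_params, is_proper, PRIORS, summarize) are
# hypothetical helpers introduced for illustration only.

PRIORS = {
    "Beta(0,0)": (0.0, 0.0),
    "Beta(0.5,0.5)": (0.5, 0.5),
    "Beta(1,1)": (1.0, 1.0),
}


def posterior_params(s, n, alpha, beta):
    """Return the posterior shape parameters (alpha + s, beta + n - s)."""
    if not 0 <= s <= n:
        raise ValueError("need 0 <= s <= n")
    return alpha + s, beta + (n - s)


def is_proper(a, b):
    """A Beta(a, b) is a valid distribution only if both shapes are positive."""
    return a > 0 and b > 0


def summarize(s, n):
    """Map each prior name to (posterior a, posterior b, proper?)."""
    rows = {}
    for name, (a0, b0) in PRIORS.items():
        a, b = posterior_params(s, n, a0, b0)
        rows[name] = (a, b, is_proper(a, b))
    return rows


for name, (a, b, ok) in summarize(1, 10).items():
    print(f"{name}: posterior Beta({a}, {b}), proper={ok}")
```

Calling `summarize(0, 5)` shows why the Beta(0,0) prior needs 0<s<n: the posterior would be Beta(0, 5), which has a zero shape parameter.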
When the number of Binomial successes s is small (or, by symmetry, close to n), the classical method and the Bayesian method with a Beta(0,0) prior give the widest distributions. The Bayesian method with a Beta(1,1) prior gives the estimate closest to 0.5, the Bayesian method with a Beta(0,0) prior gives the estimate furthest from 0.5, and the classical method and the Bayesian method with a Beta(0.5,0.5) prior lie in between.
The classical method and the Bayesian method with a Beta(0.5,0.5) prior give very similar results for n>9 and 0<s<n. All methods tend to the same result as n gets large, and they converge more quickly as s approaches n/2. The Bayesian method with a Beta(0,0) prior only works for 0<s<n, because its posterior, Beta(s, n-s), needs both shape parameters to be positive.
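The convergence described above can be checked numerically for the three Bayesian priors. The sketch below uses only the posterior mean (α+s)/(α+β+n) of the conjugate update and measures how far apart the three means are. The helper names are hypothetical, and the classical method is left out because its distribution is not defined in this section.

```python
# How far apart are the posterior means under the three priors?
# posterior_mean and spread_of_means are illustrative names.

PRIOR_SHAPES = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]


def posterior_mean(s, n, alpha, beta):
    """Mean of the Beta(alpha + s, beta + n - s) posterior."""
    return (alpha + s) / (alpha + beta + n)


def spread_of_means(s, n):
    """Largest minus smallest posterior mean across the three priors.

    Assumes 0 < s < n so that the Beta(0,0) posterior is proper.
    """
    means = [posterior_mean(s, n, a, b) for a, b in PRIOR_SHAPES]
    return max(means) - min(means)


# Same observed proportion (10%) with growing sample size.
for s, n in [(1, 10), (10, 100), (100, 1000)]:
    print(f"s={s}, n={n}: spread of means = {spread_of_means(s, n):.5f}")

# At s = n/2 every prior gives a posterior mean of 0.5.
print(f"s=5, n=10: spread of means = {spread_of_means(5, 10):.5f}")
```

The spread shrinks roughly in proportion to 1/n for a fixed observed proportion, and it vanishes at s = n/2 because each prior is symmetric about 0.5.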
It is interesting to see from density plots where these four techniques place their emphasis:
When s=1 and n=2, Bayesian inference with a Beta(0,0) prior and the classical method both give a Uniform(0,1) distribution for p: in other words, there appears to be no information contained in the data except to say that 0<p<1. A Bayesian has already stated that the probability exists (and therefore lies within this range), while the classical statistics result must first determine that the probability is neither zero nor one.
When s=1, the classical method and the Bayesian method with a Beta(0,0) prior result in a mode at p=0 for any n>2, while the Bayesian method with a Beta(1,1) prior gives a mode at p=1/n, which is more intuitive.
A Beta(α,β) distribution has mode (α-1)/(α+β-2) and mean α/(α+β). Applying these formulas to the distribution each method gives for s successes in n trials yields the following modes and means:
Method | Mode* (α-1)/(α+β-2) | Mean α/(α+β) |
---|---|---|
Classical | ≈ (s-0.6)/(n-1.2) | (s+0.5)/(n+1) |
Beta(0,0) | (s-1)/(n-2) | s/n |
Beta(0.5,0.5) | (s-0.5)/(n-1) | (s+0.5)/(n+1) |
Beta(1,1) | s/n | (s+1)/(n+2) |
* Note the formulas for the modes have some parameter restrictions.
At EpiX Analytics we use Bayesian inference with a Beta(1,1) or Beta(0.5,0.5) prior when we believe the Binomial outcome can occur but we haven't observed it yet. For example, if we are testing for a machine failure that has yet to be observed, using a Beta(1,1) prior assumes that failures are intrinsic to the machine system, so you would eventually observe them.
In contrast, if we don't know whether the Binomial outcome is present, we use the classical construction. For example, when surveying computer code for bugs, it is possible that the code has no bugs. In this case, the classical construction is a more honest option and avoids introducing unintended biases from a Bayesian prior.
Although the Beta(0,0) prior is used elsewhere, we find it difficult to justify.