A Binomial(n, p) distribution has a mean and standard deviation given by:
\mu=np
\sigma =\sqrt{np(1-p)}
From the Central Limit Theorem, as n gets large the distribution of the number of observed successes s tends to:
s\approx Normal(np,\sqrt{np(1-p)})
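As a rough numerical check of this limit, the sketch below compares the exact Binomial cumulative probabilities with those of Normal(np, \sqrt{np(1-p)}) and reports the largest gap between them. The function name `max_cdf_gap`, the value p = 0.3 and the chosen values of n are illustrative assumptions, not part of the original text, and no continuity correction is applied.

```python
from math import comb, sqrt
from statistics import NormalDist


def max_cdf_gap(n: int, p: float) -> float:
    """Largest |P(S <= k) - Normal CDF at k| over k = 0..n, for S ~ Binomial(n, p)."""
    approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    exact_cdf = 0.0
    worst = 0.0
    for k in range(n + 1):
        # Accumulate the exact Binomial probability mass up to and including k.
        exact_cdf += comb(n, k) * p**k * (1 - p) ** (n - k)
        worst = max(worst, abs(exact_cdf - approx.cdf(k)))
    return worst


p = 0.3  # illustrative success probability (an assumption)
for n in (10, 50, 250):
    print(f"n = {n:3d}: largest CDF gap = {max_cdf_gap(n, p):.4f}")
```

The gap should shrink as n grows, which is the behaviour the Central Limit Theorem predicts.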
Substituting the observed fraction s/n for p, Equation 2 for the binomial method can then be rewritten so that p is approximated by a Normal distribution when n is large, as follows:
p\approx \frac{Normal\Big(n\frac{s}{n},\sqrt{n\frac{s}{n}\big(1-\frac{s}{n}\big)}\Big)}{n}
(1)
which can be rearranged to:
p\approx Normal\bigg(\frac{s}{n},\sqrt{s\bigg(\frac{1}{n^{2}}-\frac{s}{n^{3}}\bigg)}\bigg)
(2)
which simplifies to:
p\approx Normal \Big(\frac{s}{n},\sqrt{\frac{s(n-s)}{n^{3}}}\Big)
(3)
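The rearrangement above can be checked numerically. The following is a minimal sketch, with hypothetical function names, that evaluates the mean and standard deviation of the final Normal approximation and confirms that the intermediate form \sqrt{s(1/n^{2}-s/n^{3})} gives the same standard deviation.

```python
from math import isclose, sqrt


def p_estimate_params(s: int, n: int):
    """Mean and standard deviation of the Normal approximation to p."""
    if n <= 0 or not 0 <= s <= n:
        raise ValueError("require n > 0 and 0 <= s <= n")
    mean = s / n
    sd = sqrt(s * (n - s) / n**3)
    return mean, sd


def p_estimate_sd_intermediate(s: int, n: int) -> float:
    """Standard deviation written in the intermediate form, before simplification."""
    return sqrt(s * (1 / n**2 - s / n**3))


for s, n in ((5, 10), (1, 10)):
    mean, sd = p_estimate_params(s, n)
    # Both forms of the standard deviation should agree up to rounding error.
    assert isclose(sd, p_estimate_sd_intermediate(s, n))
    print(f"s = {s}, n = {n}: mean = {mean:.3f}, sd = {sd:.4f}")
```

For s = 5, n = 10 the standard deviation is \sqrt{0.025}, and for s = 1, n = 10 it is \sqrt{0.009}.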
Figure 1: Example of Equation 3 estimate of p where s = 5, n = 10
Figure 2: Example of Equation 3 estimate of p where s = 1, n = 10
Equation 3 works well in Figure 1, even though n is small (10), because the number of successes is half of n: the uncertainty distribution is then symmetric about 0.5, which matches the symmetry of a Normal distribution. However, had one observed just 1 success from 10 trials, the result would look quite different, as shown in Figure 2: the Normal approximation of Equation 3 is now poor, assigning considerable probability to negative values of p and failing to reflect the asymmetric nature of the uncertainty distribution.
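To put a number on this, one can compute how much probability the Normal approximation places outside the valid range [0, 1] for p. The sketch below is illustrative only; the function name `mass_outside_unit_interval` is hypothetical, and it requires 0 < s < n so that the standard deviation is positive.

```python
from math import sqrt
from statistics import NormalDist


def mass_outside_unit_interval(s: int, n: int):
    """Probability that Normal(s/n, sqrt(s(n-s)/n^3)) falls below 0 or above 1."""
    if not 0 < s < n:
        raise ValueError("require 0 < s < n so that the standard deviation is positive")
    dist = NormalDist(mu=s / n, sigma=sqrt(s * (n - s) / n**3))
    below_zero = dist.cdf(0.0)
    above_one = 1.0 - dist.cdf(1.0)
    return below_zero, above_one


for s, n in ((5, 10), (1, 10)):
    below, above = mass_outside_unit_interval(s, n)
    print(f"s = {s}, n = {n}: P(p < 0) = {below:.4f}, P(p > 1) = {above:.4f}")
```

With s = 1 and n = 10 the approximation assigns roughly 15% of its probability to negative values of p, whereas with s = 5 and n = 10 the mass outside [0, 1] is well under 1%.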