
Many methods exist to calculate confidence intervals for a Binomial probability. Here we discuss the mid-p approach (Agresti and Coull, 1988) in more detail, as it is a handy method for calculating the confidence of a Binomial probability when we don't know whether the Binomial process is present at all, as further discussed here.

According to the mid-p approach, the confidence we have that the true value of the probability is less than or equal to p is:


P(X>s;n,p) + ½ P(X=s;n,p)


where X is the random variable for the number of successes in n trials, and s is the number of observed successes. In other words, the greater the true value of the probability p, the more likely we would be to observe s successes or more, and so the more confident we are that the true probability lies below p. Translated into an Excel formula, the confidence we have that the true probability is less than or equal to any specific tested value p is given by:


=1-BINOM.DIST(s,n,p,1)+0.5*BINOM.DIST(s,n,p,0)
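For readers working outside Excel, the same quantity can be sketched in Python. This is a minimal illustration using only the standard library; the function names `binom_pmf` and `midp_confidence` are chosen here for illustration and are not part of any package.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def midp_confidence(s, n, p):
    """Mid-p confidence that the true probability is <= p.

    Computes P(X > s; n, p) + 0.5 * P(X = s; n, p), which equals the
    Excel expression 1 - BINOM.DIST(s,n,p,1) + 0.5 * BINOM.DIST(s,n,p,0).
    """
    upper_tail = sum(binom_pmf(k, n, p) for k in range(s + 1, n + 1))
    return upper_tail + 0.5 * binom_pmf(s, n, p)


# Example call: confidence that the true probability is at most 0.3
# after observing 3 successes in 10 trials.
confidence = midp_confidence(3, 10, 0.3)
```

Summing the upper tail directly, rather than subtracting a cumulative probability from 1, is a design choice of this sketch; both forms give the same value up to rounding.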


We can use this to construct a cumulative confidence distribution:



Figure 1: Cumulative distributions of estimate of p for n = 10 trials and varying number of successes s
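Curves of this kind can be tabulated by evaluating the mid-p expression over a grid of tested values of p. The sketch below assumes a grid step of 0.01 and an arbitrary selection of s values; neither is taken from the figure.

```python
from math import comb


def midp_confidence(s, n, p):
    """P(X > s; n, p) + 0.5 * P(X = s; n, p) for X ~ Binomial(n, p)."""
    def pmf(k):
        return comb(n, k) * p**k * (1 - p) ** (n - k)

    return sum(pmf(k) for k in range(s + 1, n + 1)) + 0.5 * pmf(s)


n = 10
# Tested values of p from 0 to 1 (the step size is an assumption).
p_grid = [i / 100 for i in range(101)]

# One cumulative confidence curve per number of observed successes
# (the particular s values are chosen for illustration only).
curves = {
    s: [midp_confidence(s, n, p) for p in p_grid]
    for s in (0, 2, 5, 8, 10)
}
```

Each list in `curves` pairs with `p_grid` to give the points of one cumulative curve, which can then be plotted or passed to a sampling routine.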


We can then use a Cumulative Distribution to sample from the distribution plotted above. 
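As a rough illustration of sampling from a tabulated cumulative curve, the sketch below uses inverse-transform sampling with linear interpolation between grid points. The interpolation scheme, the grid size, and the helper names `build_cumulative` and `sample_p` are assumptions of this sketch rather than a description of how any particular simulation package works internally.

```python
import random
from bisect import bisect_left
from math import comb


def midp_confidence(s, n, p):
    """P(X > s; n, p) + 0.5 * P(X = s; n, p) for X ~ Binomial(n, p)."""
    def pmf(k):
        return comb(n, k) * p**k * (1 - p) ** (n - k)

    return sum(pmf(k) for k in range(s + 1, n + 1)) + 0.5 * pmf(s)


def build_cumulative(s, n, grid_size=1000):
    """Tabulate the cumulative confidence curve on an even grid of p."""
    p_grid = [i / grid_size for i in range(grid_size + 1)]
    cdf = [midp_confidence(s, n, p) for p in p_grid]
    return p_grid, cdf


def sample_p(p_grid, cdf, u):
    """Map a uniform(0, 1) value u to a value of p via the tabulated curve."""
    if u <= cdf[0]:       # confidence assigned to p = 0 (e.g. when s = 0)
        return 0.0
    if u >= cdf[-1]:      # confidence assigned to p = 1 (e.g. when s = n)
        return 1.0
    i = bisect_left(cdf, u)          # first index with cdf[i] >= u
    lo, hi = cdf[i - 1], cdf[i]
    if hi == lo:
        return p_grid[i]
    frac = (u - lo) / (hi - lo)      # linear interpolation within the cell
    return p_grid[i - 1] + frac * (p_grid[i] - p_grid[i - 1])


# Usage: 5000 draws of p for s = 3 successes in n = 10 trials.
rng = random.Random(42)
p_grid, cdf = build_cumulative(3, 10)
draws = [sample_p(p_grid, cdf, rng.random()) for _ in range(5000)]
```

The two early returns handle the cases where the curve does not start at 0 or does not end at 1, so that the corresponding share of draws lands exactly on p = 0 or p = 1.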



Note:

This technique requires that the BINOM.DIST function in Excel behaves very precisely because the equation has to be a non-decreasing function of p, no matter how small the increments between tested values of p. 
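The non-decreasing requirement can be checked numerically before a tabulated curve is used. The following sketch, written in Python rather than Excel, scans a fine grid and reports the largest drop between consecutive values; the helper name `largest_decrease` and the default grid size are assumptions made for illustration.

```python
from math import comb


def midp_confidence(s, n, p):
    """P(X > s; n, p) + 0.5 * P(X = s; n, p) for X ~ Binomial(n, p)."""
    def pmf(k):
        return comb(n, k) * p**k * (1 - p) ** (n - k)

    return sum(pmf(k) for k in range(s + 1, n + 1)) + 0.5 * pmf(s)


def largest_decrease(s, n, steps=10_000):
    """Largest drop between consecutive grid values (0.0 if there is none)."""
    values = [midp_confidence(s, n, i / steps) for i in range(steps + 1)]
    drops = (a - b for a, b in zip(values, values[1:]) if a > b)
    return max(drops, default=0.0)


# A result at or near zero (rounding noise only) suggests the tabulated
# curve is safe to use as a cumulative distribution.
worst = max(largest_decrease(s, 10, steps=2000) for s in range(11))
```

Any drop well above floating-point rounding error would indicate that the underlying binomial function is not precise enough at that grid resolution.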


Looking at Figure 1 you'll see that the cumulative distribution for s = 0 starts at 0.5. That means the distribution assigns 50% confidence to p = 0, and the remaining 50% confidence to all other values of p. That is equivalent to saying that when there have been no successes, we are as confident that no such stochastic process exists as we are that it does. The Bayesian equivalent would be to assign a prior distribution with 1/3 confidence assigned to p=0 and p=1 each, and 1/3 confidence distributed over (0,1). One could therefore argue that this method would not be appropriate if you knew from other evidence or logical reasoning that there is a non-zero risk, i.e. that p > 0.
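The starting value of 0.5 follows directly from the mid-p expression: with s = 0, P(X>0;n,p) = 1-(1-p)^n and P(X=0;n,p) = (1-p)^n, so the confidence reduces to 1 - ½(1-p)^n, which equals 0.5 at p = 0 for any n. The short Python check below compares this closed form with the general expression; the function names are illustrative only.

```python
from math import comb


def midp_confidence(s, n, p):
    """P(X > s; n, p) + 0.5 * P(X = s; n, p) for X ~ Binomial(n, p)."""
    def pmf(k):
        return comb(n, k) * p**k * (1 - p) ** (n - k)

    return sum(pmf(k) for k in range(s + 1, n + 1)) + 0.5 * pmf(s)


def midp_confidence_zero_successes(n, p):
    """Closed form of the mid-p confidence when s = 0: 1 - 0.5 * (1 - p)^n."""
    return 1 - 0.5 * (1 - p) ** n


n = 10
# Largest gap between the general expression and the closed form on a grid.
max_gap = max(
    abs(midp_confidence(0, n, i / 100) - midp_confidence_zero_successes(n, i / 100))
    for i in range(101)
)

# Confidence assigned to p = 0 when no successes were observed.
mass_at_zero = midp_confidence(0, n, 0.0)
```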

 

Below we implement these calculations in different MC simulation software:

In @RISK, we can sample directly from the cumulative construction using the RiskCumul function.

  Binomial_confidence_construction



