In risk analysis, we are frequently faced with having to estimate a probability, a fraction or a prevalence. We usually have some data that can help us produce this estimate, coming from surveys, experiments or even computer simulations. If we can be sure that the data were collected according to a __binomial process__, we can use the Beta distribution to describe our uncertainty about the prevalence, fraction or probability by applying the formula:

p = Beta(s+1, n-s+1, 1)

where n is the number of trials or samples, and s is the number of "successes".
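As a minimal sketch, the formula can be turned into Monte Carlo samples of p using only the Python standard library. The function name `sample_prevalence` is hypothetical, and `random.betavariate` simply stands in for whatever Beta sampler your risk-analysis tool provides:

```python
import random


def sample_prevalence(s, n, size=10_000, seed=None):
    """Draw Monte Carlo samples of p ~ Beta(s+1, n-s+1).

    s: number of successes observed; n: number of trials (0 <= s <= n).
    """
    if not 0 <= s <= n:
        raise ValueError("need 0 <= s <= n")
    rng = random.Random(seed)
    alpha, beta = s + 1, n - s + 1  # posterior parameters under a uniform prior
    return [rng.betavariate(alpha, beta) for _ in range(size)]


# Five plausible values of p after observing 6 successes in 300 trials
draws = sample_prevalence(s=6, n=300, size=5, seed=42)
print([round(x, 4) for x in draws])
```

Each draw is one plausible value of the unknown probability; the spread of many draws expresses the uncertainty.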

The above Beta distribution has a domain of [0,1], so it is an immediate contender for modeling uncertainty or randomness about a probability, fraction or prevalence. However, there are more technical reasons for using the Beta distribution here: it is the conjugate prior for the binomial likelihood, and the above formula is the result of a Bayesian inference calculation with an uninformed prior (a uniform distribution on [0,1], which is itself Beta(1,1)). Translation for the layperson: the Beta distribution is the direct result of a statistical analysis in which we assume that the data come from a binomial process, and that we knew nothing about the parameter p being estimated before collecting these data.
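The Bayesian step can be written out in a few lines. With a uniform prior on p and a binomial likelihood for s successes in n trials, the posterior density is proportional to prior times likelihood:

```latex
\begin{aligned}
\pi(p) &= 1, \qquad 0 \le p \le 1 \quad \text{(uniform prior, i.e. } \mathrm{Beta}(1,1)\text{)} \\
L(p \mid s, n) &= \binom{n}{s}\, p^{s} (1-p)^{n-s} \\
f(p \mid s, n) &\propto \pi(p)\, L(p \mid s, n) \;\propto\; p^{(s+1)-1} (1-p)^{(n-s+1)-1}
\end{aligned}
```

The last line is the kernel of a Beta(s+1, n-s+1) density, which is why a Beta prior combined with binomial data yields a Beta posterior (conjugacy).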

**Example 1: Population prevalence of (or fraction of animals with) a disease**

Imagine that we have some data on the prevalence of BSE among calves in some country: 300 calves were randomly selected and tested by examining brain tissue, and 6 were found to be infected. Assuming for the moment that the test is 100% accurate (i.e. every calf whose brain tissue tested positive truly had BSE, and none of the others did), our single-point (best guess) estimate of the prevalence of BSE in the calf population in general would be 6/300, or 2%. However, the relatively small number of samples means that considerable uncertainty remains about what the true prevalence actually is.
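For the BSE figures the formula gives Beta(7, 295). The sketch below (Python standard library only; the seed and sample size are arbitrary choices) computes the mode, which equals the 2% point estimate, the mean and standard deviation from the standard Beta formulas, and an approximate 90% uncertainty interval by sampling. It keeps the assumption of a perfectly accurate test:

```python
import math
import random
import statistics

# BSE survey figures from the example: 6 positives among 300 calves
s, n = 6, 300
a, b = s + 1, n - s + 1  # posterior Beta(7, 295)

mean = a / (a + b)                                    # (s+1)/(n+2)
mode = (a - 1) / (a + b - 2)                          # = s/n, the point estimate
sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))  # Beta standard deviation

# Approximate 5th and 95th percentiles by Monte Carlo sampling
rng = random.Random(2024)
draws = [rng.betavariate(a, b) for _ in range(50_000)]
cuts = statistics.quantiles(draws, n=20)  # 19 cut points at 5%, 10%, ..., 95%
p05, p95 = cuts[0], cuts[-1]

print(f"mode={mode:.4f} mean={mean:.4f} sd={sd:.4f}")
print(f"90% interval ~ [{p05:.4f}, {p95:.4f}]")
```

The interval, rather than the single 2% figure, is what conveys how much the true prevalence could plausibly differ from the observed fraction.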

Taking a small (relative to the population size) random sample from a population and then determining whether each sample either does or does not have some particular characteristic is an example of a binomial process. Thus, the prevalence is equivalent to the binomial probability p and we can therefore use the __Beta__ distribution to describe the remaining uncertainty about p. If we had tested 1000 animals and had found the same proportion (2%) infected, our estimate of the population prevalence would have been more precise (narrower distribution of uncertainty).
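The gain in precision from testing more animals can be quantified with the standard deviation of a Beta(a, b) distribution, sqrt(ab / ((a+b)^2 (a+b+1))). A sketch comparing the observed 6/300 with the 1000-animal case from the text (20 positives, the same 2%); the function names are illustrative:

```python
import math


def beta_sd(a, b):
    """Standard deviation of a Beta(a, b) distribution."""
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


def prevalence_sd(s, n):
    """SD of the uncertainty about p ~ Beta(s+1, n-s+1)."""
    return beta_sd(s + 1, n - s + 1)


# Same observed proportion (2%), two sample sizes
sd_300 = prevalence_sd(6, 300)     # Beta(7, 295)
sd_1000 = prevalence_sd(20, 1000)  # Beta(21, 981)
print(f"n=300:  sd={sd_300:.4f}")
print(f"n=1000: sd={sd_1000:.4f}")
```

With a little over three times as many animals, the standard deviation roughly halves, consistent with the uncertainty shrinking approximately in proportion to 1/√n.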

**Example 2: Comparison of probability of success of two or more treatments**

Imagine that we are a government agency charged with regulating the prescription of drugs. A pharmaceutical company has developed a new drug useful in the control of disease X. In field trials they have tried various dose:duration combinations with randomly selected individuals suffering from X. The results are as follows:

Treatment | Patients tested | Patients cured | Estimated P(cured) |
---|---|---|---|
A | 172 | 121 | 121/172 = 0.703 |
B | 196 | 77 | 77/196 = 0.393 |
C | 92 | 55 | 55/92 = 0.598 |
D | 57 | 42 | 42/57 = 0.737 |

At first glance, treatment combination D appears to be superior and, all else being equal, the agency could decide to authorize this combination in preference to the others. But how confident can we be that combination D is the most effective of the four options? Is it plausible that combination A is actually better? The Treatment Comparison model answers these questions.
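The Treatment Comparison model itself is not reproduced here. As a minimal Monte Carlo sketch of the same idea, assume each treatment's cure probability is independently Beta(cured+1, tested-cured+1), using the uninformed prior described earlier. Each iteration draws one plausible cure probability per treatment and records which one comes out on top; the fraction of wins estimates the probability that each treatment is truly the best. The function name `prob_best`, the seed and the iteration count are illustrative choices:

```python
import random

# (patients tested, patients cured) from the field-trial table
trials = {"A": (172, 121), "B": (196, 77), "C": (92, 55), "D": (57, 42)}


def prob_best(trials, iterations=50_000, seed=7):
    """Estimate P(each treatment has the highest cure probability).

    Each iteration draws one plausible cure probability per treatment from
    Beta(cured+1, tested-cured+1) and records which treatment came out on top.
    """
    rng = random.Random(seed)
    wins = {name: 0 for name in trials}
    for _ in range(iterations):
        draws = {
            name: rng.betavariate(cured + 1, tested - cured + 1)
            for name, (tested, cured) in trials.items()
        }
        wins[max(draws, key=draws.get)] += 1
    return {name: count / iterations for name, count in wins.items()}


for name, p in sorted(prob_best(trials).items()):
    print(f"{name}: P(best) = {p:.3f}")
```

Because D was tested on only 57 patients, its Beta distribution is much wider than A's, so this sketch gives A a substantial probability of being the better treatment despite D's higher observed cure rate.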


**Uncertainty about p when the data are ambiguous**

In certain circumstances there will be some ambiguity about the true state of an individual in a sample, because the method used to determine whether that individual has a particular characteristic is imperfect. For example, imagine that we are doing a random survey of people in some region to determine the prevalence of infection with a particular micro-organism. The survey consists of randomly selecting a person, drawing a sample of their blood and testing it for antibodies. The test is imperfect, however: antibodies may not be detected in the blood of an infected person, and antibodies may be misidentified in samples from non-infected people. Let p be the true population prevalence, Se (the test's sensitivity) the probability that an infected person tests positive, and Sp (its specificity) the probability that a non-infected person tests negative. There are then four possible scenarios for a randomly selected person: infected and testing positive (probability p·Se), infected and testing negative (p(1-Se)), non-infected and testing positive ((1-p)(1-Sp)), and non-infected and testing negative ((1-p)Sp).

Thus, the probability that a random person will test positive is given by:

P(+) = p·Se + (1-p)(1-Sp)
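P(+) is simply the sum of the two "test positive" scenarios: true positives plus false positives. A small sketch; the function names are hypothetical and the parameter values in the usage line are illustrative only, not taken from any survey:

```python
def outcome_probabilities(p, se, sp):
    """Probabilities of the four test scenarios for a randomly chosen person.

    p  : true prevalence
    se : sensitivity, P(test + | infected)
    sp : specificity, P(test - | not infected)
    """
    return {
        "infected, test +": p * se,
        "infected, test -": p * (1 - se),
        "not infected, test +": (1 - p) * (1 - sp),
        "not infected, test -": (1 - p) * sp,
    }


def prob_positive(p, se, sp):
    """P(+) = p*Se + (1-p)*(1-Sp): the two ways of testing positive."""
    return p * se + (1 - p) * (1 - sp)


# Illustrative values: 10% true prevalence, Se = 0.90, Sp = 0.95
print(prob_positive(0.10, 0.90, 0.95))  # about 0.135
```

With these values the apparent prevalence (13.5%) overstates the true prevalence (10%) because false positives among the much larger non-infected group outweigh the missed infections, which is why the raw fraction of positives needs correcting.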

The accompanying models use this P(+) formula in a Bayesian inference calculation to correct for the imperfection of the test and to express the uncertainty about the true prevalence.
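As a minimal sketch of that inference (not the models themselves), assume a uniform prior on p, Se and Sp known exactly, and k positives among n people tested, so that k is binomial with probability P(+). The posterior for p can then be computed numerically on a grid. The grid approximation is a technique chosen here for transparency and may differ from what the models do; in practice Se and Sp are often uncertain too and would be given their own distributions. The survey numbers in the usage lines are hypothetical:

```python
import math


def prevalence_posterior(k, n, se, sp, grid_size=2001):
    """Grid approximation to the posterior of the true prevalence p.

    Assumes a uniform prior on p, Se and Sp known exactly, and k positive
    results among n people tested. Each person tests positive with
    probability P(+) = p*Se + (1-p)*(1-Sp), so k ~ Binomial(n, P(+)).
    Requires P(+) to stay strictly between 0 and 1 on the grid.
    Returns (grid of p values, normalised posterior weights).
    """
    ps = [(i + 0.5) / grid_size for i in range(grid_size)]  # cell midpoints
    log_like = []
    for p in ps:
        q = p * se + (1 - p) * (1 - sp)  # P(+) for this candidate p
        log_like.append(k * math.log(q) + (n - k) * math.log(1 - q))
    peak = max(log_like)  # subtract the maximum to avoid underflow
    weights = [math.exp(ll - peak) for ll in log_like]
    total = sum(weights)
    return ps, [w / total for w in weights]


def posterior_mean(ps, weights):
    """Mean of the gridded posterior distribution."""
    return sum(p * w for p, w in zip(ps, weights))


# Hypothetical survey: 30 positives out of 200, Se = 0.90, Sp = 0.95
ps, w = prevalence_posterior(k=30, n=200, se=0.90, sp=0.95)
print(f"apparent prevalence = {30 / 200:.3f}")
print(f"posterior mean of true prevalence = {posterior_mean(ps, w):.3f}")
```

Setting Se = Sp = 1 recovers the perfect-test case, in which the gridded posterior approximates Beta(k+1, n-k+1).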