# Bayesian inference worked example

A game warden on a tropical island would like to know how many tigers she has on her island. It is a big island with dense jungle and she has a limited budget, so she can't search every inch of the island methodically. Besides, she wants to disturb the tigers and the other fauna as little as possible. She arranges for a capture-release-recapture survey to be carried out as follows:

Hidden traps are laid at random points on the island. The traps are furnished with transmitters that signal a catch, and each captured tiger is retrieved immediately. When 20 tigers have been caught, the traps are removed. Each of these 20 tigers is carefully sedated and marked with an ear tag, then all are released together back to the positions from which they were originally caught. A short time later, hidden traps are laid again, but at different points on the island, until 30 tigers have been caught, and the number of tagged tigers is recorded. Captured tigers are held in captivity until the 30th tiger has been caught.

The game warden tries the experiment and 7 of the 30 tigers captured in the second set of traps are tagged. How many tigers are there on the island?

The warden has gone to some lengths to specify the experiment precisely. This is so that we will be able to assume with some reasonable accuracy that the experiment is taking a hypergeometric sample from the tiger population. A hypergeometric sample assumes that an individual with the characteristic of interest (in this case, being tagged) has the same probability of being sampled as any individual that does not have that characteristic (i.e. the untagged tigers). The reader may enjoy thinking through what assumptions are being made in this analysis and where the experimental design has attempted to minimize any deviation from a true hypergeometric sampling.

We will use the usual notation for a hypergeometric process:

n - the sample size = 30,

D - the number of individuals in the population of interest (tagged tigers) = 20,

M - the population size (the number of tigers on the island). In Bayesian inference terminology this is given the symbol *θ*, as it is the parameter we are attempting to estimate, and

s - the number of individuals in the sample that have the characteristic of interest = 7.

We could get a best guess for M by noting that the most likely scenario would be for us to see tagged tigers in the sample in the same proportion as they occur in the population. In other words:

\frac{s}{n}\approx\frac{D}{M} \quad \text{i.e.} \quad \frac{7}{30}\approx\frac{20}{M}, \quad \text{which gives } M \approx 85 \text{ to } 86,

but this does not take account of the uncertainty that occurs due to the random sampling involved in the experiment. We will perform a Bayesian inference calculation to determine the uncertainty distribution for M. Let us imagine that before the experiment was started the warden and her staff believed that the number of tigers was equally likely to be any one value as any other. In other words, they knew absolutely nothing about the number of tigers in the jungle and their prior distribution is thus a discrete uniform distribution over all non-negative integers.

The likelihood function is given by the probability mass function of the hypergeometric distribution, i.e.:

l(s \mid \theta) = \frac{\binom{D}{s}\binom{\theta-D}{n-s}}{\binom{\theta}{n}} = \frac{\binom{20}{7}\binom{\theta-20}{23}}{\binom{\theta}{30}} \quad \text{if } \theta \geq 43, \qquad l(s \mid \theta) = 0 \text{ otherwise}

The likelihood function is zero for values of *θ* below 43 since the experiment tells us that there must be at least 43 tigers: 20 that were tagged plus the (30-7) that were caught in the recapture part of the experiment and were not tagged.
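This likelihood function can be sketched in a few lines of Python using `math.comb` for the binomial coefficients (a sketch for illustration; the variable names follow the notation above):

```python
from math import comb

# Parameters from the experiment: D tagged tigers, sample size n, s tagged recaptures
D, n, s = 20, 30, 7

def likelihood(theta):
    """Hypergeometric likelihood l(s | theta) for a population of theta tigers."""
    if theta < D + (n - s):  # fewer than 43 tigers is impossible given the data
        return 0.0
    return comb(D, s) * comb(theta - D, n - s) / comb(theta, n)
```

Evaluating this function confirms both points made so far: it is exactly zero up to θ = 42, and over the integers it is largest at θ = 85, matching the proportional estimate D·n/s ≈ 85.7.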

The probability mass function applies to a discrete distribution and equals the probability that exactly *s* events will occur. In the example model Tigers, a discrete uniform prior with values of *θ* running from 0 to 150 is multiplied by the likelihood function above to arrive at a posterior distribution. Since the total confidence must add up to one, the product is normalized (in column F) to produce the posterior distribution. The shape of this posterior distribution is shown below by plotting column B against column F from the spreadsheet.
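The same prior-times-likelihood-then-normalize mechanics can be reproduced outside the spreadsheet; the short Python sketch below mirrors the calculation (the column references in the comments are to the spreadsheet, not the code):

```python
from math import comb

D, n, s = 20, 30, 7

def likelihood(theta):
    """Hypergeometric likelihood l(s | theta); zero below 43 tigers."""
    if theta < D + (n - s):
        return 0.0
    return comb(D, s) * comb(theta - D, n - s) / comb(theta, n)

thetas = list(range(0, 151))              # tested values of theta (column B)
prior = [1 / len(thetas)] * len(thetas)   # discrete uniform prior
joint = [p * likelihood(t) for p, t in zip(prior, thetas)]
total = sum(joint)
posterior = [j / total for j in joint]    # normalized posterior (column F)

mode = thetas[posterior.index(max(posterior))]  # most probable value of theta
```

With a uniform prior the posterior simply takes the shape of the likelihood function, so the mode lands at 85.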

The graph peaks at a value of 85, as we would expect, but it appears cut off at the right tail, which shows that we should also look at values of *θ* larger than 150. The analysis is repeated for values of *θ* up to 300 and this more complete posterior distribution is plotted below:

This second plot represents a good model of the state of the warden's knowledge about the number of tigers on the island. Don't forget that this is a distribution of belief, not a true probability distribution, since there is some exact number of tigers on the island.

In this example, we had to adjust our range of tested values of *θ* in light of the posterior distribution. It is quite common to review the set of tested values of *θ*, either expanding the prior's range or modeling some part of the prior's range in more detail when the posterior distribution is concentrated around a small range. It is entirely appropriate to expand the range of the prior as long as we would have been happy to have extended our prior to the new range before seeing the data. However, it would not be appropriate if we had a much more informed prior belief that gave an absolute range for the uncertain parameter that we are now considering stepping outside of. That would amount to revising our prior belief in light of the data: putting the cart before the horse, if you like. On the other hand, if the likelihood function is concentrated very much at one end of the range of the prior, it may well be worth reviewing whether the prior distribution or the likelihood function is appropriate, since the analysis could be suggesting that the true value of the parameter lies outside the preconceived range of the prior.

Continuing with our tigers on an island, let us imagine that the warden is unsatisfied with the level of uncertainty that remains about the number of tigers, which, running from roughly 50 to 250, is rather large. She decides to wait a short while and then capture another 30 tigers. The experiment is completed and this time *t* tagged tigers are captured. Assuming that a tagged tiger still has the same probability of being captured as an untagged tiger, what is her uncertainty distribution now for the number of tigers on the island?

This is simply a replication of the first problem, except that we no longer use a discrete uniform distribution as her prior. Instead, the distribution plotted above represents the state of her knowledge prior to doing this second experiment, and the likelihood function is now given by the Excel function HYPGEOMDIST(t, 30, 20, *θ*). The six panels below show what the warden's posterior distribution would have been if the second experiment had trapped t = 1, 3, 5, 7, 10 and 15 tagged tigers. These posteriors (in black) are plotted together with the prior (in blue) and the likelihood functions (in red), all normalized to sum to 1 for ease of comparison.
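The sequential update can be sketched in Python without the spreadsheet: the first posterior becomes the prior, and a stdlib equivalent of HYPGEOMDIST supplies the new likelihood (a sketch under the stated assumptions, with function names of our own choosing):

```python
from math import comb

def hyper_lik(x, n_sample, D, theta):
    """Stdlib equivalent of Excel's HYPGEOMDIST(x, n_sample, D, theta)."""
    if theta < D + (n_sample - x):
        return 0.0
    return comb(D, x) * comb(theta - D, n_sample - x) / comb(theta, n_sample)

thetas = list(range(0, 301))

# Prior for the second experiment = posterior from the first
# (uniform prior times first likelihood, normalized)
first = [hyper_lik(7, 30, 20, t) for t in thetas]
z = sum(first)
prior = [v / z for v in first]

def second_posterior(t_tagged):
    """Posterior after observing t_tagged tagged tigers among 30 recaptures."""
    joint = [p * hyper_lik(t_tagged, 30, 20, th) for p, th in zip(prior, thetas)]
    total = sum(joint)
    return [j / total for j in joint]

post = second_posterior(7)  # repeating the first result narrows belief around 85
```

Calling `second_posterior` with t = 1, 3, 5, 7, 10 and 15 reproduces the six panels discussed next.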

You might initially imagine that performing another experiment would make you more confident about the actual number of tigers on the island, but the graphs show that this is not necessarily so. The posterior distributions for the two panels below are more spread than the prior because the data contradict the prior (the prior and likelihood peak at very different values of *θ*).

In the case of 5 tigers, the data disagree moderately with the prior but the extra information in the data compensates for this, leaving us with about the same level of uncertainty but with a posterior distribution that is to the right of the prior.

The right panel below (the example with 7 tigers) represents the scenario where the second experiment gives the same result as the first. You'll see that the prior and likelihood lie on top of each other because the prior of the first experiment was uniform, so the shape of the first posterior was determined by the likelihood function alone. Since both experiments produced the same result, our confidence is improved and remains centered around the best guess of 85:

In the last panels below, the likelihood functions disagree with the priors, yet the posterior distributions have a narrower uncertainty. This is because the likelihood function places its emphasis on the left tail of the possible range of values for *θ*, which is bounded below at *θ* = 43:

In summary, these six panels show that the amount of *information* contained in data depends on two things: (1) the manner in which the data were collected (i.e. the level of randomness inherent in the collection), which is described by the likelihood function, and (2) the state of our knowledge prior to observing the data and the degree to which it compares with the likelihood function. If the data tell us what we are already fairly sure of, there is little information contained in the data for us (though the data would contain much more information for those more ignorant of the parameter). On the other hand, if the data contradict what we already know, our uncertainty may either reduce or increase depending on the circumstances. Thus, you could consider that the amount of information in a data set can be measured by the degree to which our opinion changes. Alternatively, taking a more decision-focused view, there is only information in data if it changes what we would choose to do in managing the risk issue.
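The idea that information can be measured by how much our opinion changes can be made concrete with the Kullback-Leibler divergence from prior to posterior. The sketch below is our illustration of that formalization, not part of the original analysis; it scores each possible second-experiment outcome by how far it would move the warden's belief:

```python
from math import comb, log

def hyper_lik(x, n_sample, D, theta):
    """Hypergeometric probability of x tagged among n_sample, given theta tigers, D tagged."""
    if theta < D + (n_sample - x):
        return 0.0
    return comb(D, x) * comb(theta - D, n_sample - x) / comb(theta, n_sample)

thetas = list(range(0, 301))
first = [hyper_lik(7, 30, 20, t) for t in thetas]
z = sum(first)
prior = [v / z for v in first]  # the warden's belief before the second experiment

def info_gain(t_tagged):
    """KL(posterior || prior): how far belief moves on observing t_tagged tagged tigers."""
    joint = [p * hyper_lik(t_tagged, 30, 20, th) for p, th in zip(prior, thetas)]
    total = sum(joint)
    post = [j / total for j in joint]
    # Terms with post = 0 contribute nothing; post > 0 implies prior > 0 here
    return sum(q * log(q / p) for p, q in zip(prior, post) if q > 0)

# A surprising outcome (t = 1) moves opinion much further than a confirming one (t = 7)
```

On this measure a confirming result carries little information for the warden, while a contradicting one carries a great deal, matching the qualitative reading of the six panels.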