A game warden on an island covered in jungle would like to know how many tigers she has on her island. It is a big island with dense jungle and she has a limited budget, so she can't search every inch of the island methodically. Besides, she wants to disturb the tigers and the other fauna as little as possible. She arranges for a capture-release-recapture survey to be carried out as follows:
Hidden traps are laid at random points on the island. The traps are furnished with transmitters that signal a catch, and each captured tiger is retrieved immediately. When 20 tigers have been caught, the traps are removed. Each of these 20 tigers is carefully sedated and marked with an ear tag, then all are released together at the positions where they were originally caught. A short time later, hidden traps are laid again, but at different points on the island, until 30 tigers have been caught, and the number of tagged tigers among them is recorded. Captured tigers are held in captivity until the 30th tiger has been caught.
The experiment finds that 7 of the 30 tigers captured in the second set of traps are tagged. How many tigers are there on the island?
The warden has gone to some lengths to specify the experiment precisely. This is so that we will be able to assume within reasonable accuracy that the experiment is taking a hypergeometric sample from the tiger population. A hypergeometric sample assumes that an individual with the characteristic of interest (in this case, a tagged tiger) has the same probability of being sampled as any individual that does not have that characteristic (i.e. the untagged tigers). You might enjoy thinking through what assumptions are being made in this analysis and where the experimental design has attempted to minimize any deviation from a true hypergeometric sampling.
We will use the usual notation for a hypergeometric process:
n - the sample size, = 30,
D - the number of individuals in the population of interest (tagged tigers) = 20,
M - the population (the number of tigers in the jungle). In Bayesian inference terminology, this is given the symbol θ as it is the parameter we are attempting to estimate, and
s - the number of individuals in the sample that have the characteristic of interest = 7.
We could get a best guess for M by noting that the most likely scenario would be for us to see tagged tigers in the sample in the same proportion as they occur in the population. In other words:
\frac{s}{n} \approx \frac{D}{M} i.e. \frac{7}{30} \approx \frac{20}{M} which gives M ≈ 85 to 86
but this does not take account of the uncertainty that occurs due to the random sampling involved in the experiment. Let us imagine that before the experiment was started the warden and her staff believed that the number of tigers was equally likely to be any one value as any other. In other words, they knew absolutely nothing about the number of tigers in the jungle and their prior distribution is thus a discrete uniform distribution over all non-negative integers. This is rather unlikely, of course, but we will discuss better prior distributions elsewhere.
The likelihood function is given by the probability mass function of the hypergeometric distribution, i.e.:
l(X|\theta)=\begin{cases} \dfrac{\binom{D}{s}\binom{M-D}{n-s}}{\binom{M}{n}} = \dfrac{\binom{20}{7}\binom{\theta-20}{23}}{\binom{\theta}{30}}, & \theta \geq 43 \\ 0, & \text{otherwise} \end{cases}
The likelihood function is zero for values of θ below 43 since the experiment tells us that there must be at least 43 tigers: 20 that were tagged plus the (30-7) that were caught in the recapture part of the experiment and were not tagged.
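The likelihood function can be evaluated directly with integer binomial coefficients. The following Python code is a minimal sketch, not the Tigers model itself; the function name `likelihood` and the search range of 43 to 150 are assumptions made for illustration.

```python
from math import comb

D = 20  # tagged tigers in the population
n = 30  # size of the recapture sample
s = 7   # tagged tigers observed in the recapture sample


def likelihood(theta: int) -> float:
    """Hypergeometric probability of seeing s tagged tigers in a sample
    of n, given a total population of theta tigers."""
    # At least D tagged plus (n - s) untagged tigers must exist.
    if theta < D + (n - s):
        return 0.0
    return comb(D, s) * comb(theta - D, n - s) / comb(theta, n)


# Population size with the highest likelihood over an illustrative range.
most_likely = max(range(43, 151), key=likelihood)
print(most_likely, likelihood(most_likely))
```

Evaluating the function over a range of θ shows how flat the likelihood is around its peak, which is the uncertainty that the simple proportion argument ignores.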
The Tigers model performs the Bayesian estimate: a discrete uniform prior, with values of θ running from 0 to 150, is multiplied by the likelihood function above, and the product is normalized to arrive at a posterior distribution.
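As a rough illustration of that calculation, the Python sketch below multiplies a discrete uniform prior on 0 to 150 by the hypergeometric likelihood and normalizes the result. It is an assumption about how such a calculation could be written, not a reproduction of the Tigers model; all variable names are hypothetical.

```python
from math import comb

D, n, s = 20, 30, 7          # tagged tigers, recapture sample size, tagged in sample
thetas = range(0, 151)       # candidate population sizes covered by the prior


def likelihood(theta: int) -> float:
    """Hypergeometric probability of the observed data given theta tigers."""
    if theta < D + (n - s):  # fewer than 43 tigers is impossible
        return 0.0
    return comb(D, s) * comb(theta - D, n - s) / comb(theta, n)


# Discrete uniform prior: every candidate value is equally likely.
prior = [1.0 / len(thetas)] * len(thetas)

# Posterior is proportional to prior times likelihood; normalize to sum to 1.
unnormalized = [p * likelihood(t) for p, t in zip(prior, thetas)]
total = sum(unnormalized)
posterior = [u / total for u in unnormalized]

posterior_mode = max(thetas, key=lambda t: posterior[t])
posterior_mean = sum(t * p for t, p in zip(thetas, posterior))
print(posterior_mode, posterior_mean)
```

Because the prior is uniform, the posterior has the same shape as the likelihood over the prior's range. The upper limit of 150 truncates the right-hand tail, so the posterior mean depends on that choice.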
The links to the Tigers software specific models are provided here: