To learn more about EpiX Analytics' work, please visit our modeling applications, white papers, and training schedule.


In general, for a population of size M of which D have the characteristic of interest, if a sample of size n is selected from that population at random without replacement, the probability of observing x items with the characteristic of interest is given by:

$$
p(x)=\frac{\binom{D}{x}\binom{M-D}{n-x}}{\binom{M}{n}}, \qquad 0 \le x \le n, \quad x \le D, \quad n \le M
$$


which is the probability mass function of the Hypergeometric distribution Hypergeometric(D/M,n,M). If you are curious, the Hypergeometric distribution gets its name because its probabilities are successive terms in a Gaussian hypergeometric series.
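As a minimal sketch, the probability mass function can be evaluated exactly with Python's standard library (`math.comb` and `fractions.Fraction`). The function names and the values M = 50, D = 10, n = 5 are illustrative assumptions, not part of the original text. The script checks that the probabilities sum to 1, that the mean equals nD/M, and that the ratio of successive terms, p(x+1)/p(x) = (D-x)(n-x)/((x+1)(M-D-n+x+1)), matches a rational function of x, which is the property behind the "hypergeometric series" name.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(x: int, n: int, D: int, M: int) -> Fraction:
    """Exact P(X = x) for a sample of n items drawn without replacement from
    a population of M items, D of which have the characteristic of interest.
    (Hypothetical helper name, for illustration.)"""
    if not (0 <= n <= M and 0 <= D <= M):
        raise ValueError("require 0 <= n <= M and 0 <= D <= M")
    # Zero outside the support: x < 0, x > n, x > D, or more items without
    # the characteristic (n - x) than the population contains (M - D).
    if x < 0 or x > n or x > D or n - x > M - D:
        return Fraction(0)
    return Fraction(comb(D, x) * comb(M - D, n - x), comb(M, n))


def successive_ratio(x: int, n: int, D: int, M: int) -> Fraction:
    """p(x + 1) / p(x), valid where p(x) > 0. It is a rational function of x,
    which is why the probabilities are terms of a hypergeometric series."""
    return Fraction((D - x) * (n - x), (x + 1) * (M - D - n + x + 1))


if __name__ == "__main__":
    M, D, n = 50, 10, 5  # illustrative values only
    probs = [hypergeom_pmf(x, n, D, M) for x in range(n + 1)]
    print(sum(probs))                                # 1
    print(sum(x * p for x, p in enumerate(probs)))   # 1  (= n*D/M)
    print(all(probs[x + 1] / probs[x] == successive_ratio(x, n, D, M)
              for x in range(n)))                    # True
```

Exact fractions are used so the checks hold with equality rather than up to floating-point rounding; for large populations a floating-point or log-space evaluation would be the practical choice.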


If we returned each item to the population before selecting the next one when taking our sample of size n, the probability that any individual selection has the characteristic of interest would be D/M every time, and the number of selected items with that characteristic would follow a Binomial(D/M,n). More usefully, if M is very large compared to n, the chance of picking the same item more than once, even if each item were replaced after selection, would be very small. Thus, for large M (n < 0.1M is usually quoted as a satisfactory condition), there is little difference in the sampling result whether we sample with or without replacement, and we can approximate a Hypergeometric(D/M,n,M) with a Binomial(D/M,n), which is much easier to calculate. This is explained in more detail in the section binomial approximation to the hypergeometric.
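To see the n < 0.1M rule of thumb numerically, the following Python sketch computes both probability mass functions directly from their formulas (standard library only) and reports the largest pointwise gap between them. The helper names and the parameter sets are hypothetical, chosen only for illustration, with D/M held at 0.2 while the sampling fraction n/M varies.

```python
from math import comb


def hypergeom_pmf(x, n, D, M):
    """P(X = x) sampling n items without replacement (Hypergeometric)."""
    if x < 0 or x > n or x > D or n - x > M - D:
        return 0.0
    return comb(D, x) * comb(M - D, n - x) / comb(M, n)


def binom_pmf(x, n, p):
    """P(X = x) for n independent draws with success probability p (Binomial)."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def max_abs_difference(n, D, M):
    """Largest pointwise gap between the Hypergeometric pmf and the
    Binomial(D/M, n) pmf used to approximate it (hypothetical helper)."""
    p = D / M
    return max(abs(hypergeom_pmf(x, n, D, M) - binom_pmf(x, n, p))
               for x in range(n + 1))


if __name__ == "__main__":
    # D/M = 0.2 throughout; only the sampling fraction n/M changes.
    for n, D, M in [(10, 20, 100), (10, 200, 1000), (10, 2000, 10000), (50, 20, 100)]:
        gap = max_abs_difference(n, D, M)
        print(f"n/M = {n / M:.3f}   max |hypergeometric - binomial| = {gap:.5f}")
```

Running it should show the gap shrinking as n/M falls with n fixed, and growing sharply when half of the population is sampled, where sampling without replacement noticeably narrows the distribution.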

 

Multivariate Hypergeometric distribution
