We sometimes need to recognize the interrelationship between the probabilities of values drawn from two or more distributions.
...
In some simulation software packages, specific features, such as Rank Order Correlation in Crystal Ball, and other modeling methods allow us to crudely model correlations between several distributions. However, there are certain situations where specific multivariate distributions are needed.
The following three common multivariate distributions are described here:
Multinomial
Dirichlet
Multivariate hypergeometric
Multinomial distribution
For a set of n trials, each of which can take one of k different outcomes (like different colours of balls in a huge urn) with probabilities p_{1}..p_{k}, the distribution of the outcome counts is known as the multinomial distribution, which is an extension of the binomial distribution. The only difference is the number of possible outcomes per trial: two for the binomial, and more than two for the multinomial. The multinomial distribution has the following probability mass function:
...
f(s_1,s_2,...,s_k)=\frac{n!}{s_1!\,s_2!\cdots s_k!}\;p_1^{s_1}p_2^{s_2}\cdots p_k^{s_k}

where \; \displaystyle\sum_{i=1}^{k} s_i = n \quad \text{and} \quad \displaystyle\sum_{i=1}^{k} p_i = 1
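As a quick sanity check, this probability mass function is easy to evaluate directly. The sketch below is ours (the function name multinomial_pmf is not from the model files); with k = 2 it reduces to the familiar binomial probability:

```python
from math import factorial, prod

def multinomial_pmf(s, p):
    """P(S1=s1, ..., Sk=sk) for n = sum(s) trials with category probabilities p."""
    n = sum(s)
    # Multinomial coefficient n! / (s1! s2! ... sk!)
    coeff = factorial(n) // prod(factorial(si) for si in s)
    return coeff * prod(pi ** si for si, pi in zip(s, p))

# Two categories reduce to the binomial: n = 10, s = (3, 7), p = (0.3, 0.7)
print(multinomial_pmf((3, 7), (0.3, 0.7)))
```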
Let's consider the following problem: The cars in a city are divided into 9 different categories. We know the proportion of the city's cars that are in each category. If we were to monitor 1000 cars that enter a particular motorway, how many cars of each category would we see?
This is clearly a problem of multinomial trials since every car that enters the motorway can be any one of 9 types.
...
Note that the marginal distribution (the distribution of the parameter looked at in isolation, i.e. with the uncertainty of all the other parameters integrated out) for s_{j} (i.e. the distribution of generated values for sj when looked at by itself) is simply a binomial(n,p_{j}).
An example of a multinomial distribution can be found in the Multinomial model.
...
Here is a screenshot of the model:
Our first category is simulated in cell C14, which is just a binomial distribution: Binomial(5%, 1000). As the second category now needs to take into account the result from the first, the formula in cell D14 becomes as shown above: the number of trials is decreased by the number of successes in the first category (see D13), and the probability of success becomes the probability of category two divided by the sum of the probabilities of the remaining 8 categories (see D12). This logic carries through the "Successes" row (cells D14:K14).
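The same sequential-binomial logic can be sketched outside a spreadsheet. The 5% probability for the first category comes from the model above; the other eight probabilities below are illustrative assumptions only (they just need to sum to 1), and the function names are ours:

```python
import random

def binomial(n, p):
    """Draw from Binomial(n, p) by summing n Bernoulli trials (stdlib only)."""
    return sum(random.random() < p for _ in range(n))

def multinomial_sample(n, probs):
    """Sample category counts with the spreadsheet's sequential logic:
    each category is binomial in the trials not yet allocated, with its
    probability renormalised over the categories still to come."""
    counts = []
    remaining_n, remaining_p = n, 1.0
    for p in probs[:-1]:
        s = binomial(remaining_n, p / remaining_p)
        counts.append(s)
        remaining_n -= s          # trials left for the later categories
        remaining_p -= p          # probability mass left
    counts.append(remaining_n)    # the last category takes whatever is left
    return counts

# First category 5% as in the model; the rest are assumed for illustration
probs = [0.05, 0.10, 0.10, 0.15, 0.15, 0.15, 0.10, 0.10, 0.10]
print(multinomial_sample(1000, probs))
```

Note that the counts always sum to exactly n, which is the point of the sequential construction.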
...
Here is a screenshot of the model:
Our first category is simulated in cell C11, which is just a binomial distribution: RiskBinomial(1000, 5%). As the second category now needs to take into account the result from the first, the formula in cell D11 becomes as shown above: the number of trials is decreased by the number of successes in the first category, and the probability of success becomes the probability of category two divided by the sum of the probabilities of the remaining 8 categories. This logic carries through the "Successes" row (cells D11:K11), and the "Outputs" row shows a nice way of naming the output cells.
Dirichlet distribution
The conjugate prior to the multinomial distribution is the Dirichlet distribution, just as the beta distribution is the conjugate prior to the binomial distribution. The Dirichlet distribution is used to model the uncertainty about the probabilities of success in multinomial trials.
...
f(p_1,p_2,...,p_k)=\frac{\Gamma\big(\alpha_1+\alpha_2+...+\alpha_k\big)}{\Gamma\big(\alpha_1\big)\,\Gamma\big(\alpha_2\big)\cdots\Gamma\big(\alpha_k\big)}\;p_1^{\,\alpha_1-1}p_2^{\,\alpha_2-1}\cdots p_k^{\,\alpha_k-1}
...
For example, if you have observed s_{1}, s_{2}, …, s_{k} outcomes of the different types in n trials, the Dirichlet distribution provides the confidence distribution for the probability that a random trial will produce each type of outcome, by setting a_{i} = s_{i} + 1. Obviously these probabilities have to sum to 1, so their uncertainty distributions are interrelated.
Let's take the same problem that we used in the previous example: All cars in a city are divided into 9 different types. But now we have monitored 100 cars that were entering a particular motorway, and counted the number of cars of each type. What is the uncertainty distribution for the proportions of each type in the total population of cars?
In order to solve this problem we need to go through the following steps:
Simulate p_{1} = Beta(a_{1}, SUM(a_{2}..a_{k}), 1)
For each remaining probability except p_{k}, simulate p_{2}, p_{3}, …, p_{k-1} in order with p_{j} = (1 - SUM(p_{1}..p_{j-1})) * Beta(a_{j}, SUM(a_{j+1}..a_{k}), 1)
Finally, set p_{k} = 1 - SUM(p_{1}..p_{k-1})
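The steps above can be sketched with the standard library's betavariate function. The observed counts below are illustrative assumptions (the document does not list them), with a_{j} = s_{j} + 1; the function name is ours:

```python
import random

def dirichlet_sample(alphas):
    """Draw (p1, ..., pk) ~ Dirichlet(alphas) via the sequential-beta
    construction: each p_j is a Beta draw scaled by the probability
    not yet allocated, and p_k takes the remainder."""
    k = len(alphas)
    ps = []
    for j in range(k - 1):
        # Beta(a_j, a_{j+1} + ... + a_k)
        b = random.betavariate(alphas[j], sum(alphas[j + 1:]))
        ps.append((1.0 - sum(ps)) * b)
    ps.append(1.0 - sum(ps))  # p_k = 1 - SUM(p_1..p_{k-1})
    return ps

# Assumed observed counts for the 9 car categories (illustrative only)
counts = [5, 10, 10, 15, 15, 15, 10, 10, 10]
alphas = [s + 1 for s in counts]
print(dirichlet_sample(alphas))
```

By construction the generated probabilities always sum to exactly 1, which is what ties their uncertainty distributions together.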
...
Putting the above logic into a spreadsheet model looks like this:
Row "Alphas" (cells C10:K10) calculates the values of a_{j}, which are just s_{j} + 1. Cell C12 calculates the confidence distribution for the proportion of category 1 cars using Crystal Ball's Beta distribution, with the first parameter equal to a_{1} and the second equal to the sum of a_{2} to a_{9}. The Beta part of the formulas in cells D13 to J13 follows the same logic as C13, and is then multiplied by (1 - the sum of the previous cells in the same row). The last cell, K13, calculates the implied probability for the last category as 1 - SUM(C13:J13).
...
Putting the above logic into a spreadsheet model looks like this:
Row "Alphas" (cells C9:K9) calculates the values of a_{j}, which are just s_{j} + 1. Cell C10 calculates the confidence distribution for the proportion of category 1 cars using a RiskBeta function, with the first parameter equal to a_{1} and the second equal to the sum of a_{2} to a_{9}. The Beta part of the formulas in cells D10 to J10 follows the same logic as C10, and is then multiplied by (1 - the sum of the previous cells in the same row). The last cell, K10, calculates the implied probability for the last category as 1 - SUM(C10:J10).
The Dirichlet distribution is not as intuitive as the Multinomial distribution, but it is a very handy tool when modeling multinomial trials.
Multivariate hypergeometric distribution
Sometimes we need to model sampling without replacement from a population with multiple types of outcome, where the population is small enough that the process cannot be approximated by a multinomial with constant probabilities of success. In this case we use the multivariate hypergeometric distribution, which is similar to the hypergeometric distribution except in the number of possible outcomes per trial (two for the hypergeometric, many for the multivariate hypergeometric).
The figure below shows the graphical representation of the multivariate hypergeometric process. D_{1}, D_{2}, D_{3} and so on are the number of individuals of different types in a population, and x_{1}, x_{2}, x_{3}, ... are the number of successes (the number of individuals in our random sample (circled) belonging to each category).
The Multivariate hypergeometric distribution has the following probability mass function:
f(x_1,x_2,...,x_k)=\frac{\binom{D_1}{x_1}\binom{D_2}{x_2}\cdots\binom{D_k}{x_k}}{\binom{M}{n}}
where M = D_1 + D_2 + ... + D_k is the total population size and n = x_1 + x_2 + ... + x_k is the sample size.
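Evaluating this probability mass function is straightforward with binomial coefficients. A minimal sketch (the function name is ours; the ball counts are those of the bag example below):

```python
from math import comb, prod

def mv_hypergeom_pmf(x, D):
    """P(X1=x1, ..., Xk=xk) when drawing n = sum(x) individuals without
    replacement from a population with D = (D1, ..., Dk) of each type."""
    M, n = sum(D), sum(x)
    return prod(comb(Di, xi) for Di, xi in zip(D, x)) / comb(M, n)

# The bag below: 10 red, 15 purple, 20 blue, 25 green, 30 yellow; draw 30
D = (10, 15, 20, 25, 30)
print(mv_hypergeom_pmf((3, 4, 6, 8, 9), D))
```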
Let's imagine a problem where we have 100 colored balls in a bag, from which 10 are red, 15 purple, 20 blue, 25 green and 30 yellow. Without looking into the bag, you take 30 balls out. How many balls of each color will you take from the bag?
We cannot model this problem with a multinomial distribution, because when we take the first ball out, the proportions of the different colour balls in the bag change. The same happens when we take the second ball out, and so on.
Thus, we must proceed as follows:
Model the first color (red, for example) as x_{1} = Hypergeometric(D_{1}, N, M), where D_{1} is the type 1 population = 10, N is the sample size = 30, and M is the population size = 100.
Model the rest as x_{i} = Hypergeometric(D_{i}, N - SUM(x_{1}..x_{i-1}), SUM(D_{i}..D_{k})), where D_{i} is the number of individuals of type i in the total population, x_{i} is the number of type i individuals in the sample, and D_{k} is the number of individuals of the last type.
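The two steps above can be sketched as follows. This is a minimal stdlib-only sketch (the function names are ours): the helper draws a single hypergeometric count by sampling index positions without replacement, and the main function applies the sequential logic type by type.

```python
import random

def hypergeom(D, M, n):
    """Number of 'type' individuals in a sample of n drawn without
    replacement from a population of M that contains D of that type."""
    return sum(1 for i in random.sample(range(M), n) if i < D)

def mv_hypergeom_sample(D, n):
    """Sample counts per type sequentially: each type is hypergeometric
    in the individuals and draws not yet allocated."""
    xs = []
    M_left, n_left = sum(D), n
    for Di in D[:-1]:
        x = hypergeom(Di, M_left, n_left)
        xs.append(x)
        M_left -= Di   # population left for the remaining types
        n_left -= x    # draws left
    xs.append(n_left)  # the last type takes the remaining draws
    return xs

# 10 red, 15 purple, 20 blue, 25 green, 30 yellow; draw 30 balls
D = (10, 15, 20, 25, 30)
print(mv_hypergeom_sample(D, 30))
```

The counts always sum to the sample size, and each count can never exceed the number of that type in the bag.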
The solution to this problem can be reached here: Multivariate Hypergeometric
...
...