

Dirichlet({ai})

[Figure: Dirichlet equations]

The Dirichlet distribution is a multivariate distribution whose components all take values on (0,1) and which sum to one.
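
For reference, the standard density and mean of the Dirichlet distribution can be written as below. This is a sketch that assumes the ai in the title denote the usual parameters, written here as \alpha_i; the symbols K (number of components), x_i and \alpha_0 are notation introduced only for this illustration.

```latex
f(x_1,\dots,x_K) \;=\;
  \frac{\Gamma\!\left(\alpha_0\right)}{\prod_{i=1}^{K}\Gamma(\alpha_i)}
  \prod_{i=1}^{K} x_i^{\alpha_i-1},
\qquad
x_i \in (0,1),\quad \sum_{i=1}^{K} x_i = 1,\quad
\alpha_0 = \sum_{i=1}^{K}\alpha_i ,\quad \alpha_i > 0

\mathrm{E}[x_i] \;=\; \frac{\alpha_i}{\alpha_0}
```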

 

Uses

The Dirichlet distribution is frequently used to describe uncertainty about the probabilities of a Multinomial distribution. As such, the Dirichlet distribution is to the Multinomial distribution what the Beta distribution is to the Binomial distribution.

 

Example

Imagine that, the week before an election, we have taken a random survey of 616 people from a large population (large enough that we can assume each sampled person is independent of the others). We ask them for their political allegiance, with the following results:

 

Socialist (S): 137

Liberal (L): 166

Conservative (C): 92

Green (G): 133

Don't know (D): 88

 

Assuming that people don't change their minds in the next week, that the "don't knows" don't care and don't vote, and that the survey was truly random, we can estimate who will win the election.

 

The distribution of uncertainty about the fraction of the population with each political affiliation is described as follows:

{p(S), p(L), p(C), p(G), p(D)} = Dirichlet({138, 167, 93, 134, 89})

 

Note that each parameter of the Dirichlet distribution is the number of observations in the corresponding class plus one, in much the same way as the parameters of the Beta distribution when modeling a binomial probability. The curly brackets { } denote that the inputs and outputs are arrays, i.e. that their values must be taken together as a set.
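
As a rough illustration of how this Dirichlet could be used to estimate who will win, the sketch below draws samples with NumPy and counts how often each party has the largest share. NumPy, the sample size and the seed are choices made for this example, not part of the original model. "Winning" is assumed here to mean having the largest share among S, L, C and G, with the "don't knows" excluded as described above.

```python
import numpy as np

# Survey counts from the example, in the order S, L, C, G, D.
labels = ["S", "L", "C", "G", "D"]
counts = np.array([137, 166, 92, 133, 88])

# Dirichlet parameters: observed count in each class plus one.
alpha = counts + 1

rng = np.random.default_rng(seed=1)          # arbitrary seed, for repeatability
samples = rng.dirichlet(alpha, size=20_000)  # each row is one set {p(S), ..., p(D)}

# Assumption: the "don't knows" do not vote, so compare only S, L, C, G.
voting = samples[:, :4]
winners = voting.argmax(axis=1)
win_prob = np.bincount(winners, minlength=4) / len(winners)

for name, p in zip(labels[:4], win_prob):
    print(f"P({name} has the largest share) ~ {p:.3f}")
```

The printed values are Monte Carlo estimates, so they will vary slightly with the seed and the number of samples.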

 

Generation

 

The Dirichlet distribution can easily be constructed in two separate ways. The first method is similar in principle to the construction of a Multinomial distribution from a Binomial distribution and uses nested Beta distributions.
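
A minimal sketch of the nested Beta idea, assuming the usual sequential construction: each class in turn takes a Beta-distributed fraction of the probability not yet allocated, and the last class takes whatever remains. The function name `dirichlet_nested_beta` and the use of NumPy are hypothetical choices for this illustration and are not taken from the software-specific models.

```python
import numpy as np

def dirichlet_nested_beta(alpha, size, rng):
    """Sample Dirichlet(alpha) by nesting Beta distributions (illustrative)."""
    alpha = np.asarray(alpha, dtype=float)
    k = len(alpha)
    out = np.empty((size, k))
    remaining = np.ones(size)  # probability mass not yet allocated
    for i in range(k - 1):
        # Fraction of the remaining mass given to class i:
        # Beta(alpha_i, sum of the parameters of all later classes).
        frac = rng.beta(alpha[i], alpha[i + 1:].sum(), size=size)
        out[:, i] = remaining * frac
        remaining = remaining * (1.0 - frac)
    out[:, -1] = remaining  # the last class takes what is left
    return out

rng = np.random.default_rng(seed=2)  # arbitrary seed
alpha = [138, 167, 93, 134, 89]      # parameters from the survey example
nested = dirichlet_nested_beta(alpha, size=20_000, rng=rng)
print(nested.mean(axis=0))           # should be close to alpha / sum(alpha)
```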

The links to the Multinomial software specific models are provided here:



The second method comes from the identity:

 

Beta(a1, a2, 1) = X1/(X1+X2)

 

where X1 = Gamma(0, b, a1) and X2 = Gamma(0, b, a2) are independent. Because b is a scale parameter of the Gamma distribution, it cancels in the ratio and can therefore take any positive value. Thus, we can create a Dirichlet distribution by generating a separate Gamma distribution Gamma(0, 1, si+1) for each class i in which we have observed si 'successes', and then dividing each of these Gamma distributions by the sum of all the Gamma distributions.
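
The Gamma construction described above can be sketched as follows. This is an illustration under stated assumptions rather than the original model: it uses NumPy, whose `Generator.gamma` takes a shape and a scale, and the function name `dirichlet_from_gammas` is hypothetical. The shape for each class is si+1 and the scale defaults to 1.

```python
import numpy as np

def dirichlet_from_gammas(alpha, size, rng, scale=1.0):
    """Sample Dirichlet(alpha) by normalising independent Gamma draws."""
    alpha = np.asarray(alpha, dtype=float)
    # One independent Gamma(shape=alpha_i, scale) per class and per sample.
    g = rng.gamma(shape=alpha, scale=scale, size=(size, len(alpha)))
    # Divide each Gamma draw by the sum over all classes in the same sample.
    return g / g.sum(axis=1, keepdims=True)

rng = np.random.default_rng(seed=3)           # arbitrary seed
successes = np.array([137, 166, 92, 133, 88])  # si from the survey example
alpha = successes + 1                          # Gamma shape si + 1
from_gammas = dirichlet_from_gammas(alpha, size=20_000, rng=rng)
print(from_gammas.mean(axis=0))                # should be close to alpha / sum(alpha)
```

Passing a different positive `scale` should leave the distribution of the result unchanged, since the scale cancels when each draw is divided by the sum.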

 

The models below demonstrate both generating methods:

 

 

 

