The maximum entropy formalism (sometimes known as MaxEnt) is a statistical method for determining a distribution of maximum logical uncertainty about some parameter, consistent with a certain limited amount of information. For a discrete variable, MaxEnt determines the distribution that maximises the function H(x) where:
$$H(x) = -\sum_{i=1}^{M} p_i \ln[p_i]$$
and where p_i is the probability of each of the M possible values x_i of the variable x. H(x) takes the form of a statistical mechanics quantity known as entropy, which gives the principle its name. For a continuous variable, H(x) takes an integral form:
$$H(x) = -\int_{\min}^{\max} f(x)\,\ln[f(x)]\,dx$$
The appropriate uncertainty distribution is determined by the method of Lagrange multipliers and, in practice, the continuous-variable equation for H(x) is replaced by its discrete counterpart. It is beyond the scope of this guide to look too deeply into the mathematics, but a number of results are of general interest. MaxEnt is often used to determine appropriate priors in a Bayesian analysis, so the results listed below lend some reassurance to the prior distributions we might wish to use to conservatively represent our prior knowledge.
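To sketch how the constrained maximisation works in practice (this example is not from the original text), the Python snippet below solves the discrete problem for a fixed mean, using Jaynes' well-known die illustration. The Lagrange-multiplier solution has the exponential form p_i ∝ exp(λx_i), and λ can be found numerically by bisection since the constrained mean increases monotonically with λ:

```python
import numpy as np

def maxent_with_mean(x, target_mean, lo=-50.0, hi=50.0):
    """Discrete MaxEnt distribution over support x, subject to
    sum(p) = 1 and sum(p * x) = target_mean. The Lagrange-multiplier
    solution is p_i proportional to exp(lam * x_i); lam is found by
    bisection, as the mean increases monotonically with lam."""
    x = np.asarray(x, dtype=float)

    def mean_for(lam):
        w = np.exp(lam * (x - x.mean()))  # shift exponent for numerical stability
        p = w / w.sum()
        return p @ x

    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_for(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    w = np.exp(lam * (x - x.mean()))
    return w / w.sum()

# Jaynes' die example: faces 1..6, constrained to a mean of 4.5
p = maxent_with_mean(np.arange(1, 7), 4.5)

# With the natural mean 3.5 the constraint adds no information, and we
# recover the discrete uniform p_i = 1/6, matching the table below.
p_uniform = maxent_with_mean(np.arange(1, 7), 3.5)
```

With the mean constrained above 3.5, the resulting probabilities rise monotonically towards the higher faces, as one would expect.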
| State of knowledge | MaxEnt distribution |
|---|---|
| Discrete parameter, n possible values {x_i} | Duniform({x_i}), i.e. p(x_i) = 1/n |
| Continuous parameter, known minimum and maximum | Uniform(min, max), i.e. f(x) = 1/(max - min) |
| Continuous parameter, known mean m and variance s² | Normal(m, s) |
| Continuous non-negative parameter, known mean m | Exponential(m) |
| Discrete parameter, known mean m | Poisson(m) |
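As an illustrative check of the Normal row (a sketch, not part of the original table), we can evaluate the continuous entropy integral numerically for two densities sharing the same mean and variance; the Normal comes out with the larger entropy:

```python
import numpy as np

def diff_entropy(f, lo, hi, n=200001):
    """Approximate H = -Integral of f(x) ln f(x) dx by the trapezoid rule."""
    x = np.linspace(lo, hi, n)
    fx = f(x)
    safe = np.where(fx > 0, fx, 1.0)  # ln(1) = 0 wherever f(x) = 0
    integrand = -fx * np.log(safe)
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(x)))

# Two densities, both with mean 0 and variance 1:
normal = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
uniform = lambda x: np.where(np.abs(x) <= np.sqrt(3.0), 1 / (2 * np.sqrt(3.0)), 0.0)

h_normal = diff_entropy(normal, -10, 10)   # analytic value: 0.5*ln(2*pi*e)
h_uniform = diff_entropy(uniform, -2, 2)   # analytic value: ln(2*sqrt(3))
```

The Normal's entropy (about 1.42 nats) exceeds the matched Uniform's (about 1.24 nats), consistent with the Normal being MaxEnt for a known mean and variance.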
The Normal distribution result is interesting and provides some justification for the common use of the Normal distribution when all we know are the mean and variance (or standard deviation), since it represents the most reasonably conservative estimate of the parameter given that state of knowledge. The Uniform distribution result is also very encouraging when estimating a binomial probability, for example. The use of a Beta(s+a, n-s+b, 1) distribution to represent the uncertainty about the binomial probability p when we have observed s successes in n trials assumes a Beta(a, b, 1) prior. A Beta(1, 1, 1) is a Uniform(0, 1) distribution, so our most honest estimate of p is given by Beta(s+1, n-s+1, 1).
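As a minimal sketch of that Beta result (the success and trial counts here are hypothetical), the posterior is easy to work with in Python; note that scipy's two-parameter beta corresponds to the text's Beta(a, b, 1) with unit scale:

```python
from scipy import stats

s, n = 7, 10                                # hypothetical data: 7 successes in 10 trials
posterior = stats.beta(s + 1, n - s + 1)    # Uniform(0, 1) prior -> Beta(8, 4) posterior

post_mean = posterior.mean()                # (s + 1) / (n + 2) = 8/12
ci_lo, ci_hi = posterior.ppf([0.05, 0.95])  # central 90% credible interval for p
```

Note that the posterior mean (s+1)/(n+2) is pulled slightly towards 1/2 relative to the raw proportion s/n, reflecting the Uniform prior's influence.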
The reader is referred to Sivia (1996) for a very readable explanation of the principle of MaxEnt and the derivation of some of its results. Gzyl (1995) provides a far more advanced treatise on the subject, but requires a much higher level of mathematical understanding.