

The table below gives an overview of the continuous distributions described in ModelAssist, so that you can quickly identify which ones might be most appropriate for your modeling needs. Follow the links for an in-depth explanation of each. We have used the most common name for each distribution.



Distribution

Example use

Beta

Models uncertainty or variation of a probability, fraction or prevalence.

Bradford

This unusual distribution is handy to help you know when to stop looking for scientific articles. Similar to a right-truncated Pareto distribution.

Cauchy

Models the points of impact on a fixed straight line of particles emitted from a point source. Also the ratio of two independent Normal variables with zero mean.

Chi Squared

The sum of squared unit Normal variables. Used widely in classical statistics, where sample measures can often be transformed to be approximately a sum of squared unit Normal variables.

Cumulative ascending

Used to create an empirically-based distribution. Useful for creating a non-parametric distribution fitted to data.

Cumulative descending

Another form of the cumulative distribution. Defined by the probabilities of being greater than or equal to the corresponding x-values.

Dirichlet

Used to describe uncertainty about the probabilities of a Multinomial distribution: a multi-dimensional version of a Beta distribution.

Error

Another format for the Normal distribution with a zero mean.

Erlang 

A special case of the Gamma distribution where the first (shape) parameter is an integer.

Exponential

Models the time until an event occurs in a Poisson process.

Extreme value

Models the distribution of the extreme values that a variable can take.

F Distribution

Used in statistics to compare the variance between two (assumed Normally distributed) populations.

Frechet

Models the distribution of the extreme values that a variable can take.

Gamma

Models the time until a number of events occurs in a Poisson process.

General

Used to create an empirically-based distribution from relative frequency data.

Histogram

Useful for replicating the distribution shape of a set of data.

Inverse Gaussian

Models the time to cover a distance in Brownian motion.

JohnsonB

B for bounded. One of Johnson's family of distributions, handy for modeling expert opinion.

JohnsonU

U for unbounded. One of Johnson's family of distributions.

Laplace

A symmetric distribution, useful for having longer tails than a Normal distribution.

LogLaplace

An asymmetric distribution which offers a greater variety of shapes.

Logistic

Popular in demographic and economic modeling, mostly as a growth equation. Similar to a Normal distribution, but more peaked.

LogLogistic 

The distribution of a variable whose logarithm is logistically distributed: if X is loglogistically distributed, log X is logistically distributed.

Lognormal (format 1)

Useful for modeling naturally occurring variables that are the product of a number of other naturally occurring variables. If log X is Normally distributed, then X is lognormally distributed.

Lognormal (format 2)

An alternative way of defining a Lognormally distributed variable, using the mean and standard deviation of the corresponding Normal.

Normal (Gaussian)

Models variations of naturally occurring variables, particularly of additive processes. Also a good approximation to many other distributions in certain circumstances.

Pareto (1st kind)

Used to model any variable that has a minimum value, which is also its most likely value, and for which the probability density decreases geometrically towards zero. Often used because it has a very long right tail.

Pareto (2nd kind)

A shifted Pareto distribution.

Pearson V

A member of the Pearson system of distributions, and little used.

Pearson VI

Another member of the Pearson system of distributions.

PERT

A smoothed, triangular-like distribution, based on the Beta distribution.

Rayleigh 

A special case of the Weibull distribution. Models the distance to the nearest neighbor where the neighbors are Poisson distributed in two-dimensional space.

Student

Used in statistical estimation.

Uniform

Used as a very approximate model where there are very few or no data available.

Triangular

Used as a rough modeling tool where the range and the most likely value within the range can be estimated.

Weibull

Used to model the time until occurrence of an event where the momentary probability of occurrence changes over time.
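Several entries in the table describe one distribution in terms of another. As a rough simulation check rather than a proof, the sketch below uses Python's standard random module to rebuild three of those relationships and compares the sample means with the theoretical values. The parameter values, the sample size and the seed are arbitrary choices made for illustration, not values from ModelAssist.

```python
import math
import random
from statistics import fmean

random.seed(1)  # fixed seed so the sketch is repeatable (arbitrary choice)
N = 20_000      # number of simulated samples (arbitrary choice)

# Chi Squared: the sum of k squared unit Normal variables has mean k.
k = 3
chi_squared = [sum(random.gauss(0, 1) ** 2 for _ in range(k)) for _ in range(N)]

# Gamma: the time until n events occur in a Poisson process is the sum of
# n Exponential waiting times, with mean n / rate.
n_events, rate = 4, 2.0
waiting_time = [
    sum(random.expovariate(rate) for _ in range(n_events)) for _ in range(N)
]

# Lognormal: if X is lognormally distributed, log X is Normally distributed.
mu, sigma = 1.0, 0.5  # mean and standard deviation of log X
log_of_lognormal = [math.log(random.lognormvariate(mu, sigma)) for _ in range(N)]

print(f"Chi Squared mean: {fmean(chi_squared):.3f} (theory {k})")
print(f"Gamma mean:       {fmean(waiting_time):.3f} (theory {n_events / rate})")
print(f"log X mean:       {fmean(log_of_lognormal):.3f} (theory {mu})")
```

The sample means should land close to the theoretical values, with the gap shrinking as N grows.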


A continuous distribution is used to represent a variable that can take any value within a defined range (domain). For example, the height of an adult English male picked at random will have a continuous distribution because the height of a person is essentially infinitely divisible. We could measure his height to the nearest centimeter, millimeter, tenth of a millimeter, etc. The scale can be repeatedly divided up generating more and more possible values.


Properties like time, mass and distance, that are infinitely divisible, are modeled using continuous distributions. In practice, we also use continuous distributions to model variables that are, in truth, discrete but where the gap between allowable values is insignificant: for example, project cost (which is discrete with steps of one penny, one cent, etc.), exchange rate (which is only quoted to a few significant figures), number of employees in a large organization, etc.


The vertical scale of a relative frequency plot of an input continuous probability distribution is the probability density. It does not represent the actual probability of the corresponding x-axis value since that probability is zero. Instead, it represents the probability per x-axis unit of generating a value within a very small range around the x-axis value.
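The idea of "probability per x-axis unit" can be checked numerically. The following minimal sketch assumes a unit Normal distribution and the point x = 0.5, both picked purely for illustration. It shrinks the range around x and shows that the probability of falling in the range goes to zero while the probability divided by the width of the range approaches the density.

```python
from statistics import NormalDist

# Assumptions for illustration only: a unit Normal and the point x = 0.5.
dist = NormalDist(mu=0.0, sigma=1.0)
x = 0.5


def interval_probability(width):
    """Probability of generating a value within width / 2 either side of x."""
    return dist.cdf(x + width / 2) - dist.cdf(x - width / 2)


for width in (1.0, 0.1, 0.01, 0.001):
    prob = interval_probability(width)
    # prob shrinks towards zero, but prob / width settles on the density at x
    print(f"width={width:<6} probability={prob:.6f} per unit={prob / width:.6f}")

print(f"density at x: {dist.pdf(x):.6f}")
```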


In a continuous relative frequency distribution, the area under the curve must equal one. This means that the vertical scale must change according to the units used for the horizontal scale. For example, the figure below shows a theoretical distribution of the cost of a project using Normal(£4 200 000, £350 000).


 


Since this is a continuous distribution, the probability of the project cost being precisely £4M is zero. The vertical scale reads a value of 9.7 × 10⁻⁷ (about one in a million). The x-axis units are £1, so this y-axis reading means that there is about a one in a million chance that the project cost will be £4M plus or minus 50p (a range of £1). By comparison, the figure below shows the same distribution but using millions of pounds as the scale, i.e. Normal(4.2, 0.35). The y-axis value at x = £4M is 0.97, one million times the above value.
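The two y-axis readings can be reproduced with Python's standard statistics module. This is a small sketch of the arithmetic in the text, not the tool used to draw the figures; the variable names are ours.

```python
from statistics import NormalDist

# Project cost in pounds, as in the text: Normal(4 200 000, 350 000)
cost_pounds = NormalDist(mu=4_200_000, sigma=350_000)
# The same distribution with the x-axis in millions of pounds
cost_millions = NormalDist(mu=4.2, sigma=0.35)

density_pounds = cost_pounds.pdf(4_000_000)  # probability per £1 at £4M
density_millions = cost_millions.pdf(4.0)    # probability per £1M at £4M

# Probability of landing within 50p either side of £4M (a range of £1)
prob_one_pound = cost_pounds.cdf(4_000_000.5) - cost_pounds.cdf(3_999_999.5)

print(f"density per £1:      {density_pounds:.2e}")
print(f"density per £1M:     {density_millions:.2f}")
print(f"P(£4M +/- 50p):      {prob_one_pound:.2e}")
print(f"ratio of densities:  {density_millions / density_pounds:.0f}")
```

The density per £1 and the probability of the £1 range agree to many decimal places, which is why the y-axis reading can be read as a probability in that case.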





This does not, however, mean that there is a 97% chance of being between £3.5M and £4.5M, because the probability density varies very considerably over that range. The logic used in interpreting the 9.7 × 10⁻⁷ value for the first figure is an approximation that is valid there because the probability density is essentially constant over that range (£4M ± 50p).
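To see how far the density-times-width shortcut drifts over a wide range, the sketch below compares it with the probability obtained from the cumulative distribution function. It again uses Python's statistics module and the Normal(4.2, 0.35) parameters from the text; the variable names are ours.

```python
from statistics import NormalDist

# Project cost in millions of pounds: Normal(4.2, 0.35)
cost_millions = NormalDist(mu=4.2, sigma=0.35)

# Shortcut: density at £4M multiplied by the £1M width of the range
naive = cost_millions.pdf(4.0) * 1.0

# Probability of a cost between £3.5M and £4.5M, from the CDF
exact = cost_millions.cdf(4.5) - cost_millions.cdf(3.5)

print(f"density x width:  {naive:.2f}")
print(f"from the CDF:     {exact:.2f}")

# The density is far from constant across the range
for x in (3.5, 4.0, 4.2, 4.5):
    print(f"density at £{x}M: {cost_millions.pdf(x):.2f}")
```

The probability from the CDF comes out at roughly 78%, well below the 97% that the shortcut suggests, because the density ranges from about 0.15 at £3.5M to about 1.14 at the mean.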


The links below discuss different ways of categorizing distributions that may help in your selection of the most appropriate distribution to use:


Bounded and unbounded distributions

Parametric and non-parametric distributions

Discrete distributions