There is a useful distinction to be made between "model-based" parametric and "empirical" non-parametric distributions. Parametric distributions are "model-based" because their shape is born of the mathematics describing a theoretical problem. For example, an Exponential distribution is the direct result of assuming that the rate of decay of x is proportional to x, and a Lognormal distribution is derived from assuming that ln[x] is Normally distributed.
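The Lognormal case can be illustrated with a minimal sketch using only the Python standard library. The function name `sample_lognormal` and the parameter values are hypothetical and chosen purely for illustration: samples are built by exponentiating Normal draws, so taking logs should recover the Normal parameters.

```python
import math
import random
import statistics

def sample_lognormal(mu, sigma, n, seed=0):
    """Draw n Lognormal samples by exponentiating Normal(mu, sigma) draws."""
    rng = random.Random(seed)
    return [math.exp(rng.gauss(mu, sigma)) for _ in range(n)]

# Illustrative (assumed) parameters of ln[x]: mean 1.0, standard deviation 0.5.
samples = sample_lognormal(mu=1.0, sigma=0.5, n=20_000)

# If x is Lognormal, ln[x] should look Normal with the same mu and sigma.
logs = [math.log(x) for x in samples]
log_mean = statistics.fmean(logs)
log_sd = statistics.stdev(logs)
```

Here `log_mean` and `log_sd` should land close to the assumed 1.0 and 0.5, which is the sense in which the distribution's shape follows from its underlying assumption rather than from the data.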
Non-parametric distributions are "empirical" because their mathematics is defined by the shape required by the modeler rather than by a theoretical principle. For example, a Triangular distribution is defined by its minimum, mode and maximum values; a Histogram distribution is defined by its range, the number of classes and the frequency of each class. The defining parameters of these distributions are features of the graph shape. Empirical distributions include: Cumulative, Discrete, Histogram, Relative, Triangular and Uniform.
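As a rough sketch of what "defined by the shape" means in practice, the following standard-library Python samples from a Triangular and a Histogram distribution. The helper `sample_histogram`, the assumption of equal-width classes, and all numeric values are hypothetical, not taken from any particular tool.

```python
import random

def sample_histogram(minimum, maximum, frequencies, rng):
    """Draw one value from a Histogram distribution with equal-width classes."""
    width = (maximum - minimum) / len(frequencies)
    # Pick a class in proportion to its frequency, then a uniform point inside it.
    k = rng.choices(range(len(frequencies)), weights=frequencies)[0]
    return minimum + (k + rng.random()) * width

rng = random.Random(42)

# Triangular: minimum 10, maximum 30, mode 15.
# Note the argument order of random.triangular is (low, high, mode).
tri = [rng.triangular(10, 30, 15) for _ in range(10_000)]

# Histogram: range 0 to 4, four classes with relative frequencies 1:3:4:2.
hist = [sample_histogram(0.0, 4.0, [1, 3, 4, 2], rng) for _ in range(10_000)]
```

Every input here is something that can be read directly off the graph of the distribution, and neither sampler can return a value outside the stated range.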
Both distribution types have pros and cons. In brief:
- Non-parametric distributions are intuitively easy to understand and extremely flexible, and are therefore very useful when sufficient data is available. However, as they only mimic the data used to build them, they are a poor choice when modeling extreme values or tail risks that are unlikely to be observed in the data. For example, if a risky event has only a 0.1% chance of occurring, one would have to collect an average of 1,000 data points to observe this event once and be able to include it in an empirical distribution. So in general, we recommend using empirical distributions when:
  - The data is representative enough of the parameter we are trying to model. Typically this requires a large enough number of data points.
  - A model is continuously updated with new data and the shape of the distribution may change (this avoids a parametric fit that might need revising each time the data is updated).
- Parametric distributions require greater knowledge of their underlying assumptions to be used properly, so the analyst may find it more difficult to justify their use and to make alterations if more information becomes available. On the flip side, parametric distributions are well documented across many fields, with plenty of empirical evidence for their application. This can facilitate peer acceptance and lend credibility to the choice of distribution. They are also more suitable for modeling extreme events because, unlike empirical distributions, they allow the tails to be extrapolated beyond the observed data. We recommend using parametric distributions when:
  - The mathematical theory underpinning the distribution applies to the particular problem. For example, the Normal distribution works well for modeling aggregated values because of the Central Limit Theorem (CLT), whereas the Weibull distribution is an excellent choice for modeling queuing or waiting-time problems.
  - There is enough empirical evidence and acceptance that the distribution works well for the variable being modeled. As an example, the Lognormal has long been used to model the initial production of an oil well.
  - The distribution approximately fits the expert opinion being modeled. Examples of this include use of the PERT, Exponential, Lognormal, Normal, and Pareto distributions.
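The trade-off around tails described above can be sketched in a few lines of standard-library Python. Everything here is an illustrative assumption: the function names are hypothetical, the data is simulated from an Exponential with mean 10, and the parametric fit is a simple maximum-likelihood Exponential rather than a recommendation for any real dataset.

```python
import math
import random
import statistics

def prob_event_observed(p, n):
    """Probability that an event with per-observation chance p shows up at least once in n observations."""
    return 1.0 - (1.0 - p) ** n

def empirical_exceedance(data, threshold):
    """Share of observed points above threshold; always zero beyond the sample maximum."""
    return sum(x > threshold for x in data) / len(data)

def fitted_exponential_exceedance(data, threshold):
    """Tail probability from an Exponential fitted by maximum likelihood (fitted mean = sample mean)."""
    return math.exp(-threshold / statistics.fmean(data))

rng = random.Random(1)
data = [rng.expovariate(1 / 10) for _ in range(200)]  # simulated data, true mean 10
threshold = max(data) + 1.0  # a value more extreme than anything observed

p_seen = prob_event_observed(0.001, 1000)               # about 0.632
p_emp = empirical_exceedance(data, threshold)           # exactly 0.0
p_fit = fitted_exponential_exceedance(data, threshold)  # small but positive
```

Under these assumptions, even 1,000 observations contain a 0.1% event with a probability of only about 63%. The empirical estimate assigns zero probability to anything beyond the largest observation, while the fitted parametric distribution extrapolates a small positive tail probability. That extrapolation is only as trustworthy as the assumption that the chosen distribution really applies.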