Before we use data in a risk analysis model, we need to evaluate its quality. If the data is indeed relevant to the process we are trying to model, we then need to find the distribution that best fits it and quantify the uncertainty in the distribution parameters (or even in the choice of distribution).
The determination (fitting) of distributions of randomness or inter-individual variability from data is discussed in considerable detail. The discussion covers parametric and non-parametric distributions, first- and second-order distribution fitting, and Bayesian, classical and Bootstrap methods.
In the example below, we fit a variability distribution (red line) to observational data (yellow). This assumes that both the distribution and its parameters are known, in this case a Normal distribution with a given mean µ and standard deviation σ.
Fitting distribution of variability in data
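As a rough illustration of fitting a Normal variability distribution to data, the sketch below computes the Maximum Likelihood estimates of µ and σ. It is a minimal example under stated assumptions, not the procedure used to produce the figure: the observations are simulated, and the function name `fit_normal_mle` is hypothetical.

```python
import random
import statistics


def fit_normal_mle(data):
    """Return (mu_hat, sigma_hat), the maximum likelihood estimates for a Normal."""
    mu_hat = statistics.fmean(data)
    # The MLE of sigma divides by n (population form), not by n - 1.
    sigma_hat = statistics.pstdev(data, mu=mu_hat)
    return mu_hat, sigma_hat


# Hypothetical observations, simulated only so the example runs on its own.
rng = random.Random(42)
observations = [rng.gauss(10.0, 2.0) for _ in range(200)]

mu_hat, sigma_hat = fit_normal_mle(observations)
print(f"mu_hat = {mu_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
```

For a Normal distribution the Maximum Likelihood estimates have a closed form, so no numerical optimizer is needed here. Other distributions generally require maximizing the log-likelihood numerically.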
The estimation of model parameters is the domain of statistics, which offers two main approaches: Bayesian and classical (also known as "frequentist"). At EpiX Analytics we choose an approach based on how closely it fits our problem and how easy it is to implement. This section offers an in-depth description of both approaches, together with the Bootstrap, a classical statistics technique that can relax some of the restrictive assumptions of classical methods. It also compares the results of each method, which turn out to be very similar in most situations.
The figure below shows the same data, but we now also include uncertainty in the µ and σ parameters of the Normal variability distribution, which is represented by the multiple gray lines. The red line still shows our best fit, usually determined using Maximum Likelihood Estimation methods.
Fitting distribution of variability in data and parameter uncertainty
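One way to generate a set of plausible (µ, σ) pairs like those behind the gray lines is a non-parametric Bootstrap: resample the data with replacement and refit the Normal each time. The sketch below assumes simulated observations and a hypothetical helper `bootstrap_normal_params`. It illustrates the general technique and is not necessarily how the figure was produced.

```python
import random
import statistics


def bootstrap_normal_params(data, n_resamples=500, seed=0):
    """Refit a Normal to bootstrap resamples; return a list of (mu, sigma) pairs."""
    rng = random.Random(seed)
    n = len(data)
    fits = []
    for _ in range(n_resamples):
        # Draw n observations with replacement from the original data.
        resample = rng.choices(data, k=n)
        mu = statistics.fmean(resample)
        sigma = statistics.pstdev(resample, mu=mu)  # MLE form (divides by n)
        fits.append((mu, sigma))
    return fits


# Hypothetical observations, simulated only so the example runs on its own.
data_rng = random.Random(42)
observations = [data_rng.gauss(10.0, 2.0) for _ in range(200)]

fits = bootstrap_normal_params(observations)
mus = [mu for mu, _ in fits]
sigmas = [sigma for _, sigma in fits]

# With n=40, the first and last cut points are the 2.5th and 97.5th percentiles.
mu_cuts = statistics.quantiles(mus, n=40)
sigma_cuts = statistics.quantiles(sigmas, n=40)
mu_low, mu_high = mu_cuts[0], mu_cuts[-1]
sigma_low, sigma_high = sigma_cuts[0], sigma_cuts[-1]

print(f"approx. 95% interval for mu:    ({mu_low:.3f}, {mu_high:.3f})")
print(f"approx. 95% interval for sigma: ({sigma_low:.3f}, {sigma_high:.3f})")
```

Each (µ, σ) pair in `fits` corresponds to one candidate variability distribution, so plotting the Normal density for each pair would give a family of curves around the best fit.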
Sometimes we also want to separate the effects of uncertainty and variability in our models, as described in the two-dimensional modeling section.