


You have a set of random and representative observations of a single model variable, for example the number of children in American families (we'll look at a joint distribution for two or more variables at the end of this section), and you have enough observations to feel that the range and approximate random pattern have been captured. You want to use the data to construct a distribution directly.


Since the data already capture the pattern, one can simply use the empirical distribution of the data rather than fitting a parametric distribution (if there are no physical or biological reasons a certain distribution should be used, we generally prefer an empirical distribution). The main thing to keep in mind when using an empirical distribution is that extrapolating beyond the observed data can be difficult and subjective. Below, we outline three options for using the data to construct an empirical distribution:


1. Discrete Uniform: uses only the observed values, each with equal probability;

2. Cumulative: creates a cumulative distribution, and therefore allows values between those observed, and possibly values beyond the observed range;

3. Histogram: similar to a cumulative distribution, but can be more efficient with large datasets.
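To make the differences between the three options concrete, here is a minimal Python sketch of how each could be sampled. It is an illustration only, not the implementation used by any particular risk analysis tool. The data values, the function names, the `minimum` and `maximum` bounds, the bin count, and the convention of placing the i-th sorted observation at cumulative probability i/(n+1) are all assumptions made for this example.

```python
import random
from bisect import bisect_right

# Hypothetical observations of a continuous variable (illustrative, not real data).
data = [2.1, 3.4, 3.9, 4.2, 5.0, 5.5, 6.3, 7.8]
rng = random.Random(1)


def sample_discrete_uniform(data, rng):
    """Option 1: return one of the observed values, each equally likely."""
    return rng.choice(data)


def sample_cumulative(data, rng, minimum, maximum):
    """Option 2: interpolate linearly along an empirical cumulative curve.

    minimum and maximum are subjective bounds chosen by the analyst; they
    must bracket the data. Assumed convention: the i-th sorted observation
    has cumulative probability i / (n + 1).
    """
    n = len(data)
    xs = [minimum] + sorted(data) + [maximum]
    ps = [i / (n + 1) for i in range(n + 2)]  # 0, 1/(n+1), ..., 1
    u = rng.random()                          # uniform on [0, 1)
    k = bisect_right(ps, u)                   # first index with ps[k] > u
    frac = (u - ps[k - 1]) / (ps[k] - ps[k - 1])
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])


def sample_histogram(data, rng, bins=4):
    """Option 3: pick a bin in proportion to its count, then a uniform point in it.

    Assumes the observations are not all identical (the bin width must be > 0).
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        k = min(int((x - lo) / width), bins - 1)  # the maximum falls in the last bin
        counts[k] += 1
    k = rng.choices(range(bins), weights=counts)[0]
    return lo + (k + rng.random()) * width


print(sample_discrete_uniform(data, rng))
print(sample_cumulative(data, rng, minimum=2.0, maximum=8.0))
print(sample_histogram(data, rng))
```

In this sketch only the cumulative option can return values outside the observed range, and only because the analyst supplies `minimum` and `maximum` explicitly, which is where the subjectivity of extrapolation enters. The histogram option stores just the bin counts once they are computed, which is why it scales better to large datasets.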

Option 1: A Discrete Uniform distribution