# Parametric Bootstrap example

We'll use the same data set as the non-parametric Bootstrap example, which also gives us an opportunity to compare the results. Imagine that we have 100 random measurements of the heights of blades of grass from a lawn and we wish to estimate the true mean height of all blades of grass on that lawn. We put these 100 values into a spreadsheet and label the array "Data".

We need to go through three stages:

### 1. Estimate the distribution from the data

For the parametric Bootstrap, we need to specify the distribution type we believe these values to have come from. Clearly, grass blades always have a positive length. One could argue that the length of a blade is the result of the product of a number of random variables (rainfall, genes, length of life, soil quality, number of immediate neighboring competitors for nutrition, etc.). A Lognormal distribution would be a good choice of distribution type in this situation: the log of a product of positive random variables is a sum of random variables, and by the Central Limit Theorem that sum tends towards a Normal distribution, making the product itself approximately Lognormal.
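The fitting step can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet model itself: the `data` array here is hypothetical stand-in data, since the real 100 measurements live in the "Data" array. The MLE of a Lognormal is found by fitting a Normal to the logs of the data, then converting back to the Lognormal's own mean and standard deviation (the parameterisation a Lognormal(mean, sd) distribution expects).

```python
import numpy as np

# Hypothetical stand-in for the 100 grass-height measurements
# (the real "Data" array lives in the spreadsheet).
rng = np.random.default_rng(1)
data = rng.lognormal(mean=2.7, sigma=0.6, size=100)

# MLE of a Lognormal: fit a Normal to the logs of the data.
mu_hat = np.log(data).mean()
sigma_hat = np.log(data).std()      # MLE uses the 1/n variance

# Convert (mu, sigma) of the underlying Normal back to the
# Lognormal's own mean and standard deviation.
ln_mean = np.exp(mu_hat + sigma_hat ** 2 / 2)
ln_sd = ln_mean * np.sqrt(np.exp(sigma_hat ** 2) - 1)
```

With the real data set, this conversion is what yields the fitted Lognormal(18.24, 12.37) used in the next stage.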

### 2. Simulate the data collection

We perform a Monte Carlo experiment to replicate the process by which we acquired the data set. Each of the 100 values is replaced by a Crystal Ball Lognormal(18.24, 12.37) distribution, i.e. the distribution fitted in stage 1.
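The same simulation can be sketched in Python as a minimal alternative to the Crystal Ball model. The mean/sd pair (18.24, 12.37) is first converted to the (mu, sigma) parameters of the underlying Normal that NumPy's Lognormal generator uses; the iteration count of 10,000 is an assumption for illustration.

```python
import numpy as np

# Fitted Lognormal(mean=18.24, sd=12.37) from stage 1, converted to
# the (mu, sigma) parameters numpy's lognormal generator expects.
m, s = 18.24, 12.37
sigma = np.sqrt(np.log(1 + (s / m) ** 2))
mu = np.log(m) - sigma ** 2 / 2

rng = np.random.default_rng(0)
n_iterations = 10_000      # number of Bootstrap replicates (assumed)
# Each row is one replicate: 100 fresh draws from the fitted distribution.
replicates = rng.lognormal(mu, sigma, size=(n_iterations, 100))
```

Each row of `replicates` plays the role of one simulated data-collection exercise, just as each Crystal Ball iteration regenerates all 100 cells.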

### 3. Calculate the sample statistic

A large number of iterations are run, each one generating a new Bootstrap replicate. For each iteration, we calculate the statistic of interest. We'll calculate the mean, standard deviation, and the difference between the 10th and 90th percentiles again to compare with the non-parametric Bootstrap method. The Parametric Bootstrap model performs these calculations. The resultant distributions are our uncertainty about the population statistics:
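The three statistics can be computed per replicate as follows. This sketch regenerates the replicates from the fitted Lognormal(18.24, 12.37) so it is self-contained (the parameter conversion and iteration count are illustrative assumptions, as in stage 2); the spread of each resulting array is our uncertainty about the corresponding population statistic.

```python
import numpy as np

# Regenerate the Bootstrap replicates from the fitted Lognormal(18.24, 12.37),
# as in stage 2 (illustrative sketch, not the spreadsheet model itself).
m, s = 18.24, 12.37
sigma = np.sqrt(np.log(1 + (s / m) ** 2))
mu = np.log(m) - sigma ** 2 / 2
rng = np.random.default_rng(0)
replicates = rng.lognormal(mu, sigma, size=(10_000, 100))

# One value of each statistic per Bootstrap replicate.
means = replicates.mean(axis=1)
sds = replicates.std(axis=1, ddof=1)
pct_range = (np.percentile(replicates, 90, axis=1)
             - np.percentile(replicates, 10, axis=1))
```

Plotting histograms of `means`, `sds`, and `pct_range` reproduces the uncertainty distributions shown in the model outputs below.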

Note that the parametric Bootstrap gives a higher estimate of the mean than the non-parametric Bootstrap. This is because the Lognormal distribution was fitted by MLE, not by matching moments, so the Lognormal mean (= 18.24) does not have to be the same as the data mean (= 17.833). The non-parametric Bootstrap, which uses the data directly, will have a mean estimate equal to the data mean.

The parametric Bootstrap estimate of standard deviation is both greater and significantly more uncertain than the non-parametric Bootstrap estimate. It is greater because the fitted Lognormal distribution had a standard deviation (= 12.37) greater than the data (= 9.86). It is wider because the Lognormal distribution was not actually that good a fit to the data (so our reasoning for using the Lognormal may well be wrong), so the data and statistical model are disagreeing, which increases the level of uncertainty.

The degree of difference between the non-parametric and parametric statistical estimates in this example, combined with the lack of fit of the assumed Lognormal distribution, should make us place more confidence in the non-parametric estimates.

The links to the software-specific Parametric Bootstrap models are provided here: Parametric_Bootstrap_Model

Using Crystal Ball's distribution fitting feature, we get the following fit:

Graph of mean estimate:

Graph of standard deviation estimate:

Graph of standard deviation : percentile range correlation:

The scatter plot above shows that the generated standard deviation and percentile range estimates are correlated for both the non-parametric and parametric Bootstrap, but with very different patterns, as one would expect from the difference in their estimates.

Using @Risk's distribution fitting feature, we get the following fit:

Graph of mean estimate:

Graph of standard deviation estimate:

Graph of standard deviation : percentile range correlation:

The scatter plot above shows that the generated standard deviation and percentile range estimates are correlated for both the non-parametric and parametric Bootstrap, but with very different patterns, as one would expect from the difference in their estimates.
