We'll use the same set of data as the non-parametric Bootstrap example, which also provides us with an opportunity to compare the results. Imagine that we have a set of 100 random measurements of the height of blades of grass from a lawn, and we wish to estimate the true mean height of all blades of grass from that lawn. We put these 100 values into a spreadsheet and label the array "Data".
We need to go through three stages:
1. Estimate the distribution from the data
For the parametric Bootstrap, we need to specify the distribution type we believe these values to have come from. Clearly, grass blades always have a positive length. One could argue that the length of a blade is the result of the product of a number of random variables (rainfall, genes, length of life, soil quality, number of immediate neighboring competitors for nutrition, etc.). A Lognormal distribution would be a good choice of distribution type in this situation because, by the Central Limit Theorem, the logarithm of such a product is a sum of many random terms and is therefore approximately Normal, which makes the product itself approximately Lognormal.
2. Simulate the data collection
We perform a Monte Carlo experiment to replicate the process by which we acquired the data set. Each of the 100 values is replaced by a Crystal Ball Lognormal(18.24, 12.37) distribution, that is, the Lognormal fitted to the data, with mean 18.24 and standard deviation 12.37.
3. Calculate the sample statistic
A large number of iterations are run, each one generating a new Bootstrap replicate. For each iteration, we calculate the statistic of interest. We'll calculate the mean, standard deviation, and the difference between the 10th and 90th percentiles again to compare with the non-parametric Bootstrap method. The Parametric Bootstrap Model goes through the calculations. The resultant distributions are our uncertainty about the population statistics:
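The three stages above can be sketched outside the spreadsheet as follows. This is only a minimal illustration in Python with NumPy, not the Crystal Ball model itself: the function names, the number of iterations, and the random seed are our own assumptions, and we assume that Lognormal(18.24, 12.37) is parameterized by its mean and standard deviation, which we convert to the mean and standard deviation of ln(X) that NumPy expects.

```python
import numpy as np

def lognormal_mu_sigma(mean, sd):
    """Convert a Lognormal's mean and sd into the mu and sigma of ln(X)."""
    sigma2 = np.log(1.0 + (sd / mean) ** 2)
    mu = np.log(mean) - 0.5 * sigma2
    return mu, np.sqrt(sigma2)

def parametric_bootstrap(mean, sd, n=100, iterations=5000, seed=0):
    """Return Bootstrap replicates of the mean, sd and 10th-90th percentile range.

    `iterations` and `seed` are illustrative choices, not values from the text.
    """
    rng = np.random.default_rng(seed)
    mu, sigma = lognormal_mu_sigma(mean, sd)
    # Stage 2: each row is one simulated data set of n values
    # drawn from the fitted Lognormal.
    samples = rng.lognormal(mean=mu, sigma=sigma, size=(iterations, n))
    # Stage 3: calculate the statistics of interest for every replicate.
    means = samples.mean(axis=1)
    sds = samples.std(axis=1, ddof=1)
    p10, p90 = np.percentile(samples, [10, 90], axis=1)
    return means, sds, p90 - p10

# Stage 1 is taken as given: the fitted distribution is Lognormal(18.24, 12.37).
means, sds, ranges = parametric_bootstrap(18.24, 12.37)
for name, values in [("mean", means), ("sd", sds), ("P90 - P10", ranges)]:
    lo, hi = np.percentile(values, [5, 95])
    print(f"{name:>9}: 5th pct = {lo:.2f}, 95th pct = {hi:.2f}")
```

Each of the three returned arrays plays the role of one of the uncertainty distributions described above. Their spread reflects only sampling variability under the assumed Lognormal model.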
Note that the parametric Bootstrap gives a higher estimate of the mean than the non-parametric Bootstrap. This is because the Lognormal distribution was fitted by MLE, not by matching moments, so the Lognormal mean (=18.24) does not have to be the same as the data mean (17.833). The non-parametric Bootstrap, which uses the data directly, will have a mean estimate equal to the data mean.
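The difference between the two fitting methods can be illustrated with a short sketch. The data below are a hypothetical stand-in generated from a Gamma distribution (we do not reproduce the original 100 measurements here), and the variable names are our own. The sketch assumes the standard MLE for a Lognormal, namely the mean and the standard deviation of the logged data.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical stand-in for the 100 grass heights (not the original data set).
data = rng.gamma(shape=3.0, scale=6.0, size=100)

# MLE fit: mu and sigma are the mean and sd of ln(x).
logs = np.log(data)
mu_mle = logs.mean()
sigma_mle = logs.std(ddof=0)
mle_mean = np.exp(mu_mle + 0.5 * sigma_mle**2)

# Moment matching: choose mu and sigma so that the Lognormal's mean and sd
# equal the sample mean and sd.
m, s = data.mean(), data.std(ddof=1)
sigma2_mom = np.log(1.0 + (s / m) ** 2)
mu_mom = np.log(m) - 0.5 * sigma2_mom
mom_mean = np.exp(mu_mom + 0.5 * sigma2_mom)

print(f"data mean                  : {m:.3f}")
print(f"Lognormal mean, MLE fit    : {mle_mean:.3f}")
print(f"Lognormal mean, moments fit: {mom_mean:.3f}")
```

The moment-matched mean reproduces the data mean by construction, whereas the MLE mean generally does not, which is the effect described above.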
The parametric Bootstrap estimate of standard deviation is both greater and significantly more uncertain than the non-parametric Bootstrap estimate. It is greater because the fitted Lognormal distribution had a standard deviation (= 12.37) greater than that of the data (= 9.86). Its distribution is wider because the Lognormal distribution was not actually that good a fit to the data (so our reasoning for using the Lognormal may well be wrong): the data and the statistical model disagree, which increases the level of uncertainty.
The degree of difference between the non-parametric and parametric statistical estimates in this example, combined with the lack of fit of the assumed Lognormal distribution, should make us place more confidence in the non-parametric estimates.
The links to the software-specific Parametric Bootstrap models are provided here: