Imagine that we have a set of n = 100 random measurements of the height of blades of grass from a lawn, and we wish to estimate the true mean height of all blades of grass on that lawn. We put these 100 values into a spreadsheet and label the array "Data".
We need to go through three stages:
1. Estimate the distribution from the data
For the non-parametric Bootstrap, we simply use the frequency distribution of the n data values as our estimate.
In relative frequency form, the data for a continuous variable would simply be a list of values, each with probability 1/n. The plot of the data set, however, shows three peaks with probability 2/n: because heights were measured to the nearest 0.1 mm, three values occurred twice. A finer level of measurement would make every value distinct.
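As a minimal sketch of this first stage, the Python function below builds the empirical (relative frequency) distribution from a list of measurements. The function name `empirical_distribution` and the five heights are hypothetical and chosen only so that one value is repeated, as happens in the grass data when two blades round to the same 0.1 mm.

```python
from collections import Counter
from fractions import Fraction

def empirical_distribution(data):
    """Return {value: probability} used as the non-parametric Bootstrap estimate.

    Each observation contributes probability 1/n, so a value recorded k times
    (e.g. because of rounding to 0.1 mm) ends up with probability k/n.
    """
    n = len(data)
    counts = Counter(data)
    return {value: Fraction(k, n) for value, k in sorted(counts.items())}

# Hypothetical heights in mm (not the article's data); 41.3 occurs twice.
heights = [41.3, 38.7, 41.3, 45.0, 39.2]
dist = empirical_distribution(heights)
print(dist[41.3])          # 2/5, i.e. 2/n
print(sum(dist.values()))  # 1
```

Exact `Fraction` probabilities are used here only so the 1/n and 2/n weights are visible; a spreadsheet model would simply keep the raw values.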
2. Simulate the data collection
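We now perform a Monte Carlo experiment that replicates the process by which we acquired the data set. Each of the 100 values is replaced by a draw from a Discrete Uniform distribution over the original data values, so that a Bootstrap replicate is a sample of 100 values taken from "Data" with replacement.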
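One simulated data collection can be sketched in Python as follows, assuming the observations are held in a plain list. The helper name `bootstrap_replicate` is hypothetical; each new value is an independent Discrete Uniform pick of an index into the original data.

```python
import random

def bootstrap_replicate(data, rng):
    """Return one Bootstrap replicate of the same size as `data`."""
    n = len(data)
    # Each of the n new values is an independent Discrete Uniform pick
    # from the original observations, i.e. sampling with replacement.
    return [data[rng.randrange(n)] for _ in range(n)]

rng = random.Random(1)                      # seeded only for reproducibility
heights = [41.3, 38.7, 41.3, 45.0, 39.2]    # hypothetical data, not the article's
replicate = bootstrap_replicate(heights, rng)
print(replicate)
```

The list comprehension makes the Discrete Uniform step explicit; `rng.choices(data, k=len(data))` from the standard library does the same thing in one call.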
3. Calculate the sample statistic
We now run a large number of iterations; each one generates a new Bootstrap replicate, and for each iteration we calculate the statistics of interest. In this case, we calculate the mean, the standard deviation, and the difference between the 10th and 90th percentiles. The Non-Parametric Bootstrap model goes through these calculations. The resulting distributions of these statistics describe our uncertainty about the corresponding population statistics.
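The three stages together can be sketched in Python as below. This is not the spreadsheet model the text refers to; it is an illustrative equivalent under stated assumptions: the data are synthetic (hypothetical normal heights rounded to 0.1 mm), the function name `bootstrap_statistics` is made up, and percentiles come from `statistics.quantiles` with its default "exclusive" interpolation, which may differ slightly from a spreadsheet's percentile function.

```python
import random
import statistics

def bootstrap_statistics(data, iterations=1000, seed=0):
    """Resample `data` with replacement `iterations` times and record the
    mean, standard deviation, and P90 - P10 of each Bootstrap replicate."""
    rng = random.Random(seed)
    n = len(data)
    means, sds, spreads = [], [], []
    for _ in range(iterations):
        sample = rng.choices(data, k=n)               # one Bootstrap replicate
        means.append(statistics.mean(sample))
        sds.append(statistics.stdev(sample))          # sample (n - 1) standard deviation
        deciles = statistics.quantiles(sample, n=10)  # 9 cut points: P10, P20, ..., P90
        spreads.append(deciles[-1] - deciles[0])      # P90 - P10
    return means, sds, spreads

# Hypothetical grass heights in mm, rounded to 0.1 mm (illustrative only).
gen = random.Random(42)
data = [round(gen.gauss(40.0, 5.0), 1) for _ in range(100)]

means, sds, spreads = bootstrap_statistics(data, iterations=2000)
cuts = statistics.quantiles(means, n=40)  # cuts[0] ~ 2.5th, cuts[-1] ~ 97.5th percentile
print(f"sample mean: {statistics.mean(data):.2f} mm, "
      f"95% bootstrap interval: {cuts[0]:.2f} to {cuts[-1]:.2f} mm")
```

Each of the lists `means`, `sds`, and `spreads` is a sampled uncertainty distribution; summarising one of them with percentiles, as done for the mean here, gives a simple percentile-style Bootstrap interval.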
A very important advantage of the Bootstrap over other statistical methods is its ability to capture the correlation between uncertainty distributions. For example, scatter plots of the generated statistical estimates show that they are correlated. Because every statistic in a given iteration is calculated from the same Bootstrap replicate, the Bootstrap automatically creates the necessary correlation structure, which is very convenient for us.
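To see this correlation numerically rather than in a scatter plot, the sketch below records the mean and standard deviation from the same replicate in each iteration and computes their Pearson correlation. The data are hypothetical and deliberately right-skewed (exponential), a case where the mean and standard deviation estimates tend to rise and fall together; the `pearson` helper is written out so the block needs nothing beyond the standard library.

```python
import math
import random
import statistics

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(7)
# Hypothetical right-skewed measurements (illustrative only, not the article's data).
data = [round(rng.expovariate(1 / 40.0), 1) for _ in range(100)]

means, sds = [], []
for _ in range(2000):
    sample = rng.choices(data, k=len(data))  # both statistics use the SAME replicate
    means.append(statistics.fmean(sample))
    sds.append(statistics.stdev(sample))

r = pearson(means, sds)
print(f"correlation between bootstrap mean and sd estimates: {r:.2f}")
```

Keeping each (mean, standard deviation) pair together preserves this dependence when the estimates feed into a later model; sampling the two uncertainty distributions independently would discard it.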