The non-parametric Bootstrap

The non-parametric Bootstrap is used to estimate one or more parameters of a population or probability distribution from a set of observations {xi} when we do not wish to assume a distributional form (e.g. Normal, Gamma, Lognormal). The non-parametric Bootstrap has three stages:

1. Estimate the population (or probability) distribution from the data set;

2. Simulate the sampling from the population distribution that led to the set of observations {xi};

3. For each sampling, calculate the sample statistic of interest.

1. Estimate the distribution from the data

For the non-parametric Bootstrap, we simply use the frequency distribution of the n data values as our best guess of the population (or probability) distribution. In other words, the list of values {xi} is assumed to be our population distribution. Clearly, the fewer the observations, the poorer this estimate will be.
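As a minimal sketch of this step, the frequency distribution can be built by giving each observed value a probability of 1/n, so that repeated values accumulate weight. The data values and the function name `empirical_distribution` are hypothetical, chosen only for illustration.

```python
from collections import Counter

def empirical_distribution(data):
    """Return {value: probability}, giving each observation weight 1/n."""
    n = len(data)
    counts = Counter(data)
    return {value: count / n for value, count in sorted(counts.items())}

# Hypothetical data set of n = 5 observations (one value appears twice).
observations = [2.1, 3.4, 3.4, 5.0, 7.8]
dist = empirical_distribution(observations)

for value, probability in dist.items():
    print(value, probability)
```

Sampling from this distribution is the same as picking one of the original observations at random, which is why the later steps can work directly with the list of values.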

2. Simulate the data collection

We now perform a Monte Carlo experiment to replicate the process by which we acquired the data set. In other words, we take a sample at random from the estimated population distribution. For the non-parametric Bootstrap, this equates to randomly picking, with replacement, one of the {xi} as a possible value for each of the n data points we actually observed. A simulated set of n observations is called a Bootstrap replicate.
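The resampling described above could be sketched as follows, using Python's standard `random` module. The data values, the seed and the helper name `bootstrap_replicate` are assumptions made for illustration.

```python
import random

def bootstrap_replicate(data, rng):
    """Draw len(data) values from data, with replacement."""
    return rng.choices(data, k=len(data))

# Hypothetical data set of n = 5 observations.
observations = [2.1, 3.4, 3.4, 5.0, 7.8]

# A seeded generator makes the sketch reproducible.
rng = random.Random(42)
replicate = bootstrap_replicate(observations, rng)
print(replicate)
```

Because the draw is with replacement, a replicate will usually contain some observations more than once and omit others.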

3. Calculate the sample statistic

We now run a large number of iterations, each one generating a new Bootstrap replicate, and for each Bootstrap replicate we calculate the sample estimate of the statistic in question. If we are interested in the standard deviation of the population, for example, we simply calculate the sample standard deviation (STDEV in Excel) of each Bootstrap replicate.
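For the standard deviation example, a rough sketch might look like the following. It uses `statistics.stdev`, which computes the sample standard deviation (the n - 1 form, as Excel's STDEV does). The data, the seed and the choice of 1000 iterations are illustrative assumptions.

```python
import random
import statistics

# Hypothetical data set of n = 5 observations.
observations = [2.1, 3.4, 3.4, 5.0, 7.8]
n = len(observations)

rng = random.Random(1)   # seeded for reproducibility
iterations = 1000        # assumed number of Bootstrap replicates

sd_estimates = []
for _ in range(iterations):
    # One Bootstrap replicate: n draws with replacement.
    replicate = rng.choices(observations, k=n)
    # Sample standard deviation of this replicate.
    sd_estimates.append(statistics.stdev(replicate))

print(len(sd_estimates), min(sd_estimates), max(sd_estimates))
```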

In summary, the non-parametric Bootstrap proceeds as follows:

• Collect the data set of n observations {x1, …, xn}

• Create B Bootstrap replicates {x1*, …, xn*}, where each xi* is drawn at random, with replacement, from {x1, …, xn}

• For each Bootstrap replicate {x1*, …, xn*}, calculate the required statistic $\hat{\theta}$. The distribution of these B estimates of $\theta$ represents the Bootstrap estimate of uncertainty about the true value of $\theta$.
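The whole procedure can be sketched end to end as below, here for the mean. The function name `bootstrap_distribution`, the data, the seed and B = 2000 are hypothetical. Summarising the B estimates by their 2.5th and 97.5th percentiles is one common choice and is an assumption of this sketch, not something the procedure requires.

```python
import random
import statistics

def bootstrap_distribution(data, statistic, n_replicates, seed=None):
    """Return the statistic computed on each of n_replicates Bootstrap replicates."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_replicates):
        replicate = rng.choices(data, k=n)   # n draws with replacement
        estimates.append(statistic(replicate))
    return estimates

# Hypothetical data set of n = 5 observations.
observations = [2.1, 3.4, 3.4, 5.0, 7.8]

# B = 2000 Bootstrap estimates of the mean.
estimates = bootstrap_distribution(observations, statistics.mean, 2000, seed=7)

# quantiles(..., n=40) returns 39 cut points at 2.5% steps;
# the first and last are the 2.5th and 97.5th percentiles.
cuts = statistics.quantiles(estimates, n=40)
lower, upper = cuts[0], cuts[-1]
print("sample mean:", statistics.mean(observations))
print("central 95% of Bootstrap estimates:", lower, upper)
```

Passing a different function as `statistic` (for example `statistics.stdev` or `statistics.median`) gives the Bootstrap uncertainty distribution for that quantity instead.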

Example:

Estimation of the population mean, standard deviation and percentile range for a continuous variable
