To learn more about EpiX Analytics' work, please visit our modeling applications, white papers, and training schedule.




There are two types of observations for which we can apply linear least squares regression:


     A. We are making random observations of X and Y together.

     B. We are testing at different specific values of X to determine the response in Y.


For type A, we Bootstrap X and Y together in pairs and calculate the regression coefficients for each Bootstrap replicate. Three versions of this Bootstrap model are provided, using the INDEX, OFFSET and VLOOKUP Excel functions to select the pairs from the dataset.
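As a rough illustration of the type A procedure outside Excel, the Python sketch below resamples (x, y) pairs with replacement (the job INDEX, OFFSET or VLOOKUP does in the spreadsheet models) and refits the least squares line to each replicate. The function names `least_squares` and `paired_bootstrap` are hypothetical, and discarding replicates whose x values are all identical is our own assumption, since such a replicate has no defined slope.

```python
import random


def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least squares line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def paired_bootstrap(xs, ys, iterations=1000, seed=None):
    """Type A: resample (x, y) pairs with replacement and refit the line each time.

    Returns a list of (slope, intercept) tuples, one per Bootstrap replicate.
    """
    rng = random.Random(seed)
    n = len(xs)
    replicates = []
    while len(replicates) < iterations:
        # Pick n row numbers with replacement; each row keeps its x and y together
        rows = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in rows]
        by = [ys[i] for i in rows]
        # Assumption: a replicate whose x values are all equal has no defined
        # slope, so it is discarded and redrawn
        if len(set(bx)) < 2:
            continue
        replicates.append(least_squares(bx, by))
    return replicates


if __name__ == "__main__":
    # Hypothetical data, for illustration only
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    ys = [2.3, 3.8, 6.4, 7.9, 9.7, 12.4]
    slopes = sorted(m for m, _ in paired_bootstrap(xs, ys, iterations=5000, seed=42))
    # Approximate 90% percentile interval for the slope
    print(slopes[int(0.05 * len(slopes))], slopes[int(0.95 * len(slopes))])
```

Because each replicate draws whole rows, the x values vary from replicate to replicate, which is exactly what distinguishes type A from type B.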

Links to the software-specific Regression Bootstrap X Y Random models are provided here:


For type B, we need to retain the chosen (fixed) nature of the X variable. The X values were predetermined rather than drawn at random from a distribution, so the only random element is the response of Y at each X value, and it is this random component that should be Bootstrapped. Assuming that the random variations about the regression line are homoscedastic (have the same variance at every X) and that the straight-line relationship is correct, the only random variable involved is the one producing the variations about the line, so we Bootstrap the residuals about that line. If we know the residuals are Normally distributed, we can use the following parametric Bootstrap model:


1. Determine $S_{yx}$, the standard deviation of the residuals about the least squares regression line for the original data set.

2. For each of the x values in the data set, randomly sample a y value from $\mathrm{Normal}(\widehat{y}, S_{yx})$, where $\widehat{y} = \widehat{m}x + \widehat{c}$ and $\widehat{m}$, $\widehat{c}$ are the least squares regression coefficients for the original data set.

3. Determine the least squares regression coefficients for this Bootstrap sample.

4. Repeat steps 2 and 3 for B iterations.
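The four steps can be sketched in Python as below. This is an illustration under stated assumptions, not the page's spreadsheet model: the helper names are hypothetical, and $S_{yx}$ is computed with the usual n - 2 divisor (the same value Excel's STEYX function returns). For comparison, `residual_bootstrap` shows the non-parametric route for type B when Normality cannot be assumed: it resamples the observed residuals with replacement and adds them back to the fitted values at the fixed x values.

```python
import math
import random


def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least squares line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def steyx(xs, ys, slope, intercept):
    """S_yx: standard deviation of the residuals, using an n - 2 divisor (assumed)."""
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))


def parametric_bootstrap(xs, ys, iterations=1000, seed=None):
    """Type B, Normal residuals: the x values stay fixed, only y is resampled."""
    rng = random.Random(seed)
    m_hat, c_hat = least_squares(xs, ys)
    s_yx = steyx(xs, ys, m_hat, c_hat)                  # step 1
    replicates = []
    for _ in range(iterations):                          # step 4
        # Step 2: draw a new y from Normal(y_hat, S_yx) at each original x
        by = [rng.gauss(m_hat * x + c_hat, s_yx) for x in xs]
        replicates.append(least_squares(xs, by))         # step 3
    return replicates


def residual_bootstrap(xs, ys, iterations=1000, seed=None):
    """Type B, no Normality assumption: resample the observed residuals instead."""
    rng = random.Random(seed)
    m_hat, c_hat = least_squares(xs, ys)
    fitted = [m_hat * x + c_hat for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    # Raw residuals are slightly less spread out than the true errors; some
    # practitioners rescale them by sqrt(n / (n - 2)), which is not done here
    replicates = []
    for _ in range(iterations):
        by = [f + rng.choice(residuals) for f in fitted]
        replicates.append(least_squares(xs, by))
    return replicates
```

Either function returns a list of (slope, intercept) pairs whose spread describes the uncertainty in $\widehat{m}$ and $\widehat{c}$; sorting the slopes and reading off, for example, the 5th and 95th percentiles gives an approximate 90% interval.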


Although this parametric procedure works quite well, it uses all the assumptions of the classical linear regression model relating X and Y, for which classical statistics has already derived the uncertainty distributions of the regression coefficients. Under these conditions it is better to use the classical statistics formulae, which give exact answers.
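For reference, the standard textbook results for simple linear regression (Normal, homoscedastic errors and a correct straight-line relationship) are sketched below in this page's notation; $S_{xx}$ is shorthand introduced here, and $t_{n-2}$ denotes a Student t variable with n - 2 degrees of freedom. These are the usual classical formulae, not a reproduction of any specific model on this site.

```latex
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
\widehat{m} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{S_{xx}}, \qquad
\widehat{c} = \bar{y} - \widehat{m}\,\bar{x}

S_{yx} = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} \left( y_i - \widehat{m}x_i - \widehat{c} \right)^2}

m = \widehat{m} + t_{n-2} \, \frac{S_{yx}}{\sqrt{S_{xx}}}, \qquad
c = \widehat{c} + t_{n-2} \, S_{yx} \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}
```

The t distribution accounts for $S_{yx}$ itself being an estimate, whereas the parametric Bootstrap holds $S_{yx}$ fixed at its sample value. These are marginal distributions: $m$ and $c$ are correlated (estimated covariance $-\bar{x} S_{yx}^2 / S_{xx}$), so sampling them independently would misstate their joint uncertainty.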




