To learn more about EpiX Analytics' work, please visit our modeling applications, white papers, and training schedule.

Page tree



Most common method

With a couple of exceptions below, there are no simple ways to model the sum of a set of correlated random variables. One simply has to model each random variable in its own spreadsheet Cell, using one of the correlation methods described elsewhere in this guide, and then sum them up in another cell.

Exception 1: All variables take the same distribution and are 100% correlated

In the situation where the source of the randomness or uncertainty of the distribution associated with a random variable is the same for the whole group you are adding up, then there is really just one random variable. For example, we imagine a railway network company that must purchase next year 127,000 sleepers. The sleepers will be made of wood, but the price is uncertain because the cost of timber may fluctuate. It is estimated that the cost will be $PERT(22.1,22.7,33.4) each. If all the timber is being purchased at the same time, it might be reasonable to believe that all the sleepers will have the same price. In that case, the total cost is simply: =127000*RiskPert(22.1,22.7,33.4).


Exception 2: There are a large number of variables, all taking the same distribution and the correlation can be described with a covariance statistic.


If there are a large number n of random variables Xi (i=1 to n) being summed, and as long as the uncertainty of the sum is not dominated by a few of these distributions, the sum is approximately Normally distributed according to the Central Limit Theorem, as follows:


\displaystyle\sum_{i-1}^{n} X_i = Normal \bigg(\displaystyle\sum_{i-1}^{n} \mu_i,\sqrt {\displaystyle\sum_{i-1}^{n} \displaystyle\sum_{j-1}^{n}\sigma_{ij}}\bigg)


The formula says that the Normal distribution has a mean equal to the sum of the means for the individual distributions being added together. It also says that the variance (the square of the standard deviation in the formula) of the Normal distribution is equal to the square of the covariance terms between each variable. The covariance terms sij are calculated as follows:

\sigma_{ij} = \rho_{ij}\sigma_i\sigma_j or   \sigma_{ij} = E \big\lfloor \big(x_i-\mu_i\big)\big(x_j-\mu_j \big)\big\rfloor


where  si , sj are the standard deviations of variables i and j, rij is the correlation coefficient, and E[·] means "the expected value of' the thing in the brackets. 

If we have data sets for the variables being modeled, EXCEL can calculate the covariance and correlation coefficients using the functions COVAR( ) and CORREL( ) respectively. If we were thinking of using a rank order correlation matrix, each element corresponds reasonably accurately to rij for roughly Normal distributions (at least, not very heavily skewed distributions), so the standard deviation of the Normally distributed sum could be calculated directly from the correlation matrix, as shown in model Covariance and correlation. This model actually uses some pretty skewed distributions and comes up with a good approximation to the sum one gets from running a simulation of each variable.


The links to the Covariance and correlation software specific models are provided here:


  • No labels