# Common error 3:

At school we learn things like:

1 + 3 = 4 so 4 – 1 = 3

3 * 2 = 6 so 6 / 3 = 2

Later, when we take algebra, we learn:

A + B = C therefore C – A = B

D*E = F therefore F / D = E

The problem is that these trusted rules do *not* apply so universally when manipulating random variables. This section explains how and when these simple algebraic rules no longer work, shows you how to identify them in your model, and how to make the appropriate corrections.

*Example 1*

Most deterministic models consist of linked formula that contain nothing more complicated than simple operations like +, -, * and /. When we decide to start adding uncertainty to the values of the components in the model, it seems natural enough to simply replace a fixed value by a probability distribution describing our uncertainty. Here we have an example of a credit servicing company that can be found in the model Fixed and Risked.

The best guess estimates of the money borrowed by a client and for the number of clients have been replaced by distributions, but the model is otherwise unchanged. This model is probably very wrong. Why? Well, the error is most easily seen by watching random values being generated on-screen. Look at the values that are being used for the entire client base and compare with where these values sit on the Lognorm(10000,4000) distribution.

For example, the Lognorm(10000,4000) distribution has 10% of its probability below £5,670. Thus, in 10% of it's iterations it will generate a value below this figure and that value will be used for **all** customers. The Lognorm distribution undoubtedly reflects the variability that is expected between customers (perhaps, for example, it was fit to a relevant data set of amounts individual customers have previously borrowed). The probability that two randomly selected customers will borrow less that £5,670 is 10% * 10% = 1% (0.1 * 0.1 = 0.01 if you prefer). The probability that all (say) 6500 customers borrow less than £5,670 if the amounts they borrow are independent is 10-6500 i.e. effectively impossible, yet our model gives it a 10% probability.

In order to model this problem correctly we need to consider what are the sources of uncertainty about the amount a customer borrowed. If the source is specific to each individual client, then the amounts can be considered independent, and the techniques offered here provide a number of solutions. If there is some systematic influence (like the state of the economy, recent bad press for companies offering credit, etc.) it will have to be separated out from the individual, independent component.

The links to the Fixed and Risked software specific models are provided here:

*Example 2*

The sum of two independent Uniform(0,1) distributions is … what do you think?

The answer often surprises people. It is hard to imagine a simpler problem, yet when we canvass a class we get quite a range of answers. Perhaps a Uniform(0,2)? That's the most common response. Or something looking a little Normal? The answer is a Triangular(0,1,2), so we could write:

U(0,1) + U(0,1) = T(0,1,2)

The first message in this example is that it is difficult for a person not very well versed in risk analysis modeling to be able to predict well the results of even the most trivial model. Of course, that makes it very hard to check the model and be comfortable about its results.

On to the next question we often pose our class:

T(0,1,2) – U(0,1) = ?

Wise to the trickiness of the question, most class participants are pretty sure that their first guess (i.e. = U(0,1) ) is wrong but don't have anything else to suggest. The answer is a symmetric distribution that looks a little Normal, stretching from -1 to 2 with peak at 0.5. But why isn't it U(0,1)? An easy way to visualize this is to run a simulation adding two Uniform(0,1) distributions and plotting the generated values from one Uniform distribution together with the calculated sum of them both. You get a scatter plot that looks like this:

Note the red line y=x shows the lowest value for the Triangular distribution for any given value of the Uniform distribution, and the green line y = 1+x is the highest value, which makes intuitive sense. The vertical spread between the two lines is the effect of the second Uniform(0,1) distribution. Also note that all the generated values lie uniformly (but randomly) between these two lines. This actually is quite helpful in visualizing why the sum of two Uniform(0,1) distributions is a Triangular(0,1,2) by projecting all the dots onto the y-axis. If you extend this graph to work out U(0,1) + U(0,3), the you would see the below plot:

Note that the scatter-plot has equal density between the red (y = x) and green (y = x+1) lines. In the y-range between 0 and 1 the scatter-point area is increasing linearly, then between y = 1 and y = 3 it is constant, and from y = 3 to y = 4 the scatter-point area decreases linearly to zero. Thus the sum takes a trapezoid shape which in Crystal Ball could be created with a General distribution, constructed with Crystal Ball's Custom Distribution:

The point of the graph is to show you that there is a strong dependency pattern between these two distributions (a Uniform and the Triangular sum), which would need to be taken into account if one wished to extract back out the two Uniform distributions from each other. For example, the formula below do just that:

A: =Triangular(0,1,2)

B: =Uniform(IF(A<1,0,A-1),IF(A>1,1,A))

C: =A - B

Try to follow the logic for the formula for B from the graph. B will generate a Uniform(0,1) distribution with the right dependency relationship with A to leave C a Uniform(0,1) distribution too.

To recap, the problem is that we have three variables linked together as follows:

A + B = C

We know the distributions for A and C, how do we find B and how do we simulate A, B and C all together? The simple example above using two Uniform distributions allows us to simulate A, B and C all together, but only because we assumed A and B were independent, and the problem was very simple. In general, we cannot correctly determine B, so we need either to construct a model that avoids having to perform such a calculation, or admit that we have insufficient information to specify B.