# Modeling an extreme value for a variable

Imagine that we are building a bridge between two islands. The bridge must withstand extreme weather events, such as very high or powerful waves and very strong sustained winds or gusts. For example, the specification might require the bridge to have a 90% probability of withstanding the highest sustained wind (lasting more than 10 minutes, say) that will occur in the next one hundred years. Of course, we could be very unlucky: the highest wind of the century could occur tomorrow, and then there would be a 10% probability that it blows the bridge down! However, we can't build infinitely strong bridges, and cost forces a compromise specification like the one above.

Since the wind speed at any moment is a continuous random variable, the greatest wind speed over the next century is also a continuous random variable. There are many situations in which we wish to model not the entire range of values a variable might take, but an extreme, either the minimum or the maximum. Examples include the earthquake power impinging on a building, which must be designed to sustain the largest earthquakes with minimal damage within the available budget; maximum wave height, for designing offshore platforms, breakwaters and dikes; the pollution emissions of a factory, to ensure that even at their maximum they fall below the legal limit; the strength of a chain, which equals the strength of its weakest link; and the extremes of meteorological events, since these cause the greatest impact. A great deal of effort has gone into determining the distributions of these extremes for various situations, but it is often not easy. If, for example, we have only ten years of wind data, we will have to make some assumptions to estimate what the greatest wind speed of the century might be.

Engineers are interested in the extreme values of a parameter (such as minimum strength or maximum impinging force) because these are the values that determine whether a system might fail, but they are not the only ones. Insurance companies, for example, are interested in the size of claims from extreme events such as hurricanes and terrorist attacks.

The theory behind determining the extreme value distributions is as follows:

Let X be a random variable with cumulative distribution function F(x), the function that gives the probability F(x) that the variable takes a value less than or equal to x.

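Let $X_1, X_2, \ldots, X_n$ be n independent values drawn from this distribution, and let $X_{\max} = \max(X_1, X_2, \ldots, X_n)$ and $X_{\min} = \min(X_1, X_2, \ldots, X_n)$.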

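Then the cumulative distribution functions of $X_{\max}$ and $X_{\min}$ are

$$F_{\max}(x) = P(X_{\max} \le x) = F(x)^n$$

and

$$F_{\min}(x) = P(X_{\min} \le x) = 1 - \left[1 - F(x)\right]^n$$

because the maximum is at most x only when every $X_i$ is at most x, and the minimum exceeds x only when every $X_i$ exceeds x.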
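As a quick sanity check of $P(X_{\max} \le x) = F(x)^n$ and $P(X_{\min} \le x) = 1 - [1 - F(x)]^n$, the sketch below simulates many sets of n Uniform(0,1) values, for which F(x) = x, and compares the empirical CDFs of the maximum and minimum with the two formulas. The Uniform parent, n = 5, x = 0.7 and the helper name are illustrative assumptions only.

```python
import random

def empirical_cdfs(n, x, iterations=20000, seed=1):
    """Estimate P(Xmax <= x) and P(Xmin <= x) for n Uniform(0,1) draws."""
    rng = random.Random(seed)
    max_hits = 0
    min_hits = 0
    for _ in range(iterations):
        sample = [rng.random() for _ in range(n)]
        if max(sample) <= x:
            max_hits += 1
        if min(sample) <= x:
            min_hits += 1
    return max_hits / iterations, min_hits / iterations

n, x = 5, 0.7
F = x  # Uniform(0,1) parent: F(x) = x on [0, 1]
emp_max, emp_min = empirical_cdfs(n, x)
print(f"P(Xmax <= {x}): simulated {emp_max:.3f}, F(x)^n = {F ** n:.3f}")
print(f"P(Xmin <= {x}): simulated {emp_min:.3f}, 1-(1-F(x))^n = {1 - (1 - F) ** n:.3f}")
```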

Substituting the cumulative distribution function of a particular parent distribution, and then letting n approach infinity after suitably shifting and rescaling the maximum or minimum (so that the limit does not collapse to a single value), gives that parent's extreme value distribution.
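As an illustration of this limit (the choice of parent is ours, not singled out by the text), take an Exponential parent with mean b, so that $F(x) = 1 - e^{-x/b}$ for $x \ge 0$. Shifting the maximum by $b \ln n$ makes the limit explicit:

$$
\begin{aligned}
P(X_{\max} \le x) &= \left(1 - e^{-x/b}\right)^n, \\
P(X_{\max} \le b\ln n + y) &= \left(1 - \frac{e^{-y/b}}{n}\right)^n \;\xrightarrow[n\to\infty]{}\; \exp\left(-e^{-y/b}\right),
\end{aligned}
$$

so for large n, $X_{\max}$ is approximately ExtremeValue$(b\ln n,\, b)$, with CDF $\exp\left[-\exp\left(-\frac{x - b\ln n}{b}\right)\right]$.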

The Extreme Value distribution, also frequently known as the Gumbel distribution, is sometimes referred to as the MaximumExtreme distribution, the counterpart of the MinimumExtreme distribution. It is one of only three possible extreme value distributions. The other two are the Frechet distribution, which is less widely used, and a version of the Weibull distribution (for the largest extreme, -X follows a shifted Weibull distribution). They have the following cumulative distribution functions:

Distributions for largest extreme

| Distribution | CDF | Parameter restrictions |
|---|---|---|
| Type I (GumbelMax(a,b) = ExtremeValue(a,b)) | $F(x)=\exp\left[-\exp\left(-\frac{x-a}{b}\right)\right]$ | $-\infty<x<\infty$, $-\infty<a<\infty$, $b>0$ |
| Type II (FrechetMax(a,b,c)) | $F(x)=\exp\left[-\left(\frac{x-a}{b}\right)^{-c}\right]$ | $x>a$, $b>0$, $c>0$ |
| Type III (Weibull-typeMax(a,b,c)) | $F(x)=\exp\left[-\left(\frac{a-x}{b}\right)^{c}\right]$ | $x<a$, $b>0$, $c>0$ |

Distributions for smallest extreme

| Distribution | CDF | Parameter restrictions |
|---|---|---|
| Type I (GumbelMin(a,b)) | $F(x)=1-\exp\left[-\exp\left(\frac{x-a}{b}\right)\right]$ | $-\infty<x<\infty$, $-\infty<a<\infty$, $b>0$ |
| Type II (FrechetMin(a,b,c)) | $F(x)=1-\exp\left[-\left(\frac{a-x}{b}\right)^{-c}\right]$ | $x<a$, $b>0$, $c>0$ |
| Type III (Weibull-typeMin(a,b,c)) | $F(x)=1-\exp\left[-\left(\frac{x-a}{b}\right)^{c}\right]$ | $x>a$, $b>0$, $c>0$ |
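These CDFs can be checked numerically. The sketch below codes the three largest-extreme CDFs directly, assuming location a, scale b and shape c, and derives each smallest-extreme form by sign reversal: if Y has continuous CDF G, then X = -Y has P(X ≤ x) = 1 - G(-x). The function names are hypothetical, not any package's API.

```python
import math

# Largest-extreme CDFs (location a, scale b > 0, shape c > 0).
def gumbel_max_cdf(x, a, b):
    return math.exp(-math.exp(-(x - a) / b))

def frechet_max_cdf(x, a, b, c):
    if x <= a:
        return 0.0
    return math.exp(-((x - a) / b) ** (-c))

def weibull_type_max_cdf(x, a, b, c):
    if x >= a:
        return 1.0
    return math.exp(-((a - x) / b) ** c)

# Smallest-extreme CDFs via sign reversal: X = -Y with Y ~ Max(-a, b[, c]),
# so P(X <= x) = 1 - P(Y <= -x).
def gumbel_min_cdf(x, a, b):
    return 1.0 - gumbel_max_cdf(-x, -a, b)

def frechet_min_cdf(x, a, b, c):
    return 1.0 - frechet_max_cdf(-x, -a, b, c)

def weibull_type_min_cdf(x, a, b, c):
    return 1.0 - weibull_type_max_cdf(-x, -a, b, c)

# Each Max CDF equals exp(-1) one scale unit from its location parameter.
print(gumbel_max_cdf(0.5, 0.5, 2.0), frechet_max_cdf(3.0, 1.0, 2.0, 1.5))
```

Writing the minimum forms through sign reversal keeps the two sets of formulas consistent by construction, which is the same idea used later in the text to model a minimum extreme from a maximum-extreme fit.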

The theory of extreme values says that the largest or smallest of a set of values drawn from the same parent distribution tends, after suitable shifting and rescaling, to an asymptotic distribution that depends only on the tail of the parent distribution. The Gumbel distribution is the extreme value distribution for parent distributions with exponential-type tails, e.g. the Exponential, Gamma, Normal, Lognormal and Logistic, as well as the Gumbel itself. The Frechet distribution is the extreme value distribution for heavy-tailed parents such as the Pareto, Student-t, Cauchy and log-Gamma, as well as the Frechet itself. The Weibull-type distribution is the extreme value distribution for the maximum of bounded variables such as the Beta and Uniform, and for the minimum of Weibull distributed variables. In some cases the convergence to the limiting form can be very slow.
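A concrete case of the bounded-tail result: for n Uniform(0,1) values, $P(X_{\max} \le 1 - y/n) = (1 - y/n)^n$, which tends to $e^{-y}$. This means $X_{\max}$ is approximately Weibull-typeMax with a = 1, b = 1/n and c = 1, whose CDF is $\exp[-n(1-x)]$ for x < 1. The sketch below, with hypothetical helper names, measures the largest gap between the exact CDF $x^n$ and this approximation on a grid of x values near 1.

```python
import math

def exact_max_cdf(x, n):
    """P(max of n Uniform(0,1) values <= x) = x**n for 0 <= x <= 1."""
    return x ** n

def weibull_type_max_approx(x, n):
    """Weibull-typeMax(a=1, b=1/n, c=1): exp(-n * (1 - x)) for x < 1."""
    return 1.0 if x >= 1.0 else math.exp(-n * (1.0 - x))

def max_gap(n, points=51):
    """Largest |exact - approximation| on a grid of x values in [1 - 5/n, 1]."""
    grid = (1.0 - k / (10.0 * n) for k in range(points))
    return max(abs(exact_max_cdf(x, n) - weibull_type_max_approx(x, n)) for x in grid)

for n in (5, 50, 500):
    print(f"n = {n:3d}: largest CDF gap = {max_gap(n):.5f}")
```

For this parent the gap shrinks roughly in proportion to 1/n, so convergence is quick. For other parents (the Normal is a well-known example with respect to the Gumbel) it is much slower.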

As discussed above, the three standard extreme value distributions are the Gumbel (the ExtremeValue distribution), the Frechet (model Frechet generates this distribution) and the Weibull (model Weibull shows how to generate this distribution).


The problem with all these extreme value distributions is that:

1. they only work for certain types of parent distributions,

2. they are only asymptotically correct, meaning that one needs to be considering the extreme of a potentially very large set of observations before the extreme distribution is a good model, and

3. the parameter values for these extreme distributions are difficult to estimate, and can be difficult even to calculate when the parent distribution is known very well.

A more practical approach is sometimes to estimate the underlying parent distribution first, then simulate a set of observations from that distribution and record the maximum (or minimum) of that set at each iteration. Running many iterations builds up a well-defined extreme distribution. Several thousand iterations are probably needed to determine the extreme distribution well, because simulated statistics such as a maximum or minimum take a long time to stabilize.
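The simulation approach can be sketched in a few lines. This is a minimal sketch, assuming a hypothetical Exponential parent with mean 1 and n = 100 observations per period. At each iteration it draws n values and keeps their maximum (or minimum), then summarizes the resulting extreme distribution by its mean and percentiles.

```python
import random
import statistics

def simulate_extremes(sample_parent, n, iterations=5000, seed=42, use_max=True):
    """Draw n values from the parent per iteration and keep the max (or min)."""
    rng = random.Random(seed)
    pick = max if use_max else min
    return [pick(sample_parent(rng) for _ in range(n)) for _ in range(iterations)]

def exponential_parent(rng):
    """Hypothetical parent distribution: Exponential with mean 1 (rate 1)."""
    return rng.expovariate(1.0)

maxima = simulate_extremes(exponential_parent, n=100)
cuts = statistics.quantiles(maxima, n=100)  # 1st..99th percentiles
print(f"mean of simulated maxima: {statistics.fmean(maxima):.3f}")
print(f"median: {cuts[49]:.3f}, 90th percentile: {cuts[89]:.3f}")
```

For this parent the exact mean of the maximum of 100 values is about 5.19. The simulated mean should land close to it, which gives a simple check on whether enough iterations were run. Passing `use_max=False` gives the minimum instead.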

The parameters of the Extreme Value distribution are usually determined by fitting to data. The exception is when the parent distribution is known and the relationship between its parameter values and those of the corresponding extreme value distribution is also known. Gumbel (1958) provides an old but still excellent treatise on extreme value theory. The ExtremeValue (or MaximumExtreme) distribution models the maximum extreme. For a variable whose parent distribution has an exponential-type lower tail, some simulation software packages also offer the MinimumExtreme distribution to model the minimum extreme. If it is not available, the minimum extreme distribution can easily be modeled by reversing the sign of X. Model Fitting ExtValue gives an example of such a distribution.


If you have a set of minimum data, change the sign of each data point and fit Crystal Ball's ExtremeValue distribution to the sign-reversed data to obtain ExtremeValue(a, b) with fitted parameters a and b. Then reverse the sign again, i.e. Lower extreme = -ExtremeValue(a, b).

If you have a set of minimum data, change the sign of each data point and fit @Risk's ExtremeValue distribution to the sign-reversed data to obtain RiskExtValue(a, b) with fitted parameters a and b. Then reverse the sign again, i.e. Lower extreme = -RiskExtValue(a, b).
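The sign-reversal recipe can be sketched in code as follows. This is a minimal sketch, not Crystal Ball's or @Risk's fitting routine. It swaps in a simple method-of-moments fit, using mean = a + γb and standard deviation = πb/√6 for ExtremeValue(a, b), where γ is the Euler-Mascheroni constant. The helper names and the synthetic minima are hypothetical.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def fit_extreme_value_moments(data):
    """Method-of-moments fit of ExtremeValue(a, b), i.e. GumbelMax(a, b).

    For this distribution, mean = a + EULER_GAMMA * b and
    standard deviation = pi * b / sqrt(6).
    """
    b = statistics.stdev(data) * math.sqrt(6) / math.pi
    a = statistics.fmean(data) - EULER_GAMMA * b
    return a, b

def fit_minimum_by_sign_reversal(minima):
    """Fit ExtremeValue(a, b) to the sign-reversed minima.

    The minimum is then modeled as -ExtremeValue(a, b).
    """
    return fit_extreme_value_moments([-v for v in minima])

def sample_extreme_value(rng, a, b):
    """Inverse-CDF draw from ExtremeValue(a, b): x = a - b*ln(-ln(u))."""
    u = rng.random()
    while u == 0.0:  # guard against log(0)
        u = rng.random()
    return a - b * math.log(-math.log(u))

# Hypothetical test data: 5000 minima generated as -ExtremeValue(-20, 3).
rng = random.Random(7)
minima = [-sample_extreme_value(rng, -20.0, 3.0) for _ in range(5000)]
a_hat, b_hat = fit_minimum_by_sign_reversal(minima)
print(f"fitted: lower extreme = -ExtremeValue({a_hat:.2f}, {b_hat:.2f})")
```

The fitted a and b should come out close to -20 and 3, the values used to generate the data. A maximum-likelihood fit would normally be preferred for real data. The method of moments is used here only because it needs nothing beyond the standard library.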

## Contagious extreme value distributions

Sometimes we are interested in the largest (or smallest) of a random number of random variables, for example the largest flood that might occur in a period where both the number of floods and the size of each flood are random. Other examples are earthquakes, explosions, stock price jumps and accidents. Neat mathematical solutions are sometimes available for modeling the extremes of such systems. Model Contagious extreme value distribution demonstrates the result by simulation.
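The simulation idea can be sketched as follows. In each period we draw a Poisson number of events, draw a size for each, and keep the largest. The sketch then compares the simulated P(max ≤ x) with the ExtremeValue(c + b·ln λ, b) result given below for Poisson counts of shifted Exponential sizes. The rate λ = 4, the shift c = 10, the mean b = 2 and the helper names are hypothetical. Periods with no event are recorded as -inf, so the comparison is meaningful only for x ≥ c.

```python
import math
import random

def poisson_sample(rng, lam):
    """Poisson draw by Knuth's multiplication method (fine for modest lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def largest_event(rng, lam, sample_size):
    """Largest of a Poisson(lam) number of event sizes; -inf if no event occurs."""
    count = poisson_sample(rng, lam)
    if count == 0:
        return float("-inf")
    return max(sample_size(rng) for _ in range(count))

# Hypothetical inputs: event rate LAM, Exponential mean B and shift C.
LAM, B, C = 4.0, 2.0, 10.0

def shifted_exponential(rng):
    """Event size = C + Exponential with mean B (rate 1/B)."""
    return C + rng.expovariate(1.0 / B)

rng = random.Random(2024)
results = [largest_event(rng, LAM, shifted_exponential) for _ in range(20000)]

x = 15.0
simulated = sum(v <= x for v in results) / len(results)
# Closed form to compare against: ExtremeValue(C + B*ln(LAM), B).
gumbel = math.exp(-math.exp(-(x - (C + B * math.log(LAM))) / B))
print(f"P(max <= {x}): simulated {simulated:.3f}, ExtremeValue {gumbel:.3f}")
```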


As an example, if the number of gas explosions in a period can be described by Poisson(λ) and the intensity of an explosion by a shifted Exponential distribution (e.g. = c + Exponential(1/b)), then the maximum explosion intensity is given by an Extreme Value distribution: = ExtremeValue(c + b*LN(λ), b).

As an example, if the number of gas explosions in a period can be described by RiskPoisson(λ) and the intensity of an explosion by a shifted Exponential distribution (e.g. = c + RiskExpon(b)), then the maximum explosion intensity is given by an Extreme Value distribution: = RiskExtValue(c + b*LN(λ), b).
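The Poisson-Exponential result can be derived directly, assuming that the count N is independent of the explosion intensities and that the intensities are identically distributed with CDF F. Conditioning on N, with M the largest intensity in the period:

$$
P(M \le x) = \sum_{k=0}^{\infty} \frac{e^{-\lambda}\lambda^{k}}{k!}\,F(x)^{k} = \exp\{-\lambda[1 - F(x)]\}.
$$

With the shifted Exponential $F(x) = 1 - e^{-(x-c)/b}$ for $x \ge c$:

$$
P(M \le x) = \exp\left(-\lambda e^{-(x-c)/b}\right) = \exp\left[-\exp\left(-\frac{x - (c + b\ln\lambda)}{b}\right)\right],
$$

which is the CDF of ExtremeValue$(c + b\ln\lambda,\, b)$. The result applies for $x \ge c$. The ExtremeValue probability below c, $\exp(-\lambda)$, equals the probability that no explosion occurs, which the $k = 0$ term counts as $M \le x$.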

Similarly, if the number of explosions in a period can be described by Poisson(λ) and the size of an explosion by a Pareto(a, θ) distribution with minimum a and shape θ, then the maximum explosion intensity is given by a Frechet(0, a·λ^(1/θ), θ) distribution. Care is needed here, because these results assume that the frequency of events and the event intensities are independent. For example, it is well recognized that earthquake intensities are related to the number of earthquakes: the more earthquakes there are, the more gradually the tectonic plate energy is released, and thus the lower the earthquake intensities. Similar arguments can be made about floods. Kottegoda and Rosso (1998) provide plenty of excellent worked examples.
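The Poisson-Pareto result can be checked numerically. The check sums $P(N = k)\,F(x)^k$ over k for a Pareto parent with minimum a and shape θ, and compares the sum with a Frechet CDF with location 0, scale $a\lambda^{1/\theta}$ and shape θ. The sketch assumes the FrechetMax(a, b, c) parameterization with CDF $\exp[-((x-a)/b)^{-c}]$. The parameter values and helper names are hypothetical, and the comparison is made for x ≥ a.

```python
import math

def pareto_cdf(x, a, theta):
    """Pareto with minimum a and shape theta: 1 - (a/x)**theta for x >= a."""
    return 0.0 if x < a else 1.0 - (a / x) ** theta

def frechet_max_cdf(x, loc, scale, shape):
    """FrechetMax(loc, scale, shape): exp(-((x - loc)/scale)**(-shape)), x > loc."""
    return 0.0 if x <= loc else math.exp(-((x - loc) / scale) ** (-shape))

def poisson_max_cdf(x, lam, parent_cdf, terms=200):
    """Sum over k of P(N = k) * F(x)**k, counting 'no event' as M <= x."""
    f = parent_cdf(x)
    total, pmf = 0.0, math.exp(-lam)
    for k in range(terms):
        total += pmf * f ** k
        pmf *= lam / (k + 1)  # P(N = k + 1) from P(N = k)
    return total

# Hypothetical parameters for illustration.
lam, a, theta = 3.0, 1.0, 2.5
scale = a * lam ** (1.0 / theta)

def parent(v):
    return pareto_cdf(v, a, theta)

for x in (1.0, 1.5, 2.0, 5.0, 10.0):
    series = poisson_max_cdf(x, lam, parent)
    closed = frechet_max_cdf(x, 0.0, scale, theta)
    print(f"x = {x:5.1f}: series {series:.6f}, Frechet {closed:.6f}")
```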
