To learn more about EpiX Analytics' work, please visit our modeling applications, white papers, and training schedule.

Page tree

Arrays are shown in curly brackets {.,.,.}. They must be input as a list, e.g. {1,2,3..} or a range name, or an array reference e.g. A1:A4.

Note that many of the statistical functions below are written for compatibility with earlier versions of EXCEL. For example, NORMDIST() is now NORM.DIST(). The arguments are for the most part the same and both versions of the formula exist in the newer versions of EXCEL, but it's better to use the latest versions (i.e. with a period in the formula name) as they are more accurate.

Descriptive Statistics

These functions provide basic statistical values for data sets and are useful for analyzing data, or the exported results of simulations.

 AVEDEV({x}) returns the average deviation of a data set

AVERAGE({x}) returns the mean of a data set 

 COMBIN(n,x) returns the binomial coefficient

CORREL({x},{y}) returns the product moment correlation coefficient between data sets {x} and {y}. 

COVAR({x},{y}) returns the covariance between data sets {x} and {y}. 

FREQUENCY({x},{bins}) is an array function that counts the number of data points in the data set {x} that are in each bin. You have to be careful of how the function behaves at the ends of the bin array.

 KURT({x}) + 3 returns the kurtosis of a data set

MAX({x}) returns the maximum of a data set 

MEDIAN({x}) returns the median of a data set 

MIN({x}) returns the minimum of a data set 

 MULTINOMIAL({s}) returns the multinomial coefficient  

for which the binomial coefficient COMBIN is a special case.

PERCENTILE(array,k) returns the k-th percentile of values in a range. You can use this function to establish a threshold of acceptance. For example, you can decide to examine candidates who score above the 90th percentile.

Rank =RANK(x,array,order) order: 0=descending; 1=ascending

 SKEW({x}) returns the skewness of a data set

 STDEV({x}) returns the standard deviation of a data set

VAR({x}) returns the variance of a data set 


Distributions

Returns a value (given the probability) or a probability (give the value) from a probability distribution, NOT a random value like Crystal Ball distributions. The parameter cumulative allows you to control whether the returned probability is from the probability mass (or probability density curve), or from the cumulative probability curve.


BETADIST(x,a,b,min,max) returns the cumulative probability of x from the Beta(a, b) distribution. If the min and max optional parameters are used, it rescales the beta distribution.

BETAINV(P,a,b,min,max) returns the cumulative percentile xP from a Beta(a,b) distribution. If the min and max optional parameters are used, it rescales the beta distribution. 

BINOMDIST(x,n,p,cumulative) returns the probabilities of x from the Binomial(n,p) distribution.

 CHIDIST(x,n) returns the cumulative probability from a Chisq(n) distribution.

CHIINV(P,n) returns the cumulative percentile xP from a Chisq(n)  distribution. 

EXPONDIST(x,l,cumulative) returns the probabilities of x from the Expon(b) distribution where l = 1/b. (the departure from standard convention for the parameter can be confusing, and is inconsistent with the GAMMADIST function, so be careful)


 FDIST(x, n1,n2) returns the cumulative probability from an F(n1,n2) distribution.

FINV(P, n1,n2) returns the cumulative percentile xP from an F(n1,n2) distribution. 

GAMMADIST(x,a,b,cumulative) returns the probabilities of x from the Gamma(a,b) distribution. 

HYPGEOMDIST(x,n,D,M) returns the probability mass of x from the Hypergeometric(n,D,M) distribution. 

LOGINV(P,m,s) returns the cumulative percentile xP from a Lognorm2(m,s) distribution (i.e. the mean and standard deviation apply to the corresponding Normal distribution. 

 LOGNORMDIST(x,m,s) returns the cumulative probability from a Lognorm2(m,s) distribution (i.e. the mean and standard deviation apply to the corresponding Normal distribution.

NEGBINOMDIST(x,s,p) returns the probabilities of x from the Negative Binomial(s,p) distribution. 

NORMDIST(x,m,s,cumulative) returns the probabilities of x from the Normal(m,s) distribution. 

 NORMINV(P,m,s) returns the cumulative percentile xP from the Normal(m,s) distribution.

NORMSDIST(x) returns the cumulative probability of x for the standard Normal distribution Normal(0,1). x is equivalent to the z-score in classical statistics. 

NORMSINV(P) returns the cumulative percentile xP from a standard Normal distribution Normal(0,1).

POISSON(x,l, cumulative) returns the probabilities of x from the Poisson(l) distribution. 

Returns the Percentage Points (probability) for the Student t-distribution where a numeric value (x) is a calculated value of t for which the Percentage Points are to be computed. The t-distribution is used in the hypothesis testing of small sample data sets. 

 WEIBULL(x,a,b,cumulative) returns the probabilities of x from a Weibull(a,b) distribution.


Least Squares Regression

These functions return statistical parameters or projections from a least squares regression that attempts to fit an equation that relates the independent variable X to the dependent variable Y as follows:


Y = Normal(m*X+c,Syx)


The {x} array is the set of observations for independent variable X, and the {y} array is the set of observations for the dependent variable Y

See TREND

INTERCEPT({y},{x}) returns the estimated c

RSQ({y},{x}) returns the square (r2) of Pearsons product moment correlation coefficient r

SLOPE({y},{x}) returns the estimated m 

TREND({y},{x},new-xs,constant) returns an array of y-values corresponding to the new x-values, after attempting a regression fit on {y}, {x}. For example:



The Excel function FORECAST serves the same purpose.



General

These functions perform basic operations on cells, look up values in data tables, and perform logical tests which are the basic building blocks of risk analysis modeling.

ABS(x) returns the absolute value of x, so:

ABS(-4) = 4, ABS(4) = 4.

AND(condition1, condition2,&) returns a TRUE if al conditions are met, and a false otherwise. For example:


A1: 10

A2: 12

A3: 6


=AND(A1>=10, A3>2*A1,A2>=2*A3) returns a TRUE

=AND(A1>=10, A3>2*A1,A2>2*A3) returns a FALSE


CONCATENATE (text1,text2,...) joins several text strings into one text string. 

COUNT({x}) counts the number of cells that contain numbers and also numbers within the list of arguments. Use COUNT to get the number of entries in a number field that's in a range or array of numbers.

COUNTIF({x},criterion) counts how many cells in the range that {x} satisfy the criterion.

EXP(x) returns the natural exponent of x, i.e. ex.

FACT(x) returns the factorial of x, i.e. 1*2*&*x. For example:

FACT(4) = 12

FACT(1) = 1

FACT(0) = 1 That might surprise you its a strange result that comes from the gamma function definition of a factorial:


FACT rounds x to the nearest value, but mathematically you can have a factorial of any positive real number. If you want the factorial of say 3.3, you can write:


= EXP(GAMMALN(3.3+1)) = EXP(GAMMALN(4.3))


For any non-negative integer:


FACT(x) = EXP(GAMMALN(x+1))


FALSE( ) returns the logical value FALSE.

GAMMALN(x) returns the log of the gamma function of x. Actually, this IS useful though rather esoteric, as several probability distributions contain a gamma function. It is also an alternative to calculating very large factorial:


= GAMMALN(n+1)


So this function is closely related to Excels FACT function.

HLOOKUP(v,A,i) looks up in a table delineated by the area A. It searches for the value v in the first row, and then returns the value in the ith row of the same column. For example:



This is a useful function in combination with a probability distribution for selecting random or conditional values from tables. However, it can be somewhat tricky in predicting what it will do at the end of tables, and wont work properly unless the first column is put in ascending order. For these reasons we tend to use the functions OFFSET and INDEX for this purpose.

IF(condition,A,B) returns the value A if the condition is true, and the value B otherwise. For example:


A1: 10

A2: 12

A3: 6


=IF(A1>0,2,5) returns a 2

=IF(OR(A1>10, A3>A1,A2>=2*A3),1,0) returns a 1

=IF(AND(A1>=10, A3>2*A1,A2>2*A3),1,NA()) returns #N/A


This function is extremely useful for creating conditional links between random variables so that a model remains internally consistent no matter what values are being generated from the input distributions.

INDEX(array,r,c) returns the value in the rth row and the cth column of the specified array. Very useful for picking up values from tables of data when r,c or both are random variables. There is also another parameter format for INDEX. 

 ISNA(ref) checks whether a value is #N/A, and returns TRUE or FALSE.

LN(x) returns the natural log of x, so LN(EXP(x)) = x 

LOG(x, base) returns the logarithm of x to the specified base. The base is taken as 10 if the optional parameter base is omitted, so aLOG(x,a) = x.

LOG10(x) returns the logarithm of x to base 10, so 10LOG10(x) = x 

NA() returns the error #N/A which is useful for tracking dependent cells, and for excluding certain results from a simulation (see example).

OFFSET(reference, r,c,w,h) returns the value r rows below and c columns to the right of reference. If the optional parameters w and h are used, it becomes an array formula and returns an array staring at the same cell, but extending w cells wide and h cells high. For example:



OR(condition1, condition2,&) returns a TRUE if any of the conditions are met, and a false otherwise. For example:


A1: 10

A2: 12

A3: 6


=OR(A1>10, A3>A1,A2>=2*A3) returns a TRUE

=OR(A1>10, A3>A1,A2<2*A3) returns a FALSE


PI( ) returns the number 3.14159265358979, the mathematical constant pi, accurate to 15 digits. 

ROUND(x,digits) rounds off x to a certain number of decimal places:


ROUND(1234.5678,2) = 1234.57

ROUND(1234.5678,0) = 1235

ROUND(1234.5678,-2) = 1200


ROUNDUP(x,digits) rounds a number up, away from 0 (zero).

SIN(x) returns the sine of the given angle.

SQRT(x) returns the square root of x, e.g. SQRT(16) = 4.

SUM({x}) returns the sum of the cells in the array {x}

SUMPRODUCT({x},{y},{z}, ...) multiplies corresponding components in the given arrays, and returns the sum of those products.

TODAY( ) returns the serial number of the current date. The serial number is the date-time code used by Microsoft Excel for date and time calculations. If the cell format was General before the function was entered, the result is formatted as a date.

TRUE( ) returns the logical value TRUE.

VLOOKUP(v,A,i) looks up in a table delineated by the area A. It searches for the value v in the first column, and then returns the value in the ith column of the same row. For example:



This is a useful function in combination with a probability distribution for selecting random or conditional values from tables. However, it can be somewhat tricky in predicting what it will do at the end of tables, and wont work properly unless the first column is put in ascending order. For these reasons we tend to use the functions OFFSET and INDEX for this purpose.

YEAR(serial_number) returns the year corresponding to a date. The year is returned as an integer in the range 1900-9999.





  • No labels