When we do not have a great deal of data, considerable uncertainty will remain about an empirical distribution determined directly from the data. It would be very useful to have the flexibility of using an empirical distribution, i.e. not having to assume a parametric distribution, while also being able to quantify the uncertainty about that distribution. The following Bayesian technique meets both requirements.
Consider a set of n data values {x_j} drawn from a distribution, and ranked in ascending order {x_i} so that x_i < x_{i+1}. Data thus ranked are known as the order statistics of {x}. We can use these order statistics to construct a second-order empirical cumulative distribution.
Here are the mathematics behind the technique:
Individually, each of the values of {x_j} maps as a U(0,1) onto the cumulative probability of the parent distribution F(x). We therefore take a U(0,1) distribution as the prior distribution for the cumulative probability of any value of x, and in particular a U(0,1) prior for P_i = F(x_i), the cumulative probability of the i^{th} observation. However, we have the additional information that, of the n values drawn randomly from this distribution, x_i ranked i^{th}, i.e. (i-1) of the data values are less than x_i and (n-i) values are greater than x_i. Using Bayes' Theorem and the Binomial Theorem, the posterior marginal distribution for P_i can readily be determined, remembering that P_i has a U(0,1) prior and therefore a prior probability density of 1:

f(P_i \mid x_i) = \frac {P_i^{\,i-1} (1-P_i)^{\,n-i}} {\int_0^1 t^{\,i-1} (1-t)^{\,n-i} \, dt}
which is simply the standard Beta distribution Beta(i, n-i+1):

P_i = Beta(i, n-i+1) \quad (1)
Equation 1 could actually be determined directly from the facts that the Beta distribution is the conjugate prior to the binomial likelihood function and that U(0,1) = Beta(1,1). The mean of the Beta(i, n-i+1) distribution equals i/(n+1): a formula that has been used to estimate the best-fitting first-order non-parametric cumulative distribution.
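Equation 1 lends itself to a quick simulation check: the i^{th} ranked value of n Uniform(0,1) draws is exactly the cumulative probability P_i of the i^{th} order statistic, so its average should approach the Beta(i, n-i+1) mean of i/(n+1). A minimal sketch (the values of n, i and the trial count are purely illustrative):

```python
import random

# Simulation check: the i-th of n ranked Uniform(0,1) draws -- i.e. the
# cumulative probability P_i of the i-th order statistic -- should follow
# Beta(i, n-i+1), whose mean is i/(n+1).
random.seed(1)
n, i, trials = 9, 3, 200_000

samples = []
for _ in range(trials):
    u = sorted(random.random() for _ in range(n))
    samples.append(u[i - 1])          # P_i for this synthetic data set

sim_mean = sum(samples) / trials
theory_mean = i / (n + 1)             # Beta(i, n-i+1) mean = 3/10 here
print(round(sim_mean, 3), round(theory_mean, 3))
```

The two printed values agree closely, which is consistent with the posterior derived above.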
Since P_{i+1} > P_i, these Beta distributions are not independent, so we need to determine the conditional distribution f(P_{i+1} | P_i), as follows. The joint distribution f(P_i, P_j) for any two P_i, P_j is calculated using the Binomial Theorem in a similar manner to the numerator of the equation for f(P_i | x_i), i.e.:

f(P_i, P_j) \propto P_i^{\,i-1} (P_j - P_i)^{\,j-i-1} (1-P_j)^{\,n-j}

where P_j > P_i, remembering that the prior probability densities for P_i and P_j equal 1 since they have U(0,1) priors.

For j = i+1:

f(P_i, P_{i+1}) \propto P_i^{\,i-1} (1-P_{i+1})^{\,n-i-1}
The conditional probability density f(P_{i+1} | P_i) is thus given by:

f(P_{i+1} \mid P_i) = \frac {f(P_i, P_{i+1})} {f(P_i)} = k (1-P_{i+1})^{\,n-i-1}

where k is some constant with respect to P_{i+1}. The corresponding cumulative distribution function F(P_{i+1} | P_i) is then given by:

F(P_{i+1} \mid P_i) = \int_{P_i}^{P_{i+1}} k (1-t)^{\,n-i-1} \, dt = \frac {k} {n-i} \Big[ (1-P_i)^{n-i} - (1-P_{i+1})^{n-i} \Big]

Since F(P_{i+1} | P_i) = 1 at P_{i+1} = 1, we have k = (n-i)/(1-P_i)^{n-i} and the formula reduces to:
F(P_{i+1} \mid P_i) = 1 - \Bigg( \frac {1-P_{i+1}}{1-P_i} \Bigg)^{n-i} \quad (2)
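Equation 2 can be checked numerically as a valid cumulative distribution function in P_{i+1}: it equals 0 at P_{i+1} = P_i, equals 1 at P_{i+1} = 1, and increases in between. A quick sketch, with n, i and P_i chosen arbitrarily for illustration:

```python
# Numerical sanity check of Equation 2; n, i and p_i are arbitrary choices.
n, i, p_i = 10, 3, 0.25

def F(p_next):
    """Equation 2: F(P_{i+1} | P_i) for the fixed values above."""
    return 1 - ((1 - p_next) / (1 - p_i)) ** (n - i)

assert abs(F(p_i) - 0.0) < 1e-12   # F = 0 at P_{i+1} = P_i
assert abs(F(1.0) - 1.0) < 1e-12   # F = 1 at P_{i+1} = 1
assert F(0.5) < F(0.7) < F(0.9)    # increasing in P_{i+1}
```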
Equations 1 and 2 provide us with the tools to construct a non-parametric second-order distribution for a continuous variable given a data set sampled from that distribution. The distribution for the cumulative probability P_1 that maps onto the first order statistic x_1 can be obtained from Equation 1 by setting i = 1:
P_1 = Beta(1, n) \quad (3)
The distribution for the cumulative probability P_2 that maps onto the second order statistic x_2 can then be obtained from Equation 2. F(P_{i+1} | P_i), being a cumulative distribution function, is Uniform(0,1) distributed. Thus, writing U_{i+1} to represent a Uniform(0,1) distribution in place of F(P_{i+1} | P_i), using the identity 1 - U(0,1) = U(0,1), and solving for P_{i+1}, we obtain:
P_{i+1} = 1 - \sqrt[n-i]{U_{i+1}} \, (1-P_i) \quad (4)
which gives:

P_2 = 1 - \sqrt[n-1]{U_2} \, (1-P_1)

P_3 = 1 - \sqrt[n-2]{U_3} \, (1-P_2)

and so on up to P_n.
Note that the Uniform(0,1) distributions U_2, U_3, …, U_n are mutually independent of each other.
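The recursion defined by Equations 3 and 4 can be sketched directly in code. The function name below is illustrative; each call yields one realisation of the cumulative probabilities P_1 < P_2 < … < P_n, and, averaged over many iterations, the simulated P_i recover the Beta means i/(n+1):

```python
import random

# One realisation of P_1, ..., P_n from Equations 3 and 4:
# P_1 ~ Beta(1, n), then P_{i+1} = 1 - U_{i+1}^(1/(n-i)) * (1 - P_i),
# where U_2, ..., U_n are independent Uniform(0,1) draws.
def second_order_cdf_probs(n, rng=random):
    p = [rng.betavariate(1, n)]       # Equation 3: P_1 ~ Beta(1, n)
    for i in range(1, n):             # i = 1, ..., n-1
        u = rng.random()              # U_{i+1} ~ Uniform(0, 1)
        p.append(1 - u ** (1 / (n - i)) * (1 - p[-1]))   # Equation 4
    return p

random.seed(7)
probs = second_order_cdf_probs(10)
assert all(a < b for a, b in zip(probs, probs[1:]))  # P_1 < P_2 < ... < P_n
```

Each call simulates a different plausible cumulative distribution for the same data set, which is what makes the construction second-order.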


Recapping from the proof, the formulae from Equations 3 and 4 are:
P_1 = Beta(1, n) \quad (3)

P_{i+1} = 1 - \sqrt[n-i]{U_{i+1}} \cdot (1-P_i) \quad (4)
where P_j = F(x_j) is the estimate of the cumulative distribution function at x_j. These can be used as inputs to construct a Cumulative distribution, together with subjective estimates of the minimum and maximum values that the variable may take, which can themselves be assigned subjective distributions.
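To illustrate how Equations 3 and 4 feed into a Cumulative distribution, the sketch below draws one second-order realisation of {P_i}, attaches subjective minimum and maximum values as the P = 0 and P = 1 anchor points, and samples by inverse transform with linear interpolation between the points. The function name, data values, xmin and xmax are all invented for illustration:

```python
import bisect
import random

# One second-order iteration of the fitted Cumulative distribution:
# a fresh realisation of P_1..P_n (Equations 3 and 4) anchored by subjective
# xmin (P = 0) and xmax (P = 1), sampled by inverse transform.
def sample_cumulative(data, xmin, xmax, n_samples, rng=random):
    xs = [xmin] + sorted(data) + [xmax]
    n = len(data)
    p = [rng.betavariate(1, n)]                      # Equation 3
    for i in range(1, n):
        p.append(1 - rng.random() ** (1 / (n - i)) * (1 - p[-1]))  # Equation 4
    ps = [0.0] + p + [1.0]
    out = []
    for _ in range(n_samples):
        u = rng.random()
        j = min(bisect.bisect_right(ps, u) - 1, len(xs) - 2)  # CDF segment
        frac = (u - ps[j]) / (ps[j + 1] - ps[j])
        out.append(xs[j] + frac * (xs[j + 1] - xs[j]))  # linear interpolation
    return out

random.seed(0)
draws = sample_cumulative([4.1, 5.0, 5.6, 6.2, 7.3], 3.0, 9.0, 1000)
assert all(3.0 <= d <= 9.0 for d in draws)
```

Repeating the call with fresh {P_i} realisations propagates the uncertainty about the distribution itself, not just the variability within it.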
Model NonParaCont2 creates a secondorder distribution using this technique.
The links to the NonParaCont2 software-specific models are provided here:
Limitations
There are a few limitations to this technique. In using a Cumulative distribution, one assumes a histogram-style probability density function between each pair of adjacent {x} values. When there are a large number of data points, the error from this approximation becomes negligible. For small data sets, however, the approximation will tend to accentuate the tails of the distribution: a result of the histogram "squaring-off" effect of using the Cumulative distribution. In other words, the variability will be slightly exaggerated. The squaring-off effect can be reduced, if required, by applying a smoothing algorithm and defining points between each observed value. In addition, for small data sets the tails' contribution to the variability will often be influenced more by the subjective estimates of the minimum and maximum values: a fact one can view either positively (one is recognizing the real uncertainty about a distribution's tails) or negatively (the smaller the data set, the more the technique relies on subjective assessment).
Quite naturally, the fewer the data points, the wider the confidence intervals become and, in general, the more emphasis is placed on the subjectively defined minimum and maximum values. Conversely, the more data points available, the less influence the minimum and maximum estimates have on the fitted distribution. In any case, the minimum and maximum values only influence the width (and therefore height) of the two end histogram bars of the fitted distribution. The fact that the technique is non-parametric, i.e. that no statistical distribution with a particular cumulative distribution function is assumed to underlie the data, allows the analyst far greater flexibility and objectivity than fitting parametric distributions affords.