**Time series projections**


Time series projections are used to model variables like import volumes, outbreak numbers, consumption rates, share prices, exchange rates and bacterial growth, where we are interested in modeling the variable over more than one period. We model these variables over time because we need to know their values at intermediate stages in the variable's history, not just at one point in time.

The time series models that we produce must reflect:

- The relationship between the variable's values in successive modeled periods;
- Realistic ranges of the variable over time;
- Any trends (drift), seasonality, and cyclicity (identifiable, non-periodic events);
- The relationship between uncertainty and time (whether uncertainty increases or decreases, for example).

The simplest forecasting technique is to use the last value available in a time series as our estimate of all future values. This naive forecast is useful because we can look at the forecasting errors it produces and compare those errors with the errors produced by the other more sophisticated techniques. Clearly, if a more sophisticated and time-consuming technique does not provide us with an appreciable increase in accuracy over the naive forecast, it will not be worth adopting. We should attempt to find the technique that produces the smallest forecasting error for the least effort as our best estimator of the future.

The naive forecast may seem over-simplistic, but it is the most appropriate single point estimate of all methods if the parameter being estimated varies according to a random walk. The simplest random walk is where the (n+1)th term in a series is equal to the nth term plus a movement that has a symmetric, zero-centered probability distribution. Such a series has no memory of the path it took to arrive at the nth value: thus, no seasonal or cyclical patterns or trends exist except by pure chance. There are several other types of random walks, and we will look at some of the most important ones.
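As a sketch of the baseline idea above, the snippet below (Python, with illustrative parameters) simulates the simplest random walk and computes the one-step mean absolute error of the naive forecast on it; a more sophisticated model would need to beat this figure appreciably to justify its extra effort:

```python
import random

def naive_forecast_mae(series):
    """One-step-ahead naive forecast: each value is predicted by its
    predecessor.  Returns the mean absolute error over the series."""
    errors = [abs(series[t] - series[t - 1]) for t in range(1, len(series))]
    return sum(errors) / len(errors)

# A simple symmetric random walk with no trend or seasonality
# (starting value and step distribution are illustrative).
random.seed(1)
walk = [100.0]
for _ in range(200):
    walk.append(walk[-1] + random.gauss(0, 1))

baseline = naive_forecast_mae(walk)
```

Because the walk's steps are zero-centered and memoryless, no technique can systematically beat this baseline on such data; that is exactly why the naive forecast is the right single point estimate for a random walk.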

## Convention

We use the following convention for describing a time series: S_{t} is the value of the time series variable at time t. Thus, for example, a random walk might be expressed as:

S_{t}=S_{t-1}*Uniform(0.9,1.1)

This means that the variable S at time t is only dependent on its value in the previous period (t-1), and is between 90% and 110% of its previous value.
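This convention translates directly into a simulation loop. A minimal sketch (the starting value of 100 and the 12-period horizon are illustrative assumptions):

```python
import random

random.seed(42)
S = [100.0]                # S_0: an arbitrary illustrative starting value
for t in range(1, 13):     # project 12 periods
    # S_t = S_{t-1} * Uniform(0.9, 1.1)
    S.append(S[t - 1] * random.uniform(0.9, 1.1))
```

Each run of this loop produces one possible path; in a Monte Carlo model the loop would be repeated many times to build up the distribution of S at each period.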

## Example models

In this section, we offer a variety of time series models. Perhaps no one model will exactly suit your problem, but different components can be put together to achieve the model you require.

Random walk 1: S_{t}=S_{t-1}+Normal(m,s)

The simplest model, where the variable is a random displacement from its previous value, irrespective of the size of that previous value. Other distributions can, of course, be used in place of the Normal. The mean of the distribution used (m in this case) gives the linear trend of the model. Note that the model can easily go negative.

Random walk 2: S_{t}=S_{t-1}*Normal(1+m,s)

A model similar to Random Walk 1, where the variable is again a random displacement from its previous value, but the displacement is proportional to the previous value. Again, other distributions can be used in place of the Normal.
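The two walks can be sketched side by side (all parameter values below are illustrative, chosen so the additive and multiplicative steps are of similar size):

```python
import random

def random_walk_1(s0, m, s, periods, rng):
    """Additive walk: S_t = S_{t-1} + Normal(m, s).  Can go negative."""
    path = [s0]
    for _ in range(periods):
        path.append(path[-1] + rng.gauss(m, s))
    return path

def random_walk_2(s0, m, s, periods, rng):
    """Multiplicative walk: S_t = S_{t-1} * Normal(1 + m, s).
    The displacement is proportional to the previous value."""
    path = [s0]
    for _ in range(periods):
        path.append(path[-1] * rng.gauss(1 + m, s))
    return path

rng = random.Random(7)
p1 = random_walk_1(100.0, m=0.5, s=2.0, periods=24, rng=rng)
p2 = random_walk_2(100.0, m=0.005, s=0.02, periods=24, rng=rng)
```

Plotting several paths of each makes the difference visible: Random Walk 1 wanders with constant step size whatever its level, while Random Walk 2's steps grow and shrink with the level of the variable.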

Bounded random walk: Min_{t}<=S_{t}<=Max_{t}

Any random walk that is constrained to remain within limits. The limits themselves may be a function of time.
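One simple way to impose the constraint is to clip each step back to the bounds. The sketch below assumes a fixed floor and a ceiling that grows with time, purely for illustration:

```python
import random

def bounded_walk(s0, periods, rng, lo=lambda t: 50.0, hi=lambda t: 150.0 + t):
    """Additive random walk clipped to stay within Min_t <= S_t <= Max_t.
    The bounds here (a floor of 50, a ceiling rising with t) are
    illustrative; they may be any functions of time."""
    path = [s0]
    for t in range(1, periods + 1):
        proposed = path[-1] + rng.gauss(0, 5)
        path.append(min(max(proposed, lo(t)), hi(t)))
    return path

rng = random.Random(3)
path = bounded_walk(100.0, periods=50, rng=rng)
```

Clipping is the simplest choice but lets the walk "stick" to a bound; reflecting the overshoot back inside the limits is a common alternative when that behavior is unrealistic.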

Poisson random walk with trend: S_{t}=Poisson(mt+c)

A Poisson random variable where the intensity λ is a function of time; here λ = mt + c, so m gives the linear trend and c is the intensity at t = 0.
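A sketch of this model in Python (m and c are illustrative; the standard library has no Poisson generator, so a simple Knuth sampler, adequate for small λ, is included):

```python
import math
import random

def poisson_sample(lam, rng):
    """Poisson sampler using Knuth's multiplication method
    (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def poisson_trend_series(m, c, periods, rng):
    """S_t ~ Poisson(m*t + c): counts whose expected value rises
    linearly with time (m and c are illustrative parameters)."""
    return [poisson_sample(max(m * t + c, 1e-9), rng) for t in range(periods)]

rng = random.Random(11)
series = poisson_trend_series(m=0.5, c=2.0, periods=20, rng=rng)
```

Note that, unlike the random walks above, each period's count is drawn afresh from its own Poisson distribution: the series has a trend but no memory of previous values.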

Jump model:

A random walk where there is a sudden jump in the expected value (usually), spread, or some other characteristic of the probability distribution of S_{t}. These jumps usually occur because of single-hit events (the cyclical events referred to above), like an election, the introduction of a miracle drug, a war, a change in tax rules, or a scandal.

Leading indicator model: S_{t}=f(Y_{t-d}), where Y is the indicator variable and d is the lag

Leading indicators are variables whose movements presage the movement of a variable we are interested in. In other words, a leading indicator is like an early warning system.
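A minimal sketch, assuming a hypothetical indicator Y (say, advertising spend), a lag of three periods, and a linear response f, all of which are illustrative choices:

```python
import random

rng = random.Random(5)
d = 3  # lag in periods (illustrative): the indicator leads S by d periods

# Y: a hypothetical leading indicator series.
Y = [10 + rng.gauss(0, 1) for _ in range(30)]

# S_t = f(Y_{t-d}); here f is an assumed linear response plus noise.
S = [2.0 * Y[t - d] + rng.gauss(0, 0.5) for t in range(d, len(Y))]
```

The practical value of the lag d is that the first d future values of S can be forecast from indicator values that have already been observed.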

Of the different natural phenomena that exhibit auto-correlated behavior, probably the most common is mean reversion. This says that when an outlier occurs, it will most likely be followed by a movement that brings it back to the mean - 'a correction'.

Mean reversion model:

\log S_{t+1} = \text{Normal}\bigg((1-b)\log S_t + b\log S_0 + \bigg(\mu - \frac{\sigma^2}{2}\bigg)(1 + bt + b),\ \sigma\bigg)
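A simulation sketch of the reversion mechanism: the version below simplifies the equation above by dropping the drift term, keeping only the pull of log S toward a long-run level (here called s_bar; all parameter values are illustrative):

```python
import math
import random

def mean_reverting_walk(s0, s_bar, b, sigma, periods, rng):
    """Discrete mean reversion in log space, drift omitted for clarity:
    log S_{t+1} ~ Normal((1-b)*log S_t + b*log s_bar, sigma).
    b in (0, 1] sets the speed of reversion toward s_bar."""
    log_s = math.log(s0)
    log_bar = math.log(s_bar)
    path = [s0]
    for _ in range(periods):
        log_s = rng.gauss((1 - b) * log_s + b * log_bar, sigma)
        path.append(math.exp(log_s))
    return path

rng = random.Random(9)
path = mean_reverting_walk(s0=200.0, s_bar=100.0, b=0.3, sigma=0.05,
                           periods=40, rng=rng)
```

Starting well above the long-run level, the path is pulled back toward it: an outlier tends to be followed by 'a correction', which is exactly the mean-reverting behavior described above.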

Trend and seasonality model: S_{t}=f(t,period)

Many random variables exhibit some sort of repeated periodic underlying behavior (seasonality). The period can be anything: daily (e.g. number of people on the streets), monthly (payment of regular bills), or yearly (revenue from crops). A trend may also be present.
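One simple additive form combines a linear trend, a sinusoidal seasonal term, and noise; the sketch below assumes a monthly series with a 12-period season, and every parameter value is illustrative:

```python
import math
import random

def seasonal_series(base, trend, amplitude, period, n, sigma, rng):
    """S_t = base + trend*t + amplitude*sin(2*pi*t/period) + Normal(0, sigma):
    an additive trend-plus-seasonality sketch."""
    return [base + trend * t
            + amplitude * math.sin(2 * math.pi * t / period)
            + rng.gauss(0, sigma)
            for t in range(n)]

rng = random.Random(2)
monthly = seasonal_series(base=100, trend=0.5, amplitude=10,
                          period=12, n=36, sigma=2, rng=rng)
```

For variables where the seasonal swing grows with the level of the series, a multiplicative form (multiplying the trend by a seasonal factor) is usually more realistic than this additive one.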

Geometric Brownian motion:

\log S_t = \text{Normal}\bigg(\log S_0 + \bigg(\mu - \frac{\sigma^2}{2}\bigg)t,\ \sigma\sqrt{t}\bigg)

The stochastic process that is generally assumed for variables like stock prices, share prices, and interest rates. It is a random walk without memory. It is derived from assuming that the fractional change in S over some small time increment is Normally distributed:

\frac{\Delta S}{S} = \text{Normal}\big(\mu\,\Delta t,\ \sigma\sqrt{\Delta t}\big)

The result is a lognormally distributed random variable. The Ito process is the basis of the Black-Scholes equation for valuing options.
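A minimal simulation sketch at unit time steps, with illustrative parameters: each one-period log-return is drawn as Normal(μ − σ²/2, σ), and summing t such independent returns reproduces the Normal(log S_0 + (μ − σ²/2)t, σ√t) distribution of log S_t given above:

```python
import math
import random

def gbm_path(s0, mu, sigma, periods, rng):
    """Geometric Brownian motion sampled at unit time steps, built
    incrementally from independent Normal log-returns."""
    path = [s0]
    for _ in range(periods):
        # one-period log-return: Normal(mu - sigma^2/2, sigma)
        path.append(path[-1] * math.exp(rng.gauss(mu - sigma ** 2 / 2, sigma)))
    return path

rng = random.Random(4)
path = gbm_path(s0=50.0, mu=0.08, sigma=0.2, periods=10, rng=rng)
```

Because each value is the previous one multiplied by a positive factor, the path can never go negative, one reason this process is preferred to an additive walk for prices.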

Various common sales volume time series

This is a selection of models that describe various ways of forecasting an uncertain sales volume:

- Finite total market
- Ramping up to an uncertain total sales rate
- Introduction of a competitor
- Sales volume is a function of other random variables
- Launching a new product with an uncertain launch-timing

## Fitting time-series models to historical data

The time-series models discussed above are diverse examples of approaches to forecasting a variable. In most of the examples above, it was assumed either that (1) no relevant and representative data were available, so the inputs to the time series were based on expert opinion, or (2) that the estimation of the inputs was done elsewhere.

However, you may have historical data available that you'd like to use to help in forecasting the future. In that case, you can use a similar approach as when fitting a first-order parametric distribution to data. In summary, we could fit our historical data to a number of candidate time-series models, and with the aid of a number of metrics, including a Goodness of Fit statistic, decide which of these candidate time-series models may be the most appropriate as a basis for forecasting the future.
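One way to sketch such a comparison (this is an illustrative approach, not the only one) is to fit the one-step increments of the historical data under each candidate model by maximum likelihood and compare AIC values, here for Random Walk 1 (Normal differences) against Random Walk 2 (Normal ratios):

```python
import math

def normal_loglik(xs):
    """Maximised log-likelihood of an i.i.d. Normal fit to xs."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)

def compare_walk_fits(series):
    """AIC (lower is better) of Random Walk 1 (Normal differences) and
    Random Walk 2 (Normal ratios) fitted to a historical series."""
    diffs = [series[t] - series[t - 1] for t in range(1, len(series))]
    ratios = [series[t] / series[t - 1] for t in range(1, len(series))]
    ll1 = normal_loglik(diffs)
    # The Jacobian term puts the ratio model's likelihood on the same
    # (untransformed) scale as the difference model, so the AICs compare.
    ll2 = normal_loglik(ratios) - sum(
        math.log(series[t - 1]) for t in range(1, len(series)))
    k = 2  # both models estimate a mean and a standard deviation
    return {"random_walk_1": 2 * k - 2 * ll1,
            "random_walk_2": 2 * k - 2 * ll2}

history = [100, 102, 101, 105, 107, 104, 110, 108, 112, 115]  # illustrative
aics = compare_walk_fits(history)
```

As discussed below, the best-fitting model is not automatically the right forecasting model; goodness-of-fit statistics like these should be weighed alongside the logic of the situation.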

## Some additional useful principles in modeling a time series

- The model's behavior can be checked with embedded Excel x-y scatter plots;
- Split the model up into components rather than creating long, complicated formulas. That way you'll see that each component is working correctly, and therefore have confidence in the time series projection as a whole;
- Be realistic about the match between historic patterns and projections. For example, take a look at the model Random Walk 1. Run the simulation a few times and see the variation in patterns you get. These paths all come from the same stochastic model, yet they look convincingly different: if any one of them had been our historical data, a statistical analysis would have tended to reinforce our preconception about the appropriate model, because statistical analysis requires us to specify the model to test. So don't choose a forecast model just because it fits the data best – also ask whether there is a logical reason for choosing one model over another;
- Be creative. Short-term forecasts (say, 20-30% of the historic period for which you have good data) are often adequately produced from a statistical analysis of your data. Even then, be selective about the model. Beyond that time frame, however, we move into crystal-ball gazing: including your perceptions of where the future may go, possible influencing events, etc. will be just as valid as an extrapolation of historic data.
