The maximum likelihood estimates of a distribution type are the values of its parameters that produce the maximum joint probability density or mass for the observed data *X* given the chosen probability model.

Maximum likelihood estimation starts with a mathematical expression known as the *likelihood function* of the sample data. This expression contains the unknown parameters to be estimated. The parameter values that maximize the sample likelihood are known as the *maximum likelihood estimates*. They are determined by setting the partial derivatives of the (log-)likelihood function with respect to each parameter to zero, i.e. locating the function's peak over the parameter space.
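As a minimal worked sketch of this derivative condition (using the Exponential rate model, which is not a specific example from this text but has a convenient closed form): the log-likelihood of data $x_1,\dots,x_n$ is $\ell(\lambda) = n\log\lambda - \lambda\sum x_i$, and setting $d\ell/d\lambda = n/\lambda - \sum x_i = 0$ gives $\hat\lambda = n/\sum x_i$.

```python
import math

def exponential_mle(x):
    """Closed-form MLE of the Exponential rate: n / sum(x)."""
    return len(x) / sum(x)

def exponential_loglik(lam, x):
    """Log-likelihood of an Exponential(rate=lam) model for data x."""
    return len(x) * math.log(lam) - lam * sum(x)

# Illustrative data (not from the text).
data = [0.5, 1.2, 0.8, 2.0, 1.5]
lam_hat = exponential_mle(data)  # 5 / 6.0

# The log-likelihood at the MLE is at least as large as at nearby values,
# confirming the derivative condition located the peak.
assert exponential_loglik(lam_hat, data) >= exponential_loglik(lam_hat * 1.1, data)
assert exponential_loglik(lam_hat, data) >= exponential_loglik(lam_hat * 0.9, data)
```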

*Advantages over other fitting methods*

Maximum likelihood provides a consistent approach to parameter estimation problems. This means that maximum likelihood estimates can be developed for a large variety of estimation situations (essentially whenever a probability model relating the parameters to the observations can be written down), including cases with missing or censored data.

Maximum likelihood methods have desirable mathematical and optimality properties: they become minimum variance unbiased estimators as the sample size increases. Their sampling distributions are approximately Normal in large samples, and a Taylor series expansion of the log-likelihood around the estimate yields standard errors and Normal-based uncertainty intervals.
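A minimal sketch of that Taylor-expansion approximation, again using the Exponential rate model as an assumed worked case: the second derivative of $\ell(\lambda) = n\log\lambda - \lambda\sum x_i$ is $-n/\lambda^2$, so the observed information at $\hat\lambda = n/\sum x_i$ is $n/\hat\lambda^2$ and the approximate sampling distribution is Normal with variance $\hat\lambda^2/n$.

```python
import math

def exponential_mle_with_se(x):
    """MLE of the Exponential rate plus its large-sample standard error."""
    n = len(x)
    lam_hat = n / sum(x)                 # maximum likelihood estimate
    observed_info = n / lam_hat ** 2     # -l''(lam_hat), curvature at the peak
    se = math.sqrt(1.0 / observed_info)  # large-sample standard error
    return lam_hat, se

# Illustrative data (not from the text).
data = [0.5, 1.2, 0.8, 2.0, 1.5]
lam_hat, se = exponential_mle_with_se(data)

# Approximate 95% interval from the Normal approximation.
ci = (lam_hat - 1.96 * se, lam_hat + 1.96 * se)
```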

*Disadvantages*

The likelihood equations need to be worked out specifically for each distribution and estimation problem, which requires some facility with probability theory.

The numerical estimation is usually non-trivial. Except for a few cases (Normal, Binomial, Geometric, Poisson) where the maximum likelihood formulas are simple closed-form expressions, it is generally best to rely on high-quality statistical software (e.g. R, SAS, Python) to obtain maximum likelihood estimates, or to use a general-purpose optimization algorithm.
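A minimal sketch of the optimizer route, assuming SciPy is available. The Exponential rate model is used here only because its closed-form answer, $n/\sum x_i$, lets us check the optimizer; the same pattern of minimizing a negative log-likelihood applies to models with no closed form.

```python
import math
from scipy.optimize import minimize_scalar

# Illustrative data (not from the text).
data = [0.5, 1.2, 0.8, 2.0, 1.5]

def neg_loglik(lam):
    """Negative Exponential log-likelihood; infinite outside the valid domain."""
    if lam <= 0:
        return math.inf
    return -(len(data) * math.log(lam) - lam * sum(data))

# Bounded scalar minimization of the negative log-likelihood.
result = minimize_scalar(neg_loglik, bounds=(1e-9, 100.0), method="bounded")
lam_hat = result.x  # should agree with the closed form len(data)/sum(data)
```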

Maximum likelihood estimates can be heavily biased for small samples, and the optimality properties above may not hold. Numerical estimation is also sensitive to the choice of starting values supplied to the optimization algorithm.
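The small-sample bias can be seen in a standard case: for Normal data, the MLE of the variance divides by $n$ rather than $n-1$, so its expectation is $(n-1)/n \cdot \sigma^2$. A simulation sketch (illustrative values, not from the text):

```python
import random

def variance_mle(x):
    """MLE of the Normal variance: divides by n, not n - 1."""
    m = sum(x) / len(x)
    return sum((xi - m) ** 2 for xi in x) / len(x)

random.seed(42)
n, reps = 5, 20000
estimates = []
for _ in range(reps):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    estimates.append(variance_mle(sample))

# With n = 5 and true variance 1, the average MLE is close to
# (n - 1)/n = 0.8, i.e. it underestimates the variance by ~20%.
mean_estimate = sum(estimates) / reps
```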

### Examples of MLE derivations

We demonstrate the application of MLE by optimization for four data types: complete data and left-, right- and interval-censored data. These examples also illustrate how to construct likelihood functions.
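As a preview of how such likelihoods are built, here is a minimal sketch for right-censored data, using the Exponential model as the simplest assumed case (not one of the derivations that follow). Fully observed times contribute the density $f(t)$, while right-censored times contribute the survival function $S(t) = P(T > t)$, so $\ell(\lambda) = \sum_{\text{obs}} (\log\lambda - \lambda t_i) + \sum_{\text{cens}} (-\lambda t_i)$, and setting the derivative to zero gives $\hat\lambda = (\text{number of events}) / (\text{total time at risk})$.

```python
import math

# Illustrative data (not from the text); False marks a right-censored time.
times    = [2.0, 3.5, 1.2, 4.0, 0.8]
observed = [True, True, False, True, False]

def censored_loglik(lam):
    """Right-censored Exponential log-likelihood."""
    ll = 0.0
    for t, obs in zip(times, observed):
        # Observed events contribute log f(t); censored times contribute log S(t).
        ll += (math.log(lam) - lam * t) if obs else (-lam * t)
    return ll

d = sum(observed)        # number of observed events
total_time = sum(times)  # total time at risk
lam_hat = d / total_time # closed-form MLE for this model

# The log-likelihood at lam_hat dominates nearby values, confirming the peak.
assert censored_loglik(lam_hat) >= censored_loglik(lam_hat * 1.2)
assert censored_loglik(lam_hat) >= censored_loglik(lam_hat * 0.8)
```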