To learn more about EpiX Analytics' work, please visit our modeling applications, white papers, and training schedule.



Introduction to Bayesian inference concepts


Bayesian inference is based on Bayes' Theorem, the logic of which was first proposed in Bayes (1763). Bayes' Theorem states:

{P(A_i|B)=\frac{P(B|A_i)\,P(A_i)}{\sum_{j} P(B|A_j)\,P(A_j)} \quad \text{(1)}}

This formula is not very intuitive, so if you are not very familiar with the subject you may want to first review the example below.

 Bayesian doctor example

Imagine that you are a doctor. A patient comes in with a particularly unusual set of symptoms. About 70% of people with those symptoms have disease A, 20% disease B and 10% disease C. So, your best guess is obviously that this person has disease A. We could say that your prior belief looks like this:



The vertical scale represents how confident you are about the true state of the patient. The disease the patient is suffering from is not a random variable, so the vertical axis is not probability.


Being a doctor, you have a simple test you can perform. You take a sample of saliva, add it to a tube with some chemical, and look for any change in color. According to the box the test comes in, the probability that the test turns black depends on the disease:


60% for a person with disease A;

90% for a person with disease B; and

100% for a person with disease C.


The test turns black… so what is your belief now about the patient? An event tree plot of what could have happened would help your thinking:

The probability that a person came in with disease A and gave a black test result = 0.7*0.6 = 0.42;

For disease B it is 0.2*0.9 = 0.18; and

For disease C it is 0.1*1.0 = 0.1.


One of these three scenarios must have occurred, so you weight the three according to these probabilities:


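Confidence it is disease A: {\frac{0.42}{0.42+0.18+0.1}=0.600}

Confidence it is disease B: {\frac{0.18}{0.42+0.18+0.1}\approx 0.257}

Confidence it is disease C: {\frac{0.1}{0.42+0.18+0.1}\approx 0.143}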


Now your belief looks like this:



The test has not changed your belief much: your confidence in disease A has only dropped from 70% to 60%. Even though the test had a 100% probability of giving the observed result for disease C, your confidence in C has only risen from 10% to about 14%, so you are still reasonably certain that it is not the one. You will probably treat the patient for disease A.


We have just performed the Bayesian inference calculation of Equation 1 introduced earlier.


The equation is answering the question: How confident are we about state Ai given we have seen B? In this problem, Ai are the states of the patient (A1,A2,A3 = disease A, B, C respectively), and B is what we observed (the black test result). So:


P(A1) = 0.7

P(A2) = 0.2

P(A3) = 0.1

P(B|A1) = 0.6

P(B|A2) = 0.9

P(B|A3) = 1.0
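
The mapping above can be checked numerically. The following is a minimal Python sketch of the discrete form of Bayes' Theorem applied to the doctor example. The function name `discrete_posterior` and the dictionary layout are illustrative choices, not part of the original material.

```python
def discrete_posterior(prior, likelihood):
    """Return P(A_i | B) for each state A_i.

    prior:      dict mapping state -> P(A_i)
    likelihood: dict mapping state -> P(B | A_i)
    """
    # Joint probability of each state together with the observation B
    joint = {state: prior[state] * likelihood[state] for state in prior}
    # Normalizing constant: total probability of observing B
    evidence = sum(joint.values())
    return {state: p / evidence for state, p in joint.items()}


prior = {"A": 0.7, "B": 0.2, "C": 0.1}        # P(A1), P(A2), P(A3)
likelihood = {"A": 0.6, "B": 0.9, "C": 1.0}   # P(B|A1), P(B|A2), P(B|A3)

posterior = discrete_posterior(prior, likelihood)
for state, p in posterior.items():
    print(f"Confidence it is disease {state}: {p:.3f}")
```

The printed confidences are 0.600, 0.257 and 0.143, matching the weighting of the three event-tree branches.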

Notation for Bayesian inference


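In what follows, θ is the parameter being estimated and X is the observed data; π(θ) is the prior distribution, l(X|θ) is the likelihood function, and f(θ|X) is the posterior distribution.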


Bayesian inference is about shapes


The basic equation of Bayesian inference is:


{f(\theta|X)=\frac{\pi(\theta)\,l(X|\theta)}{\int \pi(\theta)\,l(X|\theta)\,d\theta} \quad \text{when } \theta \text{ is continuous}}

{f(\theta|X)=\frac{\pi(\theta)\,l(X|\theta)}{\sum \pi(\theta)\,l(X|\theta)} \quad \text{when } \theta \text{ is discrete}}



The denominators in these equations are normalizing constants that give the posterior distribution a total confidence of one. Since the denominator is simply a scalar value and not a function of θ, one can rewrite the equations in a form that is generally more convenient:



{f(\theta|X)\propto \pi(\theta)\,l(X|\theta)}
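
Because only the shape of the product matters, a posterior can be approximated by evaluating prior times likelihood on a grid of θ values and normalizing afterwards. The sketch below illustrates this under assumptions that are not in the original text: binomial data (3 successes in 10 trials), a uniform prior on [0, 1], and a grid of 1001 points. The names `grid_posterior`, `uniform_prior` and `binomial_likelihood` are hypothetical.

```python
def grid_posterior(prior, likelihood, grid):
    """Normalize prior(theta) * likelihood(theta) over an evenly spaced grid."""
    step = grid[1] - grid[0]
    unnormalized = [prior(t) * likelihood(t) for t in grid]
    # Riemann-sum approximation of the integral in the denominator
    area = sum(unnormalized) * step
    return [u / area for u in unnormalized]


# Hypothetical data: 3 successes in 10 trials; theta is the success probability
successes, trials = 3, 10
grid = [i / 1000 for i in range(1001)]


def uniform_prior(theta):
    return 1.0


def binomial_likelihood(theta):
    # The binomial coefficient is omitted: it is constant in theta and
    # cancels in the normalization, which is the point of the proportional form
    return theta ** successes * (1 - theta) ** (trials - successes)


density = grid_posterior(uniform_prior, binomial_likelihood, grid)
step = grid[1] - grid[0]
total = sum(d * step for d in density)
posterior_mean = sum(t * d * step for t, d in zip(grid, density))
print(f"total confidence = {total:.3f}, posterior mean = {posterior_mean:.3f}")
```

With these assumptions the exact posterior is a Beta(4, 8) distribution with mean 1/3, so the grid estimate of the mean should be very close to 0.333.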



The shape of the prior distribution embodies the amount of knowledge we have about the parameter to start with. The more informed we are, the more focused the prior distribution will be:


Example 1: Comparison of the shapes of relatively more and less informed priors

The shape of the likelihood function embodies the amount of information contained in the data. If the information it contains is small, the likelihood function will be broadly distributed, whereas if the information it contains is large, the likelihood function will be tightly focused around some particular value of the parameter:


Example 2: Comparison of the shapes of likelihood functions for two data sets. The more informative data set gives a much more tightly focused likelihood function.
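
The effect of the amount of data on the shape of the likelihood can be sketched numerically. The snippet below assumes binomial data, which the text does not specify; the function `likelihood_sd` and the two data sets (2 successes in 10 trials, 20 successes in 100 trials) are hypothetical. It rescales each likelihood to unit total weight over a grid and reports its standard deviation as a rough measure of focus.

```python
import math


def likelihood_sd(successes, trials, points=2001):
    """Standard deviation of a binomial likelihood rescaled to unit weight."""
    grid = [i / (points - 1) for i in range(points)]
    like = [t ** successes * (1 - t) ** (trials - successes) for t in grid]
    total = sum(like)
    weights = [v / total for v in like]
    mean = sum(t * w for t, w in zip(grid, weights))
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights))
    return math.sqrt(var)


small = likelihood_sd(2, 10)      # hypothetical small data set
large = likelihood_sd(20, 100)    # hypothetical larger data set, same proportion
print(f"spread with 10 trials:  {small:.3f}")
print(f"spread with 100 trials: {large:.3f}")
```

Both data sets point to the same observed proportion, but the spread falls from roughly 0.12 to roughly 0.04 as the number of trials grows from 10 to 100.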


But the amount of information contained in the data can only be measured by how much it changes what you believe. If someone tells you something you already know, you haven't learned anything, but if another person was told the same information, they might have learned a lot. Keeping to our graphical review, the flatter the likelihood function relative to the prior, the smaller the amount of information the data contains:


Example 3: The likelihood is flat relative to the prior so has little effect on the level of knowledge (the prior and posterior are very similar)



Example 4: The likelihood is highly peaked relative to the prior so has a great influence on the level of knowledge (the likelihood and posterior have very similar shapes)
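
The contrast between Examples 3 and 4 can also be seen in the discrete doctor setting. The sketch below compares two hypothetical tests, neither of which appears in the original: one whose result is equally likely under every disease (a flat likelihood) and one that strongly favors disease C (a peaked likelihood). Using total variation distance to measure how far the posterior moves from the prior is an assumption made here for illustration.

```python
def posterior(prior, likelihood):
    """Discrete Bayes update: normalize prior * likelihood."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    total = sum(joint)
    return [j / total for j in joint]


def total_variation(p, q):
    """Half the sum of absolute differences; 0 means identical beliefs."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


prior = [0.7, 0.2, 0.1]             # diseases A, B, C
flat_test = [0.5, 0.5, 0.5]         # hypothetical: same probability for every disease
peaked_test = [0.01, 0.01, 0.98]    # hypothetical: result almost only seen with C

shift_flat = total_variation(prior, posterior(prior, flat_test))
shift_peaked = total_variation(prior, posterior(prior, peaked_test))
print(f"shift in belief, flat likelihood:   {shift_flat:.3f}")
print(f"shift in belief, peaked likelihood: {shift_peaked:.3f}")
```

The flat test leaves the prior essentially unchanged, while the peaked test moves most of the belief onto disease C.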


The closer the shape of the likelihood function is to that of the prior distribution, the less knowledge the data add, and so the posterior distribution will not differ greatly from the prior. In other words, one would not have learned very much from the data:


Example 5: Prior and likelihood have similar shapes (i.e. they agree), so the posterior distribution does not differ greatly from the prior.


On the other hand, if the focus of the likelihood function is very different from the prior we will have learned a lot from the data:




Example 6: The likelihood is focused on a very different region from the prior, so the posterior distribution differs markedly from the prior.

That we learn a lot from a set of data does not necessarily mean that we are more confident about the parameter value afterwards. If the prior and likelihood strongly conflict, it is quite possible that our posterior distribution is broader than our prior. Conversely, if the likelihood leans towards an extreme of the possible range for the parameter, we can have a likelihood that has a significantly different emphasis to the prior, yet we get a posterior distribution that is narrower than the prior distribution: 

Example 7: The likelihood is highly peaked relative to the prior and focused on one extreme of the prior's range, so it is in reasonable disagreement with the prior. The posterior is nevertheless strongly focused, because the parameter cannot be negative and is therefore bounded at zero.
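
The point that learning a lot need not make us more confident can be illustrated with a small discrete case. In the sketch below, the prior, the likelihood and the use of Shannon entropy as a measure of how spread out a belief is are all assumptions chosen for illustration. The prior is strongly focused on the first state, while the data favor the other two.

```python
import math


def posterior(prior, likelihood):
    """Discrete Bayes update: normalize prior * likelihood."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    total = sum(joint)
    return [j / total for j in joint]


def entropy(dist):
    """Shannon entropy in nats; larger means belief is spread more evenly."""
    return -sum(p * math.log(p) for p in dist if p > 0)


prior = [0.90, 0.05, 0.05]        # hypothetical: confident in the first state
likelihood = [0.05, 0.50, 0.50]   # hypothetical: data conflict with that belief

post = posterior(prior, likelihood)
print("posterior:", [round(p, 3) for p in post])
print(f"prior entropy:     {entropy(prior):.3f}")
print(f"posterior entropy: {entropy(post):.3f}")
```

Here the posterior is roughly (0.474, 0.263, 0.263): the data have changed the belief substantially, yet the posterior is more spread out than the prior.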


