This is a typical risk analysis problem: we need to achieve a certain number of successes (one or more), and each attempt (trial) may or may not succeed according to some random process. Recognising the type of process is the starting point:
Binomial process
The simplest type of example is for a binomial process, where each trial has the same probability of success. Then there is an elegant solution embodied in one distribution. If we require s successes and the probability that any individual trial will succeed is p, then the distribution of the number of trials we will need is given by:
Trials needed = NegBinomial(p,s)
Note that the NegBinomial(p,s) distribution here models the total number of trials. Subtracting s gives the number of failures before reaching s successes. When we only need one success, the above formula simplifies to:
Trials needed = Geometric(p)
This is because the Geometric(p) distribution is just the NegBinomial(p,1) distribution. The Negative Binomial distribution is the appropriate one to use here because it is derived directly from the binomial process: independent trials, each with the same probability of success p.
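The relationship between the two distributions can be checked with a small simulation. The following is a minimal sketch, not part of any particular software package: the function name trials_needed and the values p = 0.25 and s = 3 are illustrative assumptions. It counts trials until the s-th success and compares the sample mean with the theoretical mean s/p; setting s = 1 gives the Geometric(p) case.

```python
import random


def trials_needed(p, s, rng):
    """Count independent trials (success probability p) until the s-th success."""
    trials = 0
    successes = 0
    while successes < s:
        trials += 1
        if rng.random() < p:
            successes += 1
    return trials


rng = random.Random(1)   # fixed seed so the run is repeatable
p, s = 0.25, 3           # illustrative values, not taken from the text
runs = 20_000

negbin_samples = [trials_needed(p, s, rng) for _ in range(runs)]
geometric_samples = [trials_needed(p, 1, rng) for _ in range(runs)]

negbin_mean = sum(negbin_samples) / runs
geometric_mean = sum(geometric_samples) / runs

print(f"s = {s}: simulated mean {negbin_mean:.2f}, theory s/p = {s / p:.2f}")
print(f"s = 1: simulated mean {geometric_mean:.2f}, theory 1/p = {1 / p:.2f}")
```

Because the count includes the successful trials themselves, no sample can be smaller than s.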
Example
Let's imagine that we have some machine making a component. We have an order with a very narrow performance tolerance such that only 1 in 4 components this machine makes would pass the quality control. We'll further imagine that the machine has already been set up to produce the maximum chance of the manufactured components complying (so the probability of compliance will not improve). We need to fill an order for 250 components. Each component costs us $12.50 to manufacture. What price/unit should we quote to give us a 75% chance of making some profit?
With p = 1/4 and s = 250, the distribution of the number of components we may have to make is given by:
Components needed = NegBinomial(0.25,250)
Conclusion: we should quote a per unit price of $51.80 because there is a 75% chance that the actual outturn cost to us will be less than that figure, and we will therefore make at least some profit. Here's a question for you: If the client changed their mind and said they now want just 100 units, should we recalculate the price?
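The pricing logic can be reproduced without simulation by summing the Negative Binomial probabilities until the cumulative probability reaches 75%. The sketch below is one way to do this using only the Python standard library. The function name neg_binomial_trials_quantile is a hypothetical helper, and the approach assumes that every manufactured component costs $12.50 whether or not it passes. The resulting per-unit price can be compared with the figure quoted above.

```python
import math


def neg_binomial_trials_quantile(p, s, q):
    """Smallest n such that P(trials needed for s successes <= n) >= q.

    Intended for moderate q such as 0.75; for q extremely close to 1,
    floating-point rounding could stop the running sum from reaching q.
    """
    log_p = math.log(p)
    log_fail = math.log(1.0 - p)
    cumulative = 0.0
    k = 0  # number of failures before the s-th success
    while True:
        # log of C(k + s - 1, k) * p**s * (1 - p)**k, kept in log space
        log_pmf = (math.lgamma(k + s) - math.lgamma(k + 1) - math.lgamma(s)
                   + s * log_p + k * log_fail)
        cumulative += math.exp(log_pmf)
        if cumulative >= q:
            return s + k
        k += 1


unit_cost = 12.50   # cost of making one component, pass or fail
order_size = 250    # compliant components required
p_pass = 0.25       # 1 in 4 components passes quality control

components_p75 = neg_binomial_trials_quantile(p_pass, order_size, 0.75)
price_per_unit = components_p75 * unit_cost / order_size

print(f"75th percentile of components made: {components_p75}")
print(f"break-even price per unit at 75% confidence: ${price_per_unit:.2f}")
```

Changing order_size to 100 and re-running is one way to explore the question posed above.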
Hypergeometric process
A hypergeometric process is one where we take random samples from a population of M individuals that fall into two (or more) categories. Sticking for the moment to just two categories (e.g. Labour voters and non-Labour voters, or male and female), we define a random sample from M to be a "success" if we pick an individual from some sub-population of size D. Because sampled individuals are not returned to the population, the probability of success changes from one trial to the next as we take consecutive samples. The Negative Binomial distribution will therefore not be appropriate unless the size of the sample we might take is small relative to the size of the population (a rough rule of thumb is that the possible sample size should be less than about 1/10 of the population).
The distribution corresponding to the Negative Binomial distribution, but for the hypergeometric process, is called the Inverse Hypergeometric distribution. The logic to arrive at this distribution very much parallels the development of the Negative Binomial distribution.
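To see how sampling without replacement changes the answer, the number of draws needed can be simulated directly. This is a minimal sketch under assumed values (M = 100, D = 30, s = 5 are illustrative only, and draws_needed is a hypothetical helper name). It compares the simulated mean with the known mean of the Inverse Hypergeometric distribution, s(M+1)/(D+1), and with the Negative Binomial mean s/p that would apply if p = D/M stayed constant.

```python
import random


def draws_needed(M, D, s, rng):
    """Draw without replacement from M individuals (D of them 'successes')
    until s successes have been found; return the number of draws.
    Requires s <= D, otherwise the target can never be reached."""
    remaining_total = M
    remaining_success = D
    draws = 0
    found = 0
    while found < s:
        draws += 1
        # The success probability changes after every draw.
        if rng.random() < remaining_success / remaining_total:
            found += 1
            remaining_success -= 1
        remaining_total -= 1
    return draws


rng = random.Random(2)
M, D, s = 100, 30, 5   # illustrative values, not taken from the text
runs = 20_000

samples = [draws_needed(M, D, s, rng) for _ in range(runs)]
simulated_mean = sum(samples) / runs
inverse_hypergeo_mean = s * (M + 1) / (D + 1)
neg_binomial_mean = s / (D / M)

print(f"simulated mean draws:          {simulated_mean:.2f}")
print(f"inverse hypergeometric theory: {inverse_hypergeo_mean:.2f}")
print(f"negative binomial approx.:     {neg_binomial_mean:.2f}")
```

The gap between the two theoretical means shrinks as the possible sample size becomes small relative to M, which is the situation the rule of thumb above describes.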
Other processes
For other processes, there may be elegant solutions to the number of trials needed to achieve a certain number of successes, but it is much more likely that simulation models will need to be built from scratch to determine the distribution. We give three examples here for you to get an idea of the type of techniques that will help you produce such models.
Example 1
You are a government body doing research into the effects of marriage and smoking on people's health. You are doing a random telephone survey and you require 50 people from each of the four possible categories. From previous studies you know that 32% of people agree to participate in this type of survey when called. How many calls will you need to make, given that previous studies show the population to be split into the four categories as follows:
Population distribution | Smoker | Non-smoker |
Married | 7% | 26% |
Not married | 28% | 39% |
Model Healtheffect determines how many calls you'll have to make. It uses the Multinomial distribution.
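As an alternative to the Multinomial-based model, the problem can also be sketched as a call-by-call simulation. The version below is not the Healtheffect model itself. It assumes that willingness to participate (32%) is independent of category, and that respondents in a category that has already reached its quota still use up a call. The names calls_needed, PARTICIPATION, CATEGORY_SHARES and QUOTA are illustrative.

```python
import random

PARTICIPATION = 0.32   # probability that a called person agrees to take part
QUOTA = 50             # respondents required in every category
CATEGORY_SHARES = {
    ("married", "smoker"): 0.07,
    ("married", "non-smoker"): 0.26,
    ("not married", "smoker"): 0.28,
    ("not married", "non-smoker"): 0.39,
}


def calls_needed(rng):
    """Simulate calls one at a time until every category has QUOTA respondents."""
    categories = list(CATEGORY_SHARES)
    weights = list(CATEGORY_SHARES.values())
    counts = dict.fromkeys(categories, 0)
    calls = 0
    while min(counts.values()) < QUOTA:
        calls += 1
        if rng.random() < PARTICIPATION:
            # Assumption: the category is independent of agreeing to participate.
            category = rng.choices(categories, weights=weights)[0]
            counts[category] += 1
    return calls


rng = random.Random(3)
runs = 500
samples = sorted(calls_needed(rng) for _ in range(runs))
mean_calls = sum(samples) / runs

print(f"mean calls needed:   {mean_calls:.0f}")
print(f"90th percentile:     {samples[int(0.9 * runs)]}")
```

The rarest category (married smokers, 7%) dominates the result: on average only 0.32 × 0.07 ≈ 2.2% of calls yield such a respondent.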
Example 2
You need a replacement PC. The IT manager says there are 22 PCs stored in the basement, but three have bad hard disks only, two have bad motherboards only, and one has both a bad hard disk and a bad motherboard. Of course, nobody can remember which ones. For reasons he alone understands, you can only take out one PC at a time, coming to him to ask for the key, and then returning it afterwards. Presuming you can dismantle PCs and rebuild them, how many trips will you have to make to the basement to get a working PC?
Model Computers in the basement shows the solution to this example.
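One possible way to approach this example is sketched below. It is not the Computers in the basement model itself and rests on an interpretation: PCs are taken at random without replacement, and a working PC can be assembled as soon as at least one good hard disk and one good motherboard have been brought up. The names PCS and trips_needed are illustrative.

```python
import random
from collections import Counter

# Each PC is described as (hard_disk_ok, motherboard_ok).
PCS = ([(False, True)] * 3      # bad hard disk only
       + [(True, False)] * 2    # bad motherboard only
       + [(False, False)] * 1   # both bad
       + [(True, True)] * 16)   # fully working (22 - 6)


def trips_needed(rng):
    """Fetch PCs in random order until a good disk and a good board are in hand."""
    order = PCS[:]
    rng.shuffle(order)
    have_disk = False
    have_board = False
    for trip, (disk_ok, board_ok) in enumerate(order, start=1):
        have_disk = have_disk or disk_ok
        have_board = have_board or board_ok
        if have_disk and have_board:
            return trip
    raise RuntimeError("no working PC could be assembled")


rng = random.Random(4)
runs = 20_000
samples = [trips_needed(rng) for _ in range(runs)]
distribution = Counter(samples)

for trips in sorted(distribution):
    print(f"{trips} trip(s): {distribution[trips] / runs:.3f}")
```

Under this interpretation the first trip succeeds whenever a fully working PC is picked, which happens with probability 16/22.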
Example 3
This is an extension to this topic. Here we will not only count the failures, but also sum the random variables.
A manufacturer is trying to extrude a single length of copper wire of 5 kilometres, but the extrusion process has a failure rate of 0.07 failures per kilometre. If a failure occurs before he has produced his 5 km of wire, then he has to start again. We wish to determine the distribution of the total amount of wire (in kilometres) that will be produced in order to get 5 kilometres of perfect wire, and the distribution of the number of times the production will need to be restarted.
Model COPPER shows the solution to this example.
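A rough sketch of one way to simulate this example follows. It is not the COPPER model itself. It assumes that failures occur as a Poisson process along the wire, so that the length extruded before the next failure is exponentially distributed with mean 1/0.07 km. The names produce_wire, FAILURE_RATE and TARGET_KM are illustrative.

```python
import math
import random

FAILURE_RATE = 0.07   # failures per kilometre
TARGET_KM = 5.0       # length of perfect wire required


def produce_wire(rng):
    """Return (total km extruded, number of restarts) for one production job."""
    total_km = 0.0
    restarts = 0
    while True:
        # Assumption: distance to the next failure is exponentially distributed.
        km_to_failure = rng.expovariate(FAILURE_RATE)
        if km_to_failure >= TARGET_KM:
            return total_km + TARGET_KM, restarts
        total_km += km_to_failure   # wasted wire from the failed attempt
        restarts += 1


rng = random.Random(5)
runs = 20_000
results = [produce_wire(rng) for _ in range(runs)]

mean_km = sum(km for km, _ in results) / runs
mean_restarts = sum(r for _, r in results) / runs

# Under the same assumption, each attempt succeeds with probability exp(-0.07 * 5),
# so the number of restarts is geometric with mean (1 - q) / q.
q = math.exp(-FAILURE_RATE * TARGET_KM)
print(f"mean total wire produced: {mean_km:.2f} km")
print(f"mean restarts: {mean_restarts:.3f} (theory {(1 - q) / q:.3f})")
```

The total wire produced is the sum of a random number of random lengths plus the final 5 km, which is why both quantities are tracked in the same loop.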