

We need to be able to distinguish these outcomes. It is traditional to call one of them "success" and the other "failure". "Success" in this context need have no connotation of actual success - it is just a convenient label. So we could think of "success" as rolling a 1, or of finding the risk factor.

Suppose the probability of success is

An experiment which has two outcomes is often called a Bernoulli trial.

We may also be interested in other outcomes. For instance, we may wish to know the probability that

Another example is the probability that "at least 1 person was infected". The phrase "at least 1" means that 1 is the least possible number; saying "at least one person was infected" means that 1 person was infected, or 2, or 3, or any other number up to 51 (in this experiment). We could in fact calculate all these probabilities and add them up; this is sometimes the easiest thing to do. In this case, however, we can observe that the probability that at least one person was infected is one minus the probability that it is false that at least one person was infected. The event "it is false that at least one person was infected" is the same thing as the event "no people were infected". So we can just calculate this and subtract it from one. For the needlestick problem, it is easier to calculate a single probability and do the subtraction than it is to calculate 51 probabilities and add them together.

Now, what is the probability that no one was infected? The probability that the first person is not infected is 97%. The probability that the second person is not infected is also 97%. And so on; all the probabilities are 97%. Let's ask what the chance is that both the first and the second persons were not infected. Because the events are independent, the probabilities can be multiplied; the probability is then 0.97 times 0.97, which is 0.9409. What is the chance that the first three are not infected? This is 0.97 times 0.97 times 0.97, which is about 91%. What is the chance that all 51 people were not infected? It is what you get when you multiply together 51 factors of 0.97, or 0.97 to the 51st power; this number is about 21.2%.
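The complement calculation can be checked numerically. The following is a minimal sketch in Python; the variable names are illustrative and not part of the original notes, and it assumes, as the text does, 51 independent exposures with a 3% chance of infection each.

```python
# Assumptions taken from the text: 51 independent exposures,
# each with a 3% chance of infection.
n_people = 51
p_infect = 0.03

# Probability that nobody is infected: 51 factors of 0.97 multiplied together.
p_none = (1 - p_infect) ** n_people

# Complement rule: "at least one infected" is the opposite of "none infected".
p_at_least_one = 1 - p_none

print(f"P(no one infected)       = {p_none:.4f}")
print(f"P(at least one infected) = {p_at_least_one:.4f}")
```

Only one power and one subtraction are needed here, rather than a sum over every possible number of infections.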

Let's take another example. What is the chance that if we toss a fair die 4 times, we get NO occurrences of a 1? We'll treat this as 4 independent Bernoulli trials. The chance that the first toss does not show a one is 5/6, the chance that the second toss does not show a one is 5/6, and so forth. Because the tosses are independent, we can find the probability that all four failed to show a one by multiplying 5/6 by itself four times. (This is the same as the probability that none of them showed a one, and the same as the probability that each time we tossed the die, we got a 2, 3, 4, 5, or 6.) This equals about 48.23%.
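As a quick check of this arithmetic, here is a small Python sketch that uses exact fractions so no rounding creeps in. The variable names are chosen for illustration only.

```python
from fractions import Fraction

p_not_one = Fraction(5, 6)   # chance a single toss shows 2, 3, 4, 5, or 6
n_tosses = 4

# Independence lets us multiply the per-toss probabilities together.
p_no_ones = p_not_one ** n_tosses

print(p_no_ones)           # exact value as a fraction
print(float(p_no_ones))    # decimal approximation
```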

But now, what if we wanted to know the probability that on the first toss we got a one, but then got no ones after that? The probability of getting a one on the first toss is 1/6, the probability of not getting a one on the second toss is 5/6, of not getting a one on the third toss is 5/6, and of not getting a one on the fourth toss is also 5/6. By independence, we can multiply all these together, and we learn that the probability we are looking for is about 9.65%.

We may also ask what the probability is that we do not get a one on the first toss, but we do get a one on the second toss, and we don't get a one on the final two tosses. Here we find that the probability is 5/6 times 1/6 times 5/6 times 5/6, using the same reasoning as in the previous paragraph. This is the same value as before, because the same four factors are multiplied in a different order.

What if we want to know what the chance is that we get a one on any one of the four tosses, but not on the other three? In other words, what is the probability that we get exactly one show of one spot on the die in four throws? We could get the one on the first try (and not on the others), or on the second try (and not on the others), on the third try (and not on the others), or on the fourth try (but not on the others). So the event "we get exactly one showing of one spot" can be written as the union of four events: the event that we get the one on the first throw but not the others, and so on. And if we get the one on the first throw but not the others, then we could not have gotten it on the second throw. These four events are mutually exclusive, or mutually disjoint. So we can calculate the chance of getting a one exactly once by adding up the probabilities of each of the four ways we could have gotten a one. Since each of these probabilities is the same, we can simply multiply one of them by four, since four is the number of ways to place a single one among four tosses. This happens to be about 38.58%.
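The addition of the four mutually exclusive patterns can be sketched in Python as follows. This is an illustration only; the names `pattern_probs` and `p_exactly_one` are not from the original notes.

```python
from fractions import Fraction

p_one = Fraction(1, 6)      # a toss shows a one
p_other = Fraction(5, 6)    # a toss shows anything else
n_tosses = 4

# Build one pattern for each position where the single one could appear.
pattern_probs = []
for position in range(n_tosses):
    prob = Fraction(1)
    for toss in range(n_tosses):
        prob *= p_one if toss == position else p_other
    pattern_probs.append(prob)

# The patterns are mutually exclusive, so their probabilities add.
p_exactly_one = sum(pattern_probs)

print(pattern_probs)
print(p_exactly_one, float(p_exactly_one))
```

Every entry of `pattern_probs` comes out equal, which is why multiplying one of them by four gives the same answer as the sum.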

What if we want to know what the chance is that we get a one on exactly two of the four tosses? We can calculate the probability that we get a one on the first toss, on the second toss, not on the third toss, and not on the fourth toss; this is about 1.93%. There are five other ways to choose the two tosses on which we get the one, and each of these orderings has the same probability. For example, we might have got the one on the first try, not on the second, not on the third, but seen another one on the fourth try. The chance we would have seen this pattern is 1/6 times 5/6 times 5/6 times 1/6, the same four factors in a different order. In general, if two ones showed up, we need two factors of the chance of getting a one (which is 1/6). We also got a non-1 two times, so we need two factors of the probability of seeing a non-1 on a single trial (which is 5/6). Then we multiply these together, and this gives us the chance of seeing any particular pattern of two 1's and two non-1's. But we don't want the chance of seeing one particular pattern. We want the chance of seeing any of the six patterns that have exactly two 1's. These patterns are mutually exclusive and all have the same probability, so we can figure out what this probability is and multiply it by 6. This comes to about 11.57%.
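One way to double-check the dice results is to list every possible sequence of four tosses and count how many contain a given number of ones. The sketch below does this by brute force in Python. It is a verification aid under the assumption of a fair die, not a method used in the original notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)   # a fair six-sided die
n_tosses = 4

# Tally how many of the 6**4 equally likely sequences contain k ones.
counts = Counter(seq.count(1) for seq in product(faces, repeat=n_tosses))
total = 6 ** n_tosses

for k in sorted(counts):
    prob = Fraction(counts[k], total)
    print(f"{k} ones: {counts[k]:4d} sequences, probability {float(prob):.4f}")
```

Because all 1296 sequences are equally likely, each probability is just the count divided by 1296.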

Now let's go back to the needlestick example. What if we want to know the chance that exactly one person got infected? We know that we can find the chance that the first person was infected and none of the others were by multiplying 0.03 (that's the chance the first person was infected) by 0.97 (the chance the second person was not), and then by 0.97 again for the third person, and so on down to the 51st person. We multiply together a single factor of 0.03, and fifty factors of 0.97. This happens to be about 0.00654, or about 0.65%. But again, we don't just want the chance of the

But what if we want the chance that exactly two people were infected? It could have been the first and the second people who got infected, or the second and the forty-third, and so forth. Let's determine the chance that the first and second people got infected but not the others. This is going to be 0.03, times 0.03, times 0.97 (49 times). Every person who gets infected gives us a factor of 0.03 (the chance of an infection in one trial), and every person who does not get infected gives us a factor of 0.97 (so we get 49 such factors, one for each of the 51 minus 2 who don't get infected). The chance that the second and forty-third people get infected (and no one else) is the same as the chance that the first and second get infected (and no one else), and so on. This probability is 2.02 times ten to the minus fourth power. Each of the possible ways to get the two infected people has the same probability; if we knew how many ways there are to choose the two infected people out of the 51 possibilities, we could multiply by this number. By definition, this is called the number of combinations of 51 things taken 2 at a time.

It will be useful to consider

Now let's try the 4 things and choose 3 of them, keeping track of the order. We have four choices for the first one. Once we choose the first one, we have three choices for the second one, and then after this, we have two choices for the third. So for instance, if we chose the B for the first slot, we still have the three items A, C, and D from which to choose the second one; if we then pick the D for the second one, we still have the A and the C left from which to choose the third slot. So there are 4 times 3 times 2, or 24, of these ordered choices. They are ABC, ABD, ACB, ACD, ADB, ADC, BAC, BAD, BCA, BCD, BDA, BDC, CAB, CAD, CBA, CBD, CDA, CDB, DAB, DAC, DBA, DBC, DCA, and DCB. Notice that we got ABC, ACB, BAC, BCA, CAB, and CBA. There are 6 ways to arrange the three objects A, B, and C and we got all six.
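The list of ordered choices can be generated mechanically. Here is a short Python sketch using the standard library's `itertools.permutations`; the variable names are illustrative.

```python
from itertools import permutations

items = "ABCD"

# Every ordered way to pick 3 of the 4 items.
ordered_choices = ["".join(p) for p in permutations(items, 3)]
print(len(ordered_choices))
print(ordered_choices)

# The arrangements that use exactly the letters A, B, and C.
abc_orders = [s for s in ordered_choices if sorted(s) == ["A", "B", "C"]]
print(abc_orders)
```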

If we started with N objects, and we want to choose

What if we started with 4 objects and chose all 4 of them (keeping track of the order)? We just did the case of choosing three a paragraph ago. Once we've chosen three, we have only one choice left for the fourth. So we get these 24 arrangements: ABCD, ABDC, ACBD, ACDB, ADBC, ADCB, BACD, BADC, BCAD, BCDA, BDAC, BDCA, CABD, CADB, CBAD, CBDA, CDAB, CDBA, DABC, DACB, DBAC, DBCA, DCAB, and DCBA. There are 4 times 3 times 2 times 1 different orders of these 4 objects.

What if we started with N objects and chose them all (keeping track of the order)? How many different orders are there? Following the same logic, we find N choices for the first one, N-1 for the second object, and all the way down to 1 choice for the last one. To find out how many, we multiply N times N-1 times N-2 times all the numbers on down to 1. This comes up a lot, and has a name: N! means N times N-1 times N-2 times all the numbers on down to 1; it is read "N factorial".
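A factorial is easy to compute with a loop. The sketch below writes it out by hand and compares it with Python's built-in `math.factorial`; the function name `factorial` here is just a local helper for illustration.

```python
import math

def factorial(n):
    """Multiply n * (n-1) * ... * 1 for a positive whole number n."""
    result = 1
    for k in range(n, 0, -1):
        result *= k
    return result

# Compare the hand-written loop with the standard library for small N.
for n in range(1, 7):
    print(n, factorial(n), math.factorial(n))
```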

Now, we know that the number of ordered ways to choose r objects out of a total of N is found by taking N, multiplying it by N-1, and so on down to N-(r-1), for r factors in all. If we start where we left off and take N-r, multiply by N-(r+1) (which is N-r-1, the number one less than N-r), and keep going down to one, we get (N-r)! (the number N-r, factorial). Multiplying the product N times N-1 times all the numbers down to N-(r-1) by (N-r)! gives all the numbers from N down to 1, which is N!. So we can multiply and divide by (N-r)! without changing anything, and we find that the number of permutations of N things taken r at a time is N!/(N-r)!.
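To see that the two ways of counting agree, the following Python sketch computes the falling product directly and also as a ratio of factorials. The helper name `count_ordered_choices` is hypothetical; `math.perm` is the standard library's own version (available in Python 3.8 and later).

```python
import math

def count_ordered_choices(n, r):
    """Multiply n * (n-1) * ... * (n-r+1), which has r factors."""
    result = 1
    for k in range(n, n - r, -1):
        result *= k
    return result

n, r = 4, 3
by_product = count_ordered_choices(n, r)
by_factorials = math.factorial(n) // math.factorial(n - r)

print(by_product, by_factorials, math.perm(n, r))
```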

Now, what if we just want to consider the number of different combinations of objects we might have gotten, regardless of the order? For instance, we calculated the number of ordered ways to choose 3 objects out of a total of 4 objects, and found 24 such ways. But ignoring order, there are only four combinations: ABC, ABD, ACD, and BCD. When we considered the ordering, each of these got counted six times, because there are six different ways to order 3 things. There are six ways to order 3 things, because we have 3 ways to choose the first one, 3-1=2 ways to choose the second, and one way to choose the last; so there are 3 times 2 times 1 which is six different orders. So there are 24 ordered arrangements, and we divide by six to get a total of 4 combinations of 4 things taken 3 at a time. If we want to know how many combinations of N objects taken r at a time there are, we can first figure out how many permutations of N objects taken r at a time there are, and divide by the number of orders of the r objects. But we already know how many ways to order the r objects: this is r times r-1 times all the numbers down to 1, which is r!. So the number of combinations of N things taken r at a time is N!/(r! (N-r)!). This is called "N choose r" and is written like this: $\binom{N}{r}$.
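The counting argument can be illustrated in Python. This sketch lists the unordered choices with `itertools.combinations` and compares the factorial formula with the built-in `math.comb` (Python 3.8 and later); the helper name `n_choose_r` is an illustrative assumption.

```python
import math
from itertools import combinations

items = "ABCD"

# Unordered choices of 3 items out of 4.
unordered = ["".join(c) for c in combinations(items, 3)]
print(unordered)

def n_choose_r(n, r):
    """Number of combinations: N! / (r! (N-r)!)."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(n_choose_r(4, 3), math.comb(4, 3))

# Number of ways to pick 2 people out of 51, as in the needlestick example.
print(n_choose_r(51, 2))
```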

Finally, we can calculate the probability of getting r successes out of N independent Bernoulli trials. First, there are $\binom{N}{r}$ ways to choose the r successes out of the N trials. Each such way has the same probability: we have r factors of p for the successes, and (N-r) factors of (1-p) for the failures. Multiplying these together, the probability of exactly r successes is $\binom{N}{r} p^r (1-p)^{N-r}$. And that is all there is to the binomial formula. This is one of only two formulas we will really discuss in detail in this class; we will build a great deal on it.
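Putting the pieces together, here is a minimal Python sketch of the binomial formula, applied to the die and needlestick examples from the text. The function name `binomial_probability` and its argument order are illustrative choices, not something defined in the original notes.

```python
import math

def binomial_probability(r, n, p):
    """Chance of exactly r successes in n independent trials,
    each with success probability p."""
    return math.comb(n, r) * p**r * (1 - p)**(n - r)

# Fair die tossed 4 times; "success" means rolling a one.
for r in range(5):
    print(f"die: {r} ones      -> {binomial_probability(r, 4, 1/6):.4f}")

# Needlestick example: 51 people, 3% chance of infection each.
for r in range(3):
    print(f"needlestick: {r} infected -> {binomial_probability(r, 51, 0.03):.4f}")

# The probabilities over all possible r should add up to 1.
print(sum(binomial_probability(r, 51, 0.03) for r in range(52)))
```

The last line is a useful sanity check: since exactly one of the outcomes "r successes" must occur, the probabilities for r = 0 through N sum to one, up to floating-point rounding.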


All content © 2000 Mathepi.Com (except R and Rweb).
