
## Probability Distributions

We've spent some time studying the binomial distribution, which gives the probability of getting r successes in N independent Bernoulli trials. The sample space of the Binomial distribution is the set of numbers {0,1,2,...,N-1,N}. Each of the events {r}, where r is between 0 and N, has a positive (nonzero) probability when the success probability is not 0 or 1, but somewhere in between. Of course, numbers larger than N have probability zero - there is no way in God's green earth to get, say, 47 successes out of only 30 Bernoulli trials. But once you know N and p, you can calculate the probability that r successes are observed in any one instance of this experiment. All the events {r} are mutually exclusive, and any event you could want to take the probability of can be written as a disjoint union of the events {r}. So, for instance, the probability that there are k or fewer successes can be computed by adding up the probability that there are 0 successes, 1 success, and so on up to k.
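As a concrete sketch of that last point, here is how the binomial probability function and the "k or fewer successes" sum can be computed in Python. (The function names here are my own, chosen for illustration; only the standard library is used.)

```python
from math import comb

def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent Bernoulli(p) trials."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def binomial_cdf(k, n, p):
    """Probability of k or fewer successes: add up the disjoint events {0},...,{k}."""
    return sum(binomial_pmf(r, n, p) for r in range(k + 1))

# Example: 10 flips of a fair coin, probability of 4 or fewer heads.
print(binomial_cdf(4, 10, 0.5))   # 386/1024 = 0.376953125
```

Note that the probabilities of the events {0} through {N} sum to 1, as they must, since those events partition the whole sample space.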
Call the number of successes in N independent Bernoulli trials X. One says that X is a Binomial random variable, or that X is a random variable with the Binomial probability distribution. The random variable takes integer values anywhere between 0 and N, and is an example of a discrete random variable. For any of the values that the random variable could take, the Binomial probability function we've been discussing tells you the probability that the random variable will take that particular value.
There are other probability distributions that a discrete random variable (I'll define this in more detail later) could have. We will discuss the hypergeometric distribution soon; another, called the Poisson distribution, has considerable practical importance too and is discussed in the extra credit projects. There are huge books filled with all sorts of interesting probability distributions for different purposes.

## Expectation Values

On the previous page, we examined the concept of the sample mean, and we found that we could express the sample mean as a sum of terms that look like xf_x. Each such term is the product of the value x times the relative frequency with which x occurs in the data set. Each term like this accounts (through multiplication rather than repeated addition) for all the data points whose value is x, already divided by the number of data points. When using this formula, you are still adding up all the data values and dividing by the number of data values; you are just doing so in a different order.
Also recall that if an experiment is replicated independently more and more times, the relative frequency of some outcome approaches the probability that that outcome will occur. Suppose that we are doing some experiment that is generating new data, independently. For instance:
```
1
1,4
1,4,2
1,4,2,4
1,4,2,4,3
1,4,2,4,3,1
1,4,2,4,3,1,1
```
and so forth. We could generate running averages as we go, averaging in each new data point. The first average is just the average of the single number 1, which is 1. The second is the average of 1 and 4, which is 2.5. The third is the average of 1, 4, and 2, which is 7/3, or about 2.333. The fourth is the average of 1, 4, 2, and 4, which is 11/4, and so forth.
We may also keep track of the relative frequency of each of the possible data values as we go. After the first data point has been received, we have a relative frequency of 1's which equals 1/1. After the second data point, the value 1 has a relative frequency of 1/2 and the value 4 has a relative frequency of 1/2. After the third, the value 1 has a relative frequency of 1/3, the value 2 has a relative frequency of 1/3, and the value 4 has a relative frequency of 1/3 also. And after the fourth, the value 1 has a relative frequency of 1/4, the value 2 has a relative frequency of 1/4, and the value 4 has a relative frequency of 2/4. After every new data point, we can compute the relative frequency of occurrences of all the data values.
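The running averages and relative frequencies described above can be tracked with a short Python loop. (This is just an illustrative sketch; the variable names are mine.)

```python
from collections import Counter

data = [1, 4, 2, 4, 3, 1, 1]   # the stream of data from the example above
counts = Counter()             # how many times each value has occurred so far
total = 0
averages = []

for n, x in enumerate(data, start=1):
    counts[x] += 1
    total += x
    averages.append(total / n)  # running average after n data points
    rel_freq = {v: c / n for v, c in sorted(counts.items())}
    print(f"after {n} points: average = {averages[-1]:.4f}, "
          f"relative frequencies = {rel_freq}")
```

After the fourth data point, for example, this reports an average of 2.75 (that is, 11/4) and relative frequencies 1/4, 1/4, and 2/4 for the values 1, 2, and 4, matching the hand calculation above.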
Since we can think of each new data point (generated independently) as a new experiment, we know that as more and more experiments are done, the relative frequency should approach the probability of occurrence of that data value. But the average is the sum, over all the data values, of terms like xf_x. If each of the relative frequencies f_x is approaching the probability that x will occur, that is, p_x, shouldn't the whole sum start to look like a sum of terms like xp_x? Provided it is possible to actually calculate this last sum (involving probabilities), this turns out indeed to be the case.
If we have a discrete random variable X taking values x with probability p_x, then the sum of terms of the form xp_x over all values of x is called the expectation value of the random variable, if it exists. (The catch is this: if I have a collection of numbers, I can always calculate their sample mean. But it is possible that a random variable could take infinitely many different values; the sample space may be infinite. Sometimes it is possible to add up infinitely many numbers; sometimes it isn't. For instance, if I take the sequence 1, 1/2, 1/4, 1/8, 1/16, 1/32, and so forth, I can add them up, and make the answer as close to 2 as I want by adding more and more terms. But if I take the sequence 1, 1/2, 1/3, 1/4, 1/5, and so on, and calculate 1, 1+1/2, 1+1/2+1/3, 1+1/2+1/3+1/4, and so on, this can get as big as you want it to be if you're willing to add up enough of these fractions. So sometimes you can add up infinitely many numbers and sometimes trying to add up infinitely many little things just doesn't work.)
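For a random variable taking finitely many values, the defining sum is just a loop over value-probability pairs. Here is a minimal sketch (the `expectation` helper is my own name, not a standard library function; it assumes the probabilities sum to 1):

```python
def expectation(dist):
    """Expectation of a discrete random variable.

    dist maps each value x to its probability p_x; the sum of terms x*p_x
    is the expectation value. Assumes the probabilities sum to 1.
    """
    return sum(x * p for x, p in dist.items())

# Example: a fair four-sided die, values 1..4 each with probability 1/4.
die = {x: 0.25 for x in range(1, 5)}
print(expectation(die))   # 2.5
```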
When examining repeated independent random quantities, it should seem intuitively reasonable that if the relative frequencies approach their corresponding probabilities, then the sample mean should approach the expectation value (if it exists). In some sense, if you have enough data, it ought to be possible to be fairly sure that the sample mean is going to be close to the expectation value (if there is one). Results of this form are called "laws of large numbers"; their precise statement and proof are beyond the scope of our class. One such result is called the Weak Law of Large Numbers; another is called the (Kolmogorov) Strong Law of Large Numbers.
If you define a Bernoulli random variable to take the value 1 when a success occurs, and 0 otherwise, you can calculate its expectation: 0(1-p) + 1p = p.
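A quick simulation illustrates both the Bernoulli expectation and the law-of-large-numbers intuition above: the sample mean of many independent Bernoulli trials settles down near p. (The seed and the choice p = 0.3 are arbitrary, just for illustration.)

```python
import random

random.seed(42)   # arbitrary seed, so the run is reproducible
p = 0.3

# Expectation of a Bernoulli(p) random variable: 0*(1-p) + 1*p = p
exact = 0 * (1 - p) + 1 * p

# Simulate many independent Bernoulli trials; each is 1 with probability p.
trials = [1 if random.random() < p else 0 for _ in range(100_000)]
sample_mean = sum(trials) / len(trials)

print(exact, sample_mean)   # the sample mean should be close to 0.3
```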
Here is another simple example involving sampling with replacement. Suppose we put 3 red marbles labeled "1" in a box, along with 2 white marbles labeled "2" and 5 blue marbles labeled "3". If I shake the box up and draw one at random, the number on the marble is a random variable. The probability of drawing a red "1" is 3/10, the probability of drawing a white "2" is 2/10, and the probability of drawing a blue "3" is 5/10. What is the expectation value of the number on the marble? You can calculate it by writing down all the values this random variable can take, multiplying each by the probability that it will occur, and adding it all together. Try this as an exercise.

On to the Geometric distribution.