
There is another way to calculate the sample mean. These numbers can be rearranged this way: 10+10+10+10+20+20+20+20+30+30, which is the same as 4*10+4*20+2*30. In other words, why not just count up the number of times the 10 occurred, and just multiply, and do the same for each of the possible values - and then add that up? That is another way to compute the sum in the numerator.

Also, we need to divide by the number of data points. Let's do that too: the average will be equal to (4*10+4*20+2*30)/10. We can also divide each term by the 10, and we find that the average is (4/10)*10+(4/10)*20+(2/10)*30, which equals 18.

So it is possible to compute the average of some data by first calculating the relative frequency of each possible value, then multiplying each value by its relative frequency, and then adding this all together. In general, the sample mean of data points can be written as the sum, over each distinct value x, of x times its relative frequency f_{x}.
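The relative-frequency calculation can be sketched in R as follows. This is only an illustration: the variable names (`counts`, `values`, `rel_freq`, `freq_mean`, `direct_mean`) are made up for this sketch, and the data are the ten numbers from the example above.

```r
# The ten data points from the example above.
x <- c(10, 10, 10, 10, 20, 20, 20, 20, 30, 30)

# Count how many times each distinct value occurs.
counts <- table(x)

# Distinct values (converted back to numbers) and their relative frequencies.
values <- as.numeric(names(counts))
rel_freq <- as.numeric(counts) / length(x)

# Multiply each value by its relative frequency and add everything up.
freq_mean <- sum(values * rel_freq)

# The direct calculation, for comparison.
direct_mean <- mean(x)

print(freq_mean)
print(direct_mean)
```

Both routes should agree, since they are the same sum with the terms grouped differently.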

To do another example, suppose I want to take the average of the values

```
1,2,3,2,3,4,3,6,1,2,1,4,2,3,3,2,1,7,1,2,3,1,1,1,1,1,3,2,3,4
```

There are only six different values, 1,2,3,4,6,7, and the final sum we do is going to have six terms, one for each of them. There are 30 data values. The following table summarizes the calculation:

| Data value, x | Number of occurrences | Relative frequency, f_{x} | xf_{x} |
|---|---|---|---|
| 1 | 10 | 10/30 | 1*10/30 |
| 2 | 7 | 7/30 | 2*7/30 |
| 3 | 8 | 8/30 | 3*8/30 |
| 4 | 3 | 3/30 | 4*3/30 |
| 6 | 1 | 1/30 | 6*1/30 |
| 7 | 1 | 1/30 | 7*1/30 |

The mean is then 73/30, or approximately 2.4333.
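As a check, the same table can be rebuilt in R. This is a sketch rather than the only way to do it; the names `summary_table` and `weighted_mean` and the column names are chosen just for this illustration.

```r
# The 30 data values from the example.
x <- c(1,2,3,2,3,4,3,6,1,2,1,4,2,3,3,2,1,7,1,2,3,1,1,1,1,1,3,2,3,4)

# Number of occurrences of each distinct value.
counts <- table(x)

# One row per distinct value, mirroring the columns of the table above.
summary_table <- data.frame(
  value = as.numeric(names(counts)),
  occurrences = as.vector(counts)
)
summary_table$rel_freq <- summary_table$occurrences / length(x)
summary_table$x_times_f <- summary_table$value * summary_table$rel_freq

# Adding up the last column gives the mean.
weighted_mean <- sum(summary_table$x_times_f)

print(summary_table)
print(weighted_mean)
```

The sum of the last column should match both 73/30 and the result of `mean(x)`.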

When calculating the mean, you will notice that the large values may be balanced by the smaller values. The mean itself is a measure of central tendency.

The mean is sensitive to outlying points. Try this computer exercise:

```
```**> **x<-c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1000)

**> **mean(x)

Almost all the numbers happen to be 1, but the mean is much larger than
that. Almost all the numbers are smaller than the average. Try different
values for that last element instead of 1000, and see what happens.
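
One way to try several replacement values at once is sketched below. The helper names (`base`, `last_values`, `means`) and the particular replacement values are arbitrary choices for illustration.

```r
# The vector from the exercise: many 1's followed by a single 1000.
x <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1000)

# Everything except the last element.
base <- x[-length(x)]

# Candidate values to put in the last position (arbitrary choices).
last_values <- c(1, 10, 100, 1000, 10000)

# Recompute the mean with each candidate as the final element.
means <- sapply(last_values, function(v) mean(c(base, v)))

print(data.frame(last_value = last_values, mean = means))
```

With a last value of 1 the mean is exactly 1, and it climbs steadily as that single element grows, even though every other number stays the same.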

The relative frequency can be thought of as a kind of sample mean. Suppose that I am computing the relative frequency of a certain event. At every experiment, I can define a random quantity whose value is 1 when the event occurs and 0 when it doesn't. If I have done the experiment N times, I have N numbers, each of which is 0 or 1. Adding these together automatically generates the number of times the event has occurred, and dividing by N computes the average of these 0's and 1's. In general, a quantity that takes the value 1 when some event happens and 0 when it doesn't is called an indicator variable.
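
Here is a small sketch of the idea that a relative frequency is the average of 0's and 1's. The outcomes are invented purely for illustration (ten imaginary coin flips, with "H" as the event of interest), and the names `outcomes`, `indicator`, `n_occurred`, and `rel_freq` are hypothetical.

```r
# Made-up outcomes of N = 10 experiments; the event of interest is "H".
outcomes <- c("H", "T", "H", "H", "T", "H", "T", "T", "H", "H")

# 1 when the event occurs, 0 when it doesn't.
indicator <- as.numeric(outcomes == "H")

# Adding the 0's and 1's counts how many times the event occurred.
n_occurred <- sum(indicator)

# Relative frequency: count divided by the number of experiments.
rel_freq <- n_occurred / length(indicator)

print(rel_freq)
print(mean(indicator))
```

The two printed numbers should be the same, because `mean(indicator)` is by definition the sum of the 0's and 1's divided by N.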

The sample mean gives a kind of typical value "in the middle" of a distribution. But the sample mean might not occur in the actual data; the sample mean might not even be a possible value of the actual data - how would anyone have 2.2 children? And sometimes the average of two things may not be quite what is needed - here is an old joke: two statisticians go hunting. After a long day of tracking and stalking, the first fires at a target and misses 5 feet to the left. The second aims at the now-fleeing target, and fires, but misses 5 feet to the right, as the target disappears into the woods. But the first statistician exults - got him!

Other than the sample mean, there are other ways to create a measure of central tendency. The particular average we discussed is sometimes called the arithmetic mean.


All content © 2000 Mathepi.Com (except R and Rweb).
