under construction; last updated November 18, 2003

Home About us Mathematical Epidemiology Rweb EPITools Statistics Notes Web Design Search Contact us

> Home > Computational Epidemiology Course > Lecture 11 (in progress)

Exact Confidence Intervals for the binomial distribution

Let's switch gears a little and do an example. You've probably seen how to do confidence intervals for proportions using the normal approximation to the binomial. But sometimes this leads to problems, especially when the numbers aren't large (so the approximation does not work so well).

Before we show how to compute confidence intervals, however, let's review the binomial distribution itself, and in the process, learn more about the random number functions provided by R.

A Bernoulli trial is a random variable with two outcomes, conventionally called success (1) and failure (0). If you perform N Bernoulli trials, each with success probability p, the number X of successes is a binomial random variable. The expected number of successes is Np, the variance of X is Np(1-p), and the probability that X equals x is given by
(N!/(x!(N-x)!))p^x(1-p)^N-x.

This probability function can be computed in R using the function dbinom. How about tossing a single fair coin, one time? Here there is only one Bernoulli trial, so we have a binomial with N equal to one. Since the coin is fair, the probability of a head is 0.5, so if we think of heads as a success, the success probability p is 0.5. And of course the success probability is 0.5 if we call tails a success also. The number of heads is X, and we want to know the probability that X is 0. So using dbinom:

> dbinom(0,size=1,prob=0.5) [1] 0.5

Notice that N, the number of trials, is the named argument size, and the success probability is the named argument prob. I recommend this form, for easy reading, but you can actually omit the names if you get the order right:

> dbinom(0,1,0.5) [1] 0.5

Let's toss two coins independently, and see what the probability of no heads is. We know it has to be 1/4 by the laws of probability, so let's see:

> dbinom(0,size=2,prob=0.5) [1] 0.25

Let's see what the chance of getting exactly one head is, out of two tosses...independent tosses. Don't forget how important the independence assumption is: one toss must tell you nothing about the others. (Who says I didn't tape all the coins to a stick or something?) But assuming independence, we know that the probability of exactly one head is 0.5 (we have a 1/4 chance of getting a head on the first, tail on the second, and another 1/4 chance of getting a tail on the first and a head on the second). And of course we mean exactly one, not two; if we toss two heads, then in a sense we've "gotten a head" in the tosses but this is not what we mean by exactly one. OK:

> dbinom(1,size=2,prob=0.5) [1] 0.5

Again, N is 2 (size), and p is 0.5 (prob), and if X is the number of heads, and we are computing the probability that X=1, the 1 is the first argument of dbinom.

Let's toss the two independent fair coins again and ask what the chance of getting two heads is. You know it; it's 0.25:

> dbinom(2,size=2,prob=0.5) [1] 0.25

And let's ask what the chance of getting three heads is:

> dbinom(3,size=2,prob=0.5) [1] 0

The value 3 is not even in the sample space; you can't get three heads out of two coin tosses. The function dbinom was kind enough (or unkind enough) to simply give the value 0.

Another example: suppose that 51 individuals receive a needlestick injury with hepatitis-C positive blood, and that the probability of infection from such an injury is 0.03. What is the probability that no individuals are infected? Here, let X be the number of individuals infected. There are 51 trials (N=51), and we will suppose these are independent trials. (Can this be taken for granted?) Let's call infection the "success" and non-infection the "failure", so that the success probability p=0.03. So now the number of infected individuals is a binomial random variable with parameters N and p. We want to know the probability that X equals zero. We could use the formula in the above paragraph explicitly, but this formula is available through dbinom:

> dbinom(0,size=51,prob=0.03) [1] 0.2115234

Suppose that a person has ten sex partners chosen "independently and at random" from a population with a prevalence of HIV of 0.4 (40%). Now suppose that the probability of becoming infected during a partnership with an infected partner (assuming that unprotected sex happens) is 0.1. What is the chance of becoming infected? Here, we use a Bernoulli risk model (Grant et al 1987); each partnership is assumed to be an independent trial. The chance of becoming infected (assuming that partner HIV status and infectivity are independent; is this reasonable?) would be 0.4 × 0.1 = 0.04. Let's call the number of times a person would get infected X, and call "getting infected" the "success"; then we have 10 independent Bernoulli trials each with success probability 0.04. If X is 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10, then the person gets infected; if X is 0, the person does not. It is simpler to determine the chance of not getting zero (one minus the chance of getting zero) than it is to compute the chance of getting a number between 1 and 10 (inclusive):

> 1 - dbinom(0,size=10,prob=0.04) [1] 0.3351674

So assuming this model is correct the person would have about a 1 in 3 chance of becoming HIV infected.

The function dbinom is vectorized, so we can compute a lot of these probabilities at once. Let's compute the chance of getting zero, one or two heads in two tosses of fair coins independently:

> dbinom(0:2,size=2,prob=0.5) [1] 0.25 0.50 0.25

Here, we gave it a vector of head numbers to compute the probabilities of, namely, 0:2, i.e. 0, 1, and 2. We got a vector of three results, the first result corresponding to the first input value 0, the second result corresponding to the second input value 1, and the third result corresponding to the third input value 2.

Let's look at all the possibilities for the hepatitis example. Here, the size was 51, the success probability was 0.03; the possible outcomes range from 0 to 51:

> barplot(dbinom(0:51,size=51,prob=0.03))

Not a very pretty image, with all those flat bars...but an informative one. There is not much chance of getting more than 8 infections, looking at the graph. (Notice I slipped in the barplot command, quite useful. I haven't shown us yet how to label the axes in a more useful way, but that can wait.) What is the chance of getting (say) 9 or more infections? In this case, I don't want to add up the chance of 0 through 8, getting a number nearly one, and subtract; we always lose precision when subtracting nearly equal values. Subtracting nearly equal values is a terrible thing to do, and you should avoid it whenever possible unless you have a good reason to do otherwise. In this case, double precision arithmetic saved us out to a few significant figures anyway:

> sum(dbinom(9:51,size=51,prob=0.03)) [1] 1.910332e-05 > 1-sum(dbinom(0:8,size=51,prob=0.03)) [1] 1.910332e-05 > dbinom(51,size=51,prob=0.03) [1] 2.153694e-78

So the chance of getting more than 8 (9 or more) infections out of 51, if the infection probability is really 0.03 per injury, is about 2 in one hundred thousand. By the way, notice how small that last probability is. The chance of all 51 getting infected (if the infection probability is really 0.03 per injury) is the order of one in 10⁷⁸. This is something like the chance of marking a proton (if that were possible), losing it at random somewhere in the universe, and having a randomly chosen elementary particle turn out to be the one we marked (really). Sufficiently tiny probabilities are de facto impossibilities.

Let's linger a little more and look at some more binomial distributions. Let's look at a binomial probability function with N=10, but different values of p:

N=10, p=0.99

N=10, p=0.95

N=10, p=0.9

N=10, p=0.75

N=10, p=0.5

N=10, p=0.25

N=10, p=0.1

N=10, p=0.05

N=10, p=0.01

A couple of things to note: first, note that when p=0.99 and p=0.01, the probability function looks the same, just "turned around"; what you were calling successes becomes failures when you look at 1-p instead of p. Also, note that when p is 0.5, the function is perfectly symmetric (as it has to be; since p is equal to 1-p, it has to equal itself if it is "turned around"). By "turned around", we mean that the probability of getting x successes with p equals the probability of getting N-x successes with 1-p; using the binomial probability formula, this is not difficult to see, and we'll leave this as an exercise. Note also that the vertical axis is not the same in the plots I showed you.

I'd also like to take a look at several of these keeping p the same, but varying N.

> barplot(dbinom(0:3,size=3,prob=0.1)) > barplot(dbinom(0:10,size=10,prob=0.1)) > barplot(dbinom(0:30,size=30,prob=0.1)) > barplot(dbinom(0:100,size=100,prob=0.1)) > barplot(dbinom(0:300,size=300,prob=0.1)) > barplot(dbinom(0:1000,size=1000,prob=0.1))

N=3, p=0.1

N=10, p=0.1

N=30, p=0.1

N=100, p=0.1

N=300, p=0.1

N=1000, p=0.1

What do you notice here? Note that the way it is plotted, the peak of the distribution basically settles down and does not change much. Of course the numbers are changing quite a bit; you expect 100 successes out of 1000 trials, but 10 out of 100 (with p=0.1). But you expect about 10% of the trials to be a success, and in fact as you do more and more trials, you expect the fraction to be closer and closer to 10% (this is the Law of Large Numbers, and it's either the Weak or Strong law depending on what you mean by "expect"). Notice that the width of the distribution relative to the sample space is getting smaller and smaller. It's just as interesting that when the number of trials gets larger and larger, the distribution starts to take on the shape of a so-called "bell curve", by which is meant here the Normal (Gaussian) distribution. The convergence in distribution of the binomial to the Normal is one example of the Central Limit Theorem from statistics; normal approximation to the binomial is used quite a bit in applied statistics. As an exercise, repeat the plots I showed you just above for different values of p; use a for loop. Choose p=0.5, p=0.25, p=0.01, p=0.001. What do you notice happens; how does the convergence change as p gets closer and closer to 0?

Let's think about estimation a bit. If you don't know the success probability, but want to estimate it from data, we know our best estimate is the observed success fraction, X/N. This is traditionally denoted as a p with a circumflex on top of it. Unfortunately the most common HTML encoding doesn't include a p with this accent on it, although we can accent the vowels. So we'll break from tradition and just use a capital P to be X/N, the observed success fraction. We say that P estimates p, i.e. the observed success fraction estimates the unknown true success probability.

Now, what is the expected value of P? If we keep doing the binomial experiment over and over again, on average, what is P? Notice that N is constant each time; it is a fixed parameter. But X is random; each time we do the experiment, we get a different number of successes. We know E[P] = E[X/N] just by definition, and since N is constant, we can just move it out. So E[P]=E[X]/N (in general, if you (say) halve all the random values, you halve the average, and so forth). But the expected value of X is Np as we said. So E[P]=Np/N=p. The expected value of P is p; on average, you expect the estimator to equal the value it's estimating. This is a nice property the mathematicians call "unbiasedness". So the observed success fraction is an unbiased estimator of p.

The Weak Law of Large Numbers tells us something too. We know at least from simulation (that you just saw) that if N gets big, the success fraction is close to the probability. (In fact, this convergence is the basis of the very definition of probability as the frequentists see it.) What is the chance that there is a "meaningful" difference between P and p? If we pick some number like E, and ask what the chance that |P-p|>E, the Weak Law of Large Numbers tells us that we can make this chance as small as we like if we are willing to make N large enough. This fact makes P what the mathematicians call a consistent estimator of p.

I want to talk about an important property of P, the observed success fraction. Let's return for a moment to the experiment I discussed before, in which 3 infections were observed out of 51 needlestick injuries with Hepatitis C positive blood. In reality, we don't know the probability of infection from any first principles. And it would differ with such things as the size of the needle, depth of injury, and so forth. So this probability would probably differ in different populations. We believe that the observed success proportion 3/51=0.0588 is a good estimate of the unknown true probability.

Let's look at the probability of seeing 3 (exactly 3) infections out of 51 under different circumstances. Suppose the infection probability were really 0.01. Then

> dbinom(3,prob=0.01,size=51) [1] 0.01285507

we find that the chance of seeing what we saw would be about 1.3%. And if the unknown probability were 0.001, 1 in 1000, then we would have about a 2 in 100000 chance of seeing 3 out of 51. So just comparing these two possibilities, 0.01 seems like a much more plausible choice. If you had to pick between these possible values of the unknown infection probability, you'd pick 0.01 rather than 0.001, since the data we saw is more probable if the unknown value is 0.01 than it is if it is 0.001. This is an example of Fisher's likelihood principle.

> dbinom(3,prob=0.001,size=51) [1] 1.984853e-05

Now, let's plot the probability of seeing 3 infections out of 51, for various values of the infection probability p (the success probability in the binomial). We're picking one particular probability out of a distribution and plotting it as we change p, so of course this is not a probability distribution. It is the probability of the data (3) as a function of the unknown parameter p, and this is called the likelihood function:

> xs <- seq(0.001,0.999,by=0.001) > plot(xs,dbinom(3,prob=xs,size=51),type="l")

The particular choice of p which makes the likelihood as big as possible is the maximum likelihood estimate. By the way, I slipped something else in here: notice that dbinom is vectorized in the probability argument too. If you give it a vector of success probabilities, you get a vector of binomial probabilities computed using these.

Let's see how we could find this maximum value in R. We have a one-dimensional function that we want to find the maximum value of. In other words, conceptually, we have a function that takes a value of p, and gives us the probability of the data for that value of p. The function we will use is called optimize, and it is good for one-dimensional optimization. For multidimensional optimization, we use optim. So let's see this work:

> 3/51 [1] 0.05882353 > fn <- function(pp) dbinom(3,prob=pp,size=51) > optimize(fn,interval=c(0.0001,0.9999),maximum=TRUE) $maximum [1] 0.05882187 $objective [1] 0.2309134

First, we have the function we want to optimize, which we called fn. We chose to define it as a separate function, but we could have used the function anonymously as an argument to optimize. We also specified the interval to search over (we chose to avoid the endpoints); what could go wrong with the choice we made? Finally, we had to specify maximum=TRUE to get it to find the maximum, rather than the minimum. And notice that it returned a list as the output, and the elements were maximum, the value of the argument at which the maximum occurs, and objective, the actual maximum value of the function itself. Notice that the maximum it found was 0.05882187, not very different from 0.05882353, the value of 3/51.

It turns out that P, the observed success fraction, is in fact the true maximum likelihood estimate of the unknown success probability. This can be shown by taking the logarithm (to make the math easier) of the probability function, and setting the first derivative to zero, etc. We're not going to show that in this class. Just note that the use of optimize gave us a reasonably close approximation.

Now, we've already seen that if p were 0.001, there would be an about 2 in 100000 chance of seeing the data we saw. Is it plausible that the true probability could be as small as 0.001? So let's think about what we might see if p were 0.001. What would it take to change our mind? If p were 0.001, then the probability of seeing 0 infections is 0.9503, seeing 1 infection is 0.04851, of seeing 2 infections is 0.0012, etc. The idea is that if the success probability were small, you expect to see a small number of successes. If the infection probability is small, you should see a small number of infections. If you see a large enough number of infections, maybe you should change your mind about the probability being small.

Formally, we could call p=0.001 the null hypothesis. The data we will have is the number of infections out of the study population. We can calculate the fraction of successes (infections), P=X/N using the notation from before; under the null hypothesis, X has a binomial distribution with parameters p=0.001 and N=51. We know that if p is small, we had better see a small P. So a big P will lead us to reject the null hypothesis, and in this case, what we'll reject is p=0.001.

So how big is big? Remember, any big value will lead us to reject the hypothesis. We're interested in things like P(X>=x for various values of x (and here the capital P just refers to probability). So in other words, if some x is so big we'd give up on p=0.001, then something even bigger is even worse as far as the credibility of p=0.001 is concerned.

R conveniently computes the cumulative distribution function, P(X<=x) using pbinom. This saves us the trouble of using sum on a range. And there is an analogous version for many continuous variables like the normal (e.g. pnorm).

> pbinom(0,prob=0.001,size=51) [1] 0.9502544 > pbinom(1,prob=0.001,size=51) [1] 0.998766 > pbinom(2,prob=0.001,size=51) [1] 0.99998 > pbinom(3,prob=0.001,size=51) [1] 0.9999998 > pbinom(3,prob=0.001,size=51)-sum(dbinom(0:3,prob=0.001,size=51)) [1] 0

Let's write a function to compute P(X>=x) for binomial random variables. To be x or bigger is to not be x-1 or smaller; the binomial has no values in between integers:

pr.ge.binom <- function(xx,prob,size) { 1-pbinom(xx-1,prob=prob,size=size)

Of course this is done with evil subtraction; how would you rewrite it to avoid subtraction? What would the advantages and disadvantages of your solution be?

So how big would be too big? We could decide in advance that the smallest value of x so that P(X>=x) is less than or equal to 0.05 is going to be the cutoff. Let's take a look at some of this:

> pr.ge.binom <- function(xx,prob,size) { + 1-pbinom(xx-1,prob=prob,size=size) + } > for (xx in 0:10) { + cat("xx=") + cat(xx) + cat("; and P(X>=x)=") + cat(pr.ge.binom(xx,prob=0.001,size=51)) + cat("\n") + } xx=0; and P(X>=x)=1 xx=1; and P(X>=x)=0.04974558 xx=2; and P(X>=x)=0.001234090 xx=3; and P(X>=x)=2.008922e-05 xx=4; and P(X>=x)=2.406816e-07 xx=5; and P(X>=x)=2.260727e-09 xx=6; and P(X>=x)=1.732803e-11 xx=7; and P(X>=x)=1.113554e-13 xx=8; and P(X>=x)=6.661338e-16 xx=9; and P(X>=x)=0 xx=10; and P(X>=x)=0

Here, I used multiple calls to cat to construct the table (sometimes you find a version of R on Rweb that is old and that does not recognize cat with more than one argument.) So, let's see what all this means. We know that P(X>=0) had better be 1, just by definition. At the high end, R thinks that P(X>=9) is actually zero. Of course, it's not really zero; it's just so small that the true value underflowed. We can get a better idea of what it is if we do the computation additively, avoiding the subtraction:

> sum(dbinom(9:51,prob=0.001,size=51)) [1] 2.92943e-18

Well, that's pretty small anyway. If we get 9 out of 51 infections, there's no way in blazes I'm going to believe that the probability of infection is 0.001. And I won't believe it if I get 7 or more; the chance of that is only 1.113554e-13. And frankly the chance of getting 2 or more is only 0.001234090. If the true infection probability is 1 in 1000, there's only a 1.2 chance in 1000 you would see 2 or more infections out of 51. If I see even 2 infections, I'll reject the hypothesis that p=0.001. What about 1 infection? The chance of seeing one or more infections out of 51, if p=0.001, is 0.04974558.

If we're doing a one-tailed test with significance probability 0.05, then we would reject the hypothesis p=0.001 even if we see one single infection out of 51. If the probability really were 0.001, and we did this experiment of looking at 51 random independent subjects over and over again, rejecting the hypothesis whenever we see 1 or more infections, then we'll be wrong, in the long run, 4.97% of the time. If we wanted a smaller Type I error rate, such as 1%, we would have to stop rejecting the hypothesis if we see one single infection. To have a type I error rate no bigger than 1%, we would have to reject only when we see 2 or more infections (and our type I error rate is then what?).

Question: what if we believe that the true infection probability is smaller than 0.001, and we have 51 subjects. Why can't we ever conclude based on this sample size that 0.001 is too big?

Let's go back to 3 infections out of 51 injuries. We agree that 3/51=0.058 is the best estimate. But how small could the probability be, and be consistent with our observation? How big could it be? Let's say that we will use a two-sided test with 2.5% of the probability (or as close as we can) on each side.

We know that if our hypothesis were p=0.001, we would reject the hypothesis if we saw anything bigger than 2 (with a 0.12% error rate). But what about a bigger p? Let's look at a few:

> sum(dbinom(3:51,size=51,prob=0.01)) [1] 0.01457345 > sum(dbinom(3:51,size=51,prob=0.005)) [1] 0.002175924 > sum(dbinom(3:51,size=51,prob=0.02)) [1] 0.08214377 > sum(dbinom(3:51,size=51,prob=0.015)) [1] 0.0412478

I'd like to find the prob so that this sum (P(X>=3)) is 2.5%. Any probability smaller than that is just not consistent with there being at least 3 infections. But I don't want to hunt around for this value.

We can use the function uniroot to find numbers like this in one dimension:

> fn <- function(pp) { + sum(dbinom(3:51,size=51,prob=pp))-0.025 + } > uniroot(fn,interval=c(0.00001,0.99999)) $root [1] 0.01230370 $f.root [1] 2.407805e-05 $iter [1] 7 $estim.prec [1] 6.103516e-05 > 1-pbinom(2,size=51,prob=0.0123) [1] 0.02500476

So here, we defined a function fn that is zero at the value we're looking for: when the probability of getting 3 or more infections equals 0.025, this function returns zero. We want to know for what probability that occurs. We call uniroot as shown for this function, telling it to look in the interval we chose. It returned a list including a variety of interesting information, but most useful to us is the component root, which was approximately 0.0123. We tested this value and found that the chance of getting 3 or more infections is indeed about 2.5% if p=0.0123. So on the basis of seeing 3 infections, we would reject the hypothesis that p is 0.0123 or anything smaller than that. If p were smaller than that, it wouldn't be plausible that we got as many as 3 infections.

Here, we saw an example of root finding in action. Lots of things could go wrong with root finding: there might not be a root, or there might be more than one root in the interval, or the function might be irregular or strange. So you can never take this sort of thing for granted, even in one dimension (with one argument that can vary). The function uniroot is R's builtin function for helping you do one-dimensional root finding, for purposes like this one. When you use it, always plot your function and make sure you understand the behavior of the function whose root you want. Here is a plot of the function we just looked at:

> fn <- function(pp){1-pbinom(2,size=51,prob=pp)-0.025} > xx <- seq(0.001,0.02,by=0.001) > plot(xx,fn(xx),type="l") > abline(h=0.0)

Notice that the curve (fn) crosses the horizontal line at 0, right about at 0.012, just as advertised. (Notice here and before, we use type="l" to get a line plot. Also, notice that abline can be used to add a line to an existing plot; here, we added a horizontal line using h=).

One of the things uniroot does not do well is find roots where the sign does not change on either side:

> uniroot(function(xx)xx^2,c(-0.1,0.1)) Error in uniroot(function(xx) xx^2, c(-0.1, 0.1)) : f() values at end points not of opposite sign

Sometimes it can handle deranged functions:

> fn <- function(x1) { + xx<-x1-0.01 + ww <- 100*(abs(xx*sin(1/xx)))^2+ xx*xx + ww[is.na(ww)] <- 0 + ww[xx<0] <- (-1)*ww[xx<0] + ww + } > xs <- seq(-0.1,0.1,by=0.0001) > plot(xs,fn(xs),type="l") > abline(h=0) > uniroot(fn,c(-0.1,0.1)) $root [1] 0.009997432 $f.root [1] -4.6764e-10 $iter [1] 16 $estim.prec [1] 6.103516e-05

Numerical analysis texts are full of strange functions that can wreak havoc on your root finder.

So we used the root finder uniroot to find how small the true infection probability could be and still be in a sense consistent with 3 infections out of 51. We decided that we would reject the hypothesis that the infection probability was (say) 0.001, since if it were that small, there would be too small a chance of seeing as many as three infections. We solved to find out for what infection probability would the chance of seeing three or more infections reach 2.5%.

Now, let's go the other way. How big could it be and still be consistent with as few as three infections? We know that as the infection probability per injury gets bigger and bigger, the chance of getting only 3 or fewer infections should get quite small. Let's plot it:

> xx <- seq(0,0.2,by=0.001) > ww <- pbinom(3,prob=xx,size=51) > plot(xx,ww,type="l") > abline(h=0.025)

So by the time you get to 0.17 or so, the chance of seeing 3 or fewer infections becomes implausible in the sense that you'd reject the hypothesis that the infection probability was so large. Let's find the actual cutoff:

> uniroot(function(xx)pbinom(3,prob=xx,size=51)-0.025, + c(0.001,0.999)) $root [1] 0.1624314 $f.root [1] -9.51687e-06 $iter [1] 10 $estim.prec [1] 6.103516e-05

Here, we find it is about 0.162.

So what we have done is to compute the best estimate, 0.0588, together with an exact 95% confidence interval, (0.0123, 0.162). Confidence intervals determined by this method are called Clopper-Pearson exact confidence intervals¹.

We can construct a function to compute these intervals, as follows:

> cp.lower <- function(xx,nn,alpha=0.025,low=1e-6,hi=1-1e-6) { + uniroot(function(pp){1-pbinom(xx-1,size=nn,prob=pp)-alpha},c(low,hi))$root + } > cp.upper <- function(xx,nn,alpha=0.025,low=1e-6,hi=1-1e-6) { + uniroot(function(pp){pbinom(xx,size=nn,prob=pp)-alpha},c(low,hi))$root + } > cp.interval <- function(xx,nn,alpha=0.05,low=1e-6,hi=1-1e-6) { + c(cp.lower(xx,nn,alpha/2,low=low,hi=hi), + cp.upper(xx,nn,alpha/2,low=low,hi=hi)) + } > cp.interval(xx=3,nn=51) [1] 0.01230367 0.16243854

With the functions we just defined, we can actually compute one-sided upper confidence bounds too. What do you do when you find no successes? Suppose that out of 9 injuries, we found no infections. Based on just these data, our best estimate of the probability is the observed fraction 0/9, but we certainly can't just stop there. The fact that there are only nine subjects should mean that this estimate is not all that precise. So let's compute a 95% upper confidence limit, using the function above:

> # continuing... > cp.upper(0,9,alpha=0.05) [1] 0.2831307

So it's not a tremendously precise estimate.

The binomial confidence interval can actually be expressed in terms of an exact formula²:

> cp.interval.lt <- function(xx,nn,alpha=0.05,low=1e-6,hi=1-1e-6) { + # I'm keeping the arguments low and hi in place so this function will + # be interchangeable with the other + f1<-qf(alpha/2,df1=2*xx,df2=2*(nn-xx+1)) + f2<-qf(1-alpha/2,df1=2*(xx+1),df2=2*(nn-xx)) + b1 <- 1+(nn-xx+1)/(xx*f1) + b2 <- 1+(nn-xx)/((xx+1)*f2) + c(1/b1,1/b2) + } > cp.interval.lt(xx=3,nn=51) [1] 0.01229909 0.16242222

Here, we used the function qf to find quantiles of the F-distribution; we'll discuss quantile functions more next class.

For another example, consider the results of a study of hepatotoxicity and INH (isoniazid) preventive therapy for latent tuberculosis infection³. In Table 3, the investigators reported the rate of hepatotoxicity as a number of cases per 1000 persons starting therapy. Let's compute exact confidence intervals where appropriate, and where the rate is zero, let's compute a one-sided upper confidence limit. First, I'm going to shine up the confidence interval function in a few ways: I'm going to make it able to generate confidence intervals if you give a vector of alphas. And I'm going to make it be able to handle a vector of different possible numerators and denominators (but if you give it a vector of alphas and a vector of numerators or denominators, I'm going to make it complain). I'm also going to arrange for it to automatically report a 95% one-sided upper limit if the count is zero. Exercise: modify this to report 95% one-sided lower limit if the numerator equals the denominator. Finally, I'm going to report the result as a matrix rather than a list, to make it easier to scale the results by 1000 the way the authors did, and I'm also going to include the best estimate (the observed counts over the number of trials).

> age.numerators <- c(0,6,4,1) > age.denominators <- c(1468,7449,1865,359) > age.rates <- age.numerators/age.denominators > cp.interval.lt <- function(xx,nn,alpha=0.05,low=1e-6,hi=1-1e-6) { + alen <- length(alpha) + xlen <- length(xx) + nlen <- length(nn) + if (alen>1 && (xlen>1 || nlen>1)) { + stop("error--cp.interval.lt has wrong input lengths.") + } + if (alen>1) { + if (xx>0) { + f1<-qf(alpha/2,df1=2*xx,df2=2*(nn-xx+1)) + f2<-qf(1-alpha/2,df1=2*(xx+1),df2=2*(nn-xx)) + b1 <- 1+(nn-xx+1)/(xx*f1) + b2 <- 1+(nn-xx)/((xx+1)*f2) + lowers <- 1/b1 + uppers <- 1/b2 + } else { + f2<-qf(1-alpha,df1=2*(xx+1),df2=2*(nn-xx)) + b2 <- 1+(nn-xx)/((xx+1)*f2) + lowers <- rep(0,alen) + uppers <- 1/b2 + } + } else { + inputs <- rbind(xx=xx,nn=nn) + nc <- ncol(inputs) + xx <- inputs["xx",] + nn <- inputs["nn",] + lowers <- rep(NA,nc) + uppers <- rep(NA,nc) + for (ii in 1:nc) { + if (xx[ii]>0) { + f1<-qf(alpha/2,df1=2*xx[ii],df2=2*(nn[ii]-xx[ii]+1)) + f2<-qf(1-alpha/2,df1=2*(xx[ii]+1),df2=2*(nn[ii]-xx[ii])) + b1 <- 1+(nn[ii]-xx[ii]+1)/(xx[ii]*f1) + b2 <- 1+(nn[ii]-xx[ii])/((xx[ii]+1)*f2) + lowers[ii] <- 1/b1 + uppers[ii] <- 1/b2 + } else { + lowers[ii] <- 0 + f2<-qf(1-alpha,df1=2*(xx[ii]+1),df2=2*(nn[ii]-xx[ii])) + b2 <- 1+(nn[ii]-xx[ii])/((xx[ii]+1)*f2) + uppers[ii] <- 1/b2 + } + } + } + rbind(uppers=uppers,lowers=lowers,estimate=xx/nn) + } > cps <- 1000*cp.interval.lt(age.numerators,age.denominators) > cps [,1] [,2] [,3] [,4] uppers 2.038609 1.7523542 5.4822852 15.42143959 lowers 0.000000 0.2956515 0.5846777 0.07052066 estimate 0.000000 0.8054772 2.1447721 2.78551532 > gen.ci.plot <- function(cps,xaxis,xticks) { + mx <- max(cps) + ngrp <- ncol(cps) + plot(1:ngrp,cps["estimate",],col="red",ylim=c(0,mx), + xlab=xaxis, + ylab="Rate per 1000 Persons starting therapy",type="n",axes=FALSE) + axis(1,at=1:ngrp,labels=xticks) + axis(2) + box() + points(1:ngrp,cps["uppers",],col="blue") + points(1:ngrp,cps["lowers",],col="blue") + points(1:ngrp,cps["estimate",],col="red") + for (ii in 1:ngrp) { + lines(x=c(ii,ii),y=cps[c("lowers","uppers"),ii],col="black") + } + cps + } > age.cps <- gen.ci.plot(cps,"Patient Age",c("0-14","15-34","35-64","65+")) > age.cps <- gen.ci.plot(cps,"Patient Age",c("0-14","15-34","35-64","65+")) > eth.cps <- gen.ci.plot(1000*cp.interval.lt(c(4,7,5,2,0,0),c(1856,9285,5968,1732,1050,535)),"Ethnicity",c("\"White\"","\"Nonwhite\"","\"Asian\"","\"Black\"","\"Hispanic\"","\"Other\"")) > sex.cps <- gen.ci.plot(1000*cp.interval.lt(c(3,8),c(6066,5075)),"Sex",c("Male","Female")) > age.cps [,1] [,2] [,3] [,4] uppers 2.038609 1.7523542 5.4822852 15.42143959 lowers 0.000000 0.2956515 0.5846777 0.07052066 estimate 0.000000 0.8054772 2.1447721 2.78551532 > eth.cps [,1] [,2] [,3] [,4] [,5] [,6] uppers 5.5088248 1.5527099 1.9540574 4.1650110 2.849012 5.583852 lowers 0.5875144 0.3031606 0.2720861 0.1398743 0.000000 0.000000 estimate 2.1551724 0.7539041 0.8378016 1.1547344 0.000000 0.000000 > sex.cps [,1] [,2] uppers 1.4446268 3.103672 lowers 0.1020017 0.680796 estimate 0.4945598 1.576355

To do the graph, I wanted the axes to be labeled a little differently than the default. The example shows how to arrange to have only 4 tick marks, and to make sure that the text labels correspond to the groups in the paper. We suppressed the default axes using axes=FALSE in the plot command, made the lower (1) axis our way with the first axis command, set up the default left axis with axis(2), and had to explicitly draw the box around the plot with box(). I used a plot command that just set up the axes first, and then added the points, making sure to add the actual estimates in red last (so that when the estimate coincided with the lower confidence bound at zero, the point would be red: the red would be on top of the blue). Then finally I drew the lines joining the top and bottom of the interval in black using a series of calls to lines in a loop as shown. I chose to include all this in a function since I will actually repeat this again. For brevity I only include the first two of these plots; an additional feature of the plot function might be to allow the user to specify a common y-range for both plots to make them comparable. Finally note that I plotted all six ethnicity groups together on the same graph without indicating that the last four are in fact simply the breakdown of the second.

Exercise. Use the manual page (?text) regarding the function text to figure out how to print the number of subjects at the top of each confidence bar.

It might also be useful to see the probability of getting 0 cases out of the 1468 individuals aged 0-14. Here is how to plot such a graph. In this case, I color the line red, so it will stand out from the dotted lines I used to draw the lines indicating where 5%, 10%, and 20% were (what true hepatotoxicity rates corresponded to a 5%, 10%, and 20% chance of seeing no cases in the patients). How do these values relate to the one-sided confidence limits for different type-I error rates?

npatients <- 1468 > xscale <- 1000 > xs <- seq(0,0.005,by=0.00005) > ps <- dbinom(0,size=npatients,prob=xs) > plot(xscale*xs,ps,type="l",xlab=paste("True Rate of Hepatotoxicity as Cases per",xscale,"starting therapy"),ylab=paste("Probability of zero cases in",npatients,"patients"),col="red") > for (pct in c(0.05,0.1,0.2)) { + loc <- uniroot(function(xx)(dbinom(0,size=npatients,prob=xx)-pct),c(0,0.1),tol=1e-6) + cury <- dbinom(0,size=npatients,prob=loc$root) + lines(x=xscale*rep(loc$root,2),y=c(-0.1,cury),lty=2) + lines(x=xscale*c(-0.14,loc$root),y=rep(cury,2),lty=2) + curx <- xscale*loc$root + text(curx,cury+0.05,as.character(round(curx,digits=2))) + }

Observe the use of text to place the labels on the figure, the use of round to round the labels off to two decimal places, and the use of the tol named argument to uniroot to get more precision in the estimate.

Exercise. Find the 99% upper confidence limit using the same procedure as above. Ans. 3.13.

Random Numbers

We could simulate the needlestick incident 100 times, assuming that the probability of needlestick is in reality 3/51:

> rbinom(100,size=51,prob=3/51) [1] 5 4 2 0 2 7 0 2 1 3 4 2 4 4 6 3 6 4 3 3 3 3 2 0 4 1 5 6 1 3 1 5 3 1 3 1 8 [38] 3 2 4 3 4 3 6 2 4 2 4 3 3 5 4 4 2 4 4 2 4 2 1 1 1 4 4 4 3 4 1 1 5 2 0 4 1 [75] 3 7 2 5 2 3 0 2 2 6 0 5 4 4 2 2 3 2 3 1 2 2 3 0 4 4

That's all for today. Next time we'll continue with examples.

¹ Clopper CJ, Pearson ES. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26:404-413, 1934.

² Leemis LM, Trivedi KS. A comparison of approximate interval estimators for the Bernoulli parameter. American Statistician 50:63-68, 1996.

³ Nolan CM, Goldberg SV, Buskin SE. Hepatotoxicity associated with Isoniazid preventive therapy. A 7-year survey from a public health tuberculosis clinic. Journal of the American Medical Association 281(11):1014-1018, 1999.