Home About us Mathematical Epidemiology Rweb EPITools Statistics Notes Web Design Search Contact us |

> Home > Computational Epidemiology Course > Lecture 12 (in progress)

A q-quantile is defined to be the smallest *x* such that the chance of
a random variable being greater than or equal to *x* is at least *q*.

The meaning of this is easier to see in the concrete. Let's look back at the binomial again. Let's first plot the cumulative distribution function for the needle example, where the number of trials was 51 and the success probability 0.03:

> plot(plot(0:10,pbinom(0:10,size=51,prob=0.03),ylim=c(0,1)) > abline(h=0.25) > pbinom(0:10,size=51,prob=0.03) [1] 0.2115234 0.5451634 0.8031325 0.9334468 0.9818109 0.9958714 0.9992053 [8] 0.9998681 0.9999809 0.9999976 0.9999997 |

Notice that we set the vertical range to (0,1) manually by using ylim, and then placed a horizontal line in the figure using

> qbinom(0.25,size=51,prob=0.03) [1] 1 |

So the purpose of the quantile function is to tell you where to draw the cutoffs to make sure you have a certain amount of the distribution on the left hand side. This is quite easy for a continuous random variable like the standard normal:

> qnorm(0.025) [1] -1.959964 > qnorm(c(0.005,0.01,0.025,0.05,0.1,0.5,0.9,0.95,0.975,0.99,0.995)) [1] -2.575829 -2.326348 -1.959964 -1.644854 -1.281552 0.000000 1.281552 [8] 1.644854 1.959964 2.326348 2.575829 |

R has built in quantile functions for all sorts of useful distributions that come up in statistics, eliminating the need for tables:

> qchisq(0.95,df=1) [1] 3.841459 > qt(0.025,df=10) [1] -2.228139 > qf(0.95,df1=3,df2=37) [1] 2.858796 |

Last time we spent some time talking about "exact" confidence intervals for
the Binomial distribution. We observe a certain number *X* of successes in
*N* independent idential bernoulli trials, and compute the observed
frequency of success, *X/N*. And this value *X/N* estimates the
true success probability. We found how to compute an interval estimate of this
using the binomial distribution. Every time we do the experiment, we should
get a different number of successes in principle, since this is a random
variable, and therefore we should get a different confidence interval. The
confidence interval is a random variable, and the idea is that it should have
a certain chance of containing the true value. Let's simulate this process,
as follows:

> cp.interval.lt <- function(xx,nn,alpha=0.05,low=1e-6,hi=1-1e-6) { + f1<-qf(alpha/2,df1=2*xx,df2=2*(nn-xx+1)) + f2<-qf(1-alpha/2,df1=2*(xx+1),df2=2*(nn-xx)) + b1 <- 1+(nn-xx+1)/(xx*f1) + b2 <- 1+(nn-xx)/((xx+1)*f2) + c(1/b1,1/b2) + } > contains.val <- function(cis,a.value) { + (a.value > cis$lower) & (a.value < cis$upper) + } > gen.intervals <- function(xxs,size,nvals=length(xxs)) { + lowers <- rep(NA,nvals) + uppers <- rep(NA,nvals) + for (ii in 1:nvals) { + intvl <- cp.interval.lt(xxs[ii],size,alpha=0.05) + lowers[ii] <- intvl[1] + uppers[ii] <- intvl[2] + } + lowers[is.nan(lowers)] <- -1 + list(lowers=lowers,uppers=uppers) + } > nn <- 20000 > nsims<-10000 > tru <- 0.5 > sum(contains.val(gen.intervals(rbinom(nsims,size=nn,prob=tru),nn), tru)) [1] 9535 |

> # continuing example from above > sum(contains.val( + gen.intervals(rbinom(nsims,size=nn,prob=tru),nn), tru + )) [1] 9805 There were 50 or more warnings (use warnings() to see the first 50) Rweb:> warnings() Warning messages: 1: NaNs produced in: qf(p, df1, df2, lower.tail, log.p) |