The Conditional Loop

We have now seen two mechanisms for sequential repetition: recursive function definitions and the counted (for) loop. Today we will learn a third important mechanism: the conditional (while) loop.

The while loop is used for repeating an action as long as a condition is true (that is, until the condition becomes false).

As an example, let's toss a coin until it comes up tails, and keep track of how long this takes.

> ntrials <- 1
> while (sample(c("H","T"),1,replace=TRUE)=="H") {
+   ntrials <- ntrials+1
+ }
>

Here, we begin the loop with the keyword while, followed by a boolean expression in parentheses. Then, there follows an action to be repeated, typically a compound expression in curly braces representing a sequence of actions. We'll talk more about the curly braces later, but you can see the pattern above. In the coin example you just saw, the boolean expression drew a random coin toss and tested to see whether it was a head. The body of the loop simply added one to the number of trials. Note that before we begin the loop, we initialized the variable ntrials to equal one.

So how does this work? The first thing that happens is that ntrials is set to one. Then we enter the while loop. First, the loop condition is tested: we evaluate the boolean expression in the parentheses. In this case, we get true if the coin toss came up heads and false if it came up tails. If we got a head, the boolean expression is true and the loop body is entered. The statements in the body are evaluated; on the first pass, ntrials is set to its former value (1) plus one, so at the end of the loop body ntrials is two. Control then returns to the top of the loop, where the condition is tested again with a fresh toss. As soon as we get a tail, the boolean expression is false and the loop body is not entered; the loop ends and control resumes at the first statement after the loop. At that point, the variable ntrials equals the number of times we tossed the coin.
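To trace these steps by hand, it can help to replace the random toss with a fixed sequence. The sketch below is only an illustration: the vector tosses and the index pos are hypothetical names that do not appear in the coin example.

```r
# Hypothetical fixed sequence of tosses, so the trace is reproducible
tosses <- c("H", "H", "T")
pos <- 1        # position of the toss currently being tested (illustrative)
ntrials <- 1

while (tosses[pos] == "H") {
  # The body runs only when the toss just tested was a head
  ntrials <- ntrials + 1
  pos <- pos + 1
}

ntrials   # 3: two heads, then the tail that ended the loop
```

The condition is evaluated three times here (H, H, T), but the body runs only twice.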

In this case, the variable ntrials follows a geometric distribution, and since every toss has a positive probability of coming up tails, the loop will terminate with probability one. But in more complex problems you might make a mistake, and the boolean expression might evaluate to true forever. The loop would then never terminate normally, and what happens next would depend on your system.
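One way to check the geometric claim is to repeat the experiment many times and look at the average. This is a rough sketch, not part of the original example: the function name toss.until.tails and the seed are arbitrary choices, and replicate is a base R function that simply calls an expression repeatedly. For a fair coin, the expected number of tosses is 2.

```r
# Hypothetical helper that wraps the coin-toss loop in a function
toss.until.tails <- function() {
  ntrials <- 1
  while (sample(c("H", "T"), 1) == "H") {
    ntrials <- ntrials + 1
  }
  ntrials
}

set.seed(1)                                   # arbitrary seed, for repeatability
results <- replicate(10000, toss.until.tails())
mean(results)                                 # should be close to 2
min(results)                                  # at least one toss is always made
```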

> # Here is an infinite loop
> # Don't run this loop!
> while (TRUE) {
+   cat("I'm an infinite loop!\n")
+ }
>

Here, the condition is just TRUE all the time. This is a nonterminating loop and unless you're doing something special like writing a controller for an embedded system, nonterminating loops are evil.
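One common safeguard, offered here as a suggestion rather than something these notes prescribe, is to cap the number of passes so that a faulty condition cannot spin forever. The names max.iter and iter are illustrative, and && is R's logical "and" for single values.

```r
max.iter <- 1000   # illustrative upper bound on the number of passes
iter <- 0
x <- 1

# Keep halving x until it is tiny, but never loop more than max.iter times
while (x > 1e-6 && iter < max.iter) {
  x <- x / 2
  iter <- iter + 1
}

iter   # 20, well below the cap, so the loop ended on its own condition
```

If iter ever reaches max.iter, that is a hint that the main condition is not behaving as intended.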

By the way, I slipped in a comment in the previous example. A comment is just a note to the reader to help document the code. Anything that follows a sharp sign (#) to the end of the line is a comment; the R system will simply ignore it. Any R program of any real usefulness will require comments, and it is your responsibility to keep the comments up to date and correct.

This is about all there is to the conditional loop. You could use a conditional loop instead of a counted loop:

> ii <- 1
> while (ii <= 1000000) {
+   cat("ii=",ii,"\n")
+   ii <- ii + 1
+ }
>

Here, we used a while loop to iterate through one million numbers. One advantage is that we avoid having to construct the object 1:1000000, which the for loop would need (recent versions of R store such sequences compactly, so this saving matters less than it once did). Of course, if we forget the statement ii <- ii+1 inside the loop, the loop will be nonterminating. And that is bad.
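To make the correspondence between the two kinds of loop concrete, here is a small side-by-side sketch that sums the integers from 1 to 100 both ways. The variable names (total.while, total.for, and so on) are invented for this illustration.

```r
# Sum the integers 1..100 with a while loop
total.while <- 0
ii <- 1
while (ii <= 100) {
  total.while <- total.while + ii
  ii <- ii + 1          # omit this line and the loop never ends
}
ii.after.while <- ii    # 101: the first value that failed the test

# The same sum with a for loop
total.for <- 0
for (ii in 1:100) {
  total.for <- total.for + ii
}
ii.after.for <- ii      # 100: the last value taken from 1:100

c(total.while, total.for)   # both are 5050
```

Note the small difference in the final value of ii: the while loop leaves it one past the limit, whereas the for loop leaves it at the last element of the sequence.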

But R provides a statement called break to immediately exit a loop. Here is how it works:

> ii <- 1
> while (TRUE) {
+   cat("ii=",ii,"\n")
+   ii <- ii + 1
+   if (ii>4) { break }
+   cat("At the end...\n")
+ }
>

When the break statement is encountered, the loop terminates. In fact, sometimes it is convenient to just run a loop and have it keep going until you stop it with a break statement from inside. R provides a repeat loop which behaves like while (TRUE), and we'll discuss it more some other time.
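As a brief preview, and using only base R's repeat and break, the same loop could be sketched with repeat in place of while (TRUE):

```r
ii <- 1
repeat {
  cat("ii=", ii, "\n")
  ii <- ii + 1
  if (ii > 4) break      # repeat has no condition of its own; break is the exit
  cat("At the end...\n")
}
ii   # 5: the value that triggered the break
```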

Another useful thing to know about is the next statement. When you execute a next statement inside a loop body, you skip the rest of the loop body and jump to the beginning of the next cycle of the loop. We'll see examples of this later.
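As a small preview, here is one way next might be used: adding up only the odd numbers from 1 to 10. The variable name odd.total is made up for this sketch, and %% is R's remainder operator.

```r
odd.total <- 0
for (ii in 1:10) {
  if (ii %% 2 == 0) next          # even number: skip the rest of the body
  odd.total <- odd.total + ii     # reached only for odd ii
}
odd.total   # 25, that is, 1 + 3 + 5 + 7 + 9
```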

Loops can be nested, one inside another. In the example below, the outer loop is really a counted loop, using the loop variable ii. The inner loop terminates whenever the break statement is executed, which happens as soon as jj exceeds four. Note that break exits only the innermost loop that contains it. Because the inner loop is located inside the outer loop, it is started afresh, with jj reset to one, on every pass of the outer loop, until the outer loop terminates.

> ii <- 1
> while (ii < 6) {
+   ii <- ii+1
+   jj <- 1
+   while (TRUE) {
+     cat("ii=",ii,"; jj=",jj,"\n")
+     jj <- jj + 1
+     if (jj>4) { break }
+     cat("At the end...\n")
+   }
+ }
>

Nested loops are a common way to do something for all combinations of certain variables. Remember we called this the outer pattern. Let's imagine we have four states and five numbers, and we want to print all possible pairs. We can do this easily with nested loops:

> for (ii in c("CA","NV","AZ","TN")) {
+   for (jj in 1:5) {
+     cat("ii=",ii,"; jj=",jj,"\n")
+   }
+ }
>
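If you want to keep the pairs rather than just print them, the same nested loops can collect them into a vector. This is a minimal sketch; the names states and pairs are illustrative, and paste is the base R function for gluing values into a string.

```r
states <- c("CA", "NV", "AZ", "TN")
pairs <- character(0)              # start with an empty character vector

for (st in states) {
  for (jj in 1:5) {
    pairs <- c(pairs, paste(st, jj))   # append one "state number" label
  }
}

length(pairs)   # 20 = 4 states x 5 numbers
pairs[1]        # "CA 1"
pairs[20]       # "TN 5"
```

The inner loop variable changes fastest, so all five pairs for one state appear before the next state begins.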

Lists

Next we need to learn about the heterogeneous collection structures that R provides. Today we will discuss the list, and next time the data frame.

A list is quite similar to a vector, except that a list can contain objects of different types, whereas a vector cannot. The elements of a list are also accessed differently.

Let's create a simple list, using the constructor function list:

> new.list <- list("CA",2,TRUE,81.0)
> new.list
[[1]]
[1] "CA"

[[2]]
[1] 2

[[3]]
[1] TRUE

[[4]]
[1] 81

What does all this mean?

Note that some of what is printed looks very much like a vector of length one. For instance, we have seen things like this before: [1] "CA". Notice that we see four of these vectors of length one, and each begins with [1]. But each of the four begins with something different, the number indicated in double brackets. As is often the case in R, the way the result is printed is a clue as to how to access the item.

In this case, we have created a list with four items in it, and we can access the items by numbers using double brackets:

> # continuing the above example...
> new.list[[1]]
[1] "CA"
> is.character(new.list[[1]])
[1] TRUE
> is.numeric(new.list[[2]])
[1] TRUE

Note that each item in the list has its own type. The character data stays character, the numeric data stays numeric, and the boolean/logical data stays boolean/logical. The data are not coerced to a more general type to create a homogeneous collection. The list in R is a heterogeneous collection.
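The contrast with a vector is easy to see if the same three values are combined with c instead of list. The names as.vec and as.lst below are invented for this sketch.

```r
as.vec <- c("CA", 2, TRUE)      # vector: everything is coerced to character
as.lst <- list("CA", 2, TRUE)   # list: each item keeps its own type

as.vec                # "CA" "2" "TRUE"
is.character(as.vec)  # TRUE

class(as.lst[[1]])    # "character"
class(as.lst[[2]])    # "numeric"
class(as.lst[[3]])    # "logical"
```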

Lists can contain longer vectors. These are not flattened out:

> another.list <- list(c(1,2,3),"CA",c(TRUE,TRUE))
> another.list[[1]]
[1] 1 2 3
> is.character(another.list[[1]])
[1] FALSE
> another.list[[2]]
[1] "CA"

The length function will tell you how many items there are in a list:

> another.list <- list("CA",1,2,TRUE,1:100)
> length(another.list)
[1] 5

The fifth element of the list I just made is itself a vector of length 100. That entire vector is simply the fifth element of the list.

Lists can nest inside each other also:

> another.list <- list("CA",2,list(1:10,"NY"))
> another.list
[[1]]
[1] "CA"

[[2]]
[1] 2

[[3]]
[[3]][[1]]
 [1]  1  2  3  4  5  6  7  8  9 10

[[3]][[2]]
[1] "NY"

So you can create complex, hierarchical data structures if you wish.
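The printed labels such as [[3]][[2]] show how to reach into the nested structure: chain the double brackets, one level at a time. A short sketch using the same list:

```r
another.list <- list("CA", 2, list(1:10, "NY"))

length(another.list)         # 3: the inner list counts as one item
length(another.list[[3]])    # 2: the inner list has two items of its own

another.list[[3]][[2]]       # "NY"
another.list[[3]][[1]][5]    # 5: fifth element of the vector 1:10
```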

Lists can even contain functions, unlike vectors:

> zz <- list("CA",sqrt)
> zz[[2]](4)
[1] 2

You can name items in a list as well:

> the.list <- list(state="CA",num=4)
> the.list[["state"]]
[1] "CA"
> the.list$state
[1] "CA"
> the.list$num
[1] 4
> second.list <- list(num=9,fn=sqrt)
> second.list$fn(25)
[1] 5

Note that you may use the dollar sign, followed by the name, to access a named element of a list.
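Two related conveniences, shown here as a brief sketch: the base function names reports the names in a list, and the dollar sign can also appear on the left of an assignment to change an element.

```r
the.list <- list(state = "CA", num = 4)

names(the.list)                    # "state" "num"

the.list$num <- the.list$num + 1   # update a named element in place
the.list[["num"]]                  # 5: the same element, accessed by name
```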

A common use for lists is to return more than one value from a function. The last expression evaluated in the body of a function is the function's value. What if you want to return a string and a number? You make sure the last expression evaluated is a list constructor:

> simple.example <- function() {
+ achar <- sample(c("AZ","CA","NV"),1,replace=TRUE)
+ anum <- rnorm(1)
+ list(achar,anum)
+ }
> zz <- simple.example()
> zz
[[1]]
[1] "CA"

[[2]]
[1] -0.239634
>

This is an extremely common idiom.
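Naming the elements of the returned list usually makes the caller's code easier to read. The function describe below is a hypothetical example, not something defined elsewhere in these notes.

```r
# Hypothetical function returning two values packaged in a named list
describe <- function(x) {
  list(n = length(x), total = sum(x))
}

res <- describe(c(2, 4, 6))
res$n       # 3
res$total   # 12
```

The caller can then pick out each value by name instead of remembering its position.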

How about a wilder example? Let's pick a number at random from one to three, use it to select one of three functions, and return that function together with a random number:

> example.fn <- function() {
+ flist <- list(sqrt,exp,function(x){x^2})
+ ind <- sample(1:length(flist),1,replace=TRUE)
+ anum <- rnorm(1)
+ list(fn=flist[[ind]],num=anum)
+ }
> zz <- example.fn()
> zz
$fn
function (x)
x^2

$num
[1] -0.7988386
>

Note how a list with named elements is printed. It again tries to remind you of how to access the elements.

Here is another way to write the function:

> example.fn <- function() {
+ flist <- list(sqrt,exp,function(x){x^2})
+ list(fn=flist[[sample(1:length(flist),1)]],num=rnorm(1))
+ }
> zz <- example.fn()
>

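With double brackets, you may not select more than one element of a list using a vector subscript. (Single brackets, as in zlist[c(1,2)], do accept a vector subscript; they return a shorter list rather than the elements themselves.) Inside double brackets, a vector subscript is interpreted hierarchically. So a subscript like c(1,2) means the second element of the first element in the list: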

> zlist <- list(1:3,1:5,c("AZ","CA"),1:2)
> zlist[[c(1,2)]]
[1] 2
> zlist[[c(3,1)]]
[1] "AZ"
> zlist[[c(4,4)]]
Error in zlist[[c(4,4)]] : subscript out of bounds
>

This can be quite convenient, but it is very different from the way a vector is indexed. And notice that for a list, R insists that you combine all the subscripts together into a single vector:

> zlist <- list(1:3,1:5,c("AZ","CA"),1:2)
> zlist[[1,2]]
Error in zlist[[1,2]] : incorrect number of subscripts
>
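The difference between single and double brackets on a list is worth seeing side by side. This is a short sketch using the same zlist; the name sub.list is illustrative.

```r
zlist <- list(1:3, 1:5, c("AZ", "CA"), 1:2)

# Single brackets: select several items, and the result is still a list
sub.list <- zlist[c(1, 3)]
is.list(sub.list)     # TRUE
length(sub.list)      # 2

# Double brackets: extract one item; a vector subscript drills down instead
zlist[[c(1, 2)]]      # 2
zlist[[1]][[2]]       # 2: the same thing written one level at a time
is.list(zlist[[1]])   # FALSE: this is the integer vector 1:3 itself
```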

So R's lists can be used for many things. They can be used to package together multiple elements for return from a function. They can also be used to group together related elements to produce a composite data structure representing an object. And they can be used to create hierarchical data structures; we won't work with this sort of thing much this semester, but such structures can be indispensable.

Next time we will learn some of the built-in tools for working with lists, especially lapply for applying a function to every element of a list. We'll also learn about the data frame and a few other specialized data structures R provides. Finally, we'll talk about the switch statement, and this will conclude our overview of the basic structures of the language.