More on Function Calls

We have seen several examples of function calls with named arguments. We'll now look at this in more detail.

For example, you have seen the named argument used in calls to paste:
> paste("San","Francisco")
[1] "San Francisco"
> paste("San","Francisco",sep="-")
[1] "San-Francisco"
> paste("Doe"," John",sep=",")
[1] "Doe, John"
> paste("stick the","se directly together with no separator",sep="")
[1] "stick these directly together with no separator"
Here, we used the named argument sep to specify the value of the separator used in pasting strings together. Observe that the separator can be an empty string, in which case the strings are directly concatenated.

We also used the named argument with seq:
> seq(0,1,by=0.1)
[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
The seq function also takes a length argument if you wish:
> seq(0,1,length=10)
[1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556 0.6666667
[8] 0.7777778 0.8888889 1.0000000
Here, to get a vector of length 10 we had to go up by 1/9 each step. If we wanted to go up by 0.1, then we made a fencepost error: going from 0 to 1 in steps of 0.1 takes eleven points, not ten.
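The fencepost point can be checked directly. The following is a small sketch using length.out, the full name of the argument abbreviated as length above; the variable names are ours, chosen only for illustration.

```r
# Ten points from 0 to 1 leave nine gaps, so the step is 1/9.
ten.points <- seq(0, 1, length.out = 10)
step.ten <- ten.points[2] - ten.points[1]

# Eleven points leave ten gaps, so the step is 0.1.
eleven.points <- seq(0, 1, length.out = 11)
step.eleven <- eleven.points[2] - eleven.points[1]

print(step.ten)
print(step.eleven)
```

The number of steps is always one less than the number of points, which is the whole of the fencepost issue.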

Another example of the named argument was in the random number generating functions, which we'll take a closer look at later this lecture.

Now it's time to go into the matter of named arguments a little more closely. We're going to use as an example our needle reuse function from before:

> needle <- function(answers,curtime,endtime,pp,qq) {
+    if (curtime >= endtime) {
+       answers
+    } else {
+       xx <- answers[curtime]
+       newx <- (1-qq)*(xx+(1-xx)*pp)
+       needle(c(answers,newx),curtime+1,endtime,pp,qq)
+    }
+ }

Remember that the first argument was zero, the initial contamination probability when the needle was new. The second argument is when we start computing (on the first use). The third argument is how many data points (needle reuses) we want to look at. The fourth argument, pp, is the contamination probability when a needle is used (perhaps the prevalence of infection among the population the needle is being reused on), and the fifth argument, qq, is the chance the needle gets decontaminated before reuse. We assumed independent reuses, so in the long run the chance the needle is contaminated is the result of the pressure (so to speak) of the opposing forces of contamination pp and decontamination qq. We assumed the needle would last forever, and the result of the mathematics we talked about when developing this very simple model is an example of a Markov chain.
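The long-run balance between contamination and decontamination can be made concrete. Setting newx equal to xx in the update rule newx = (1-qq)*(xx+(1-xx)*pp) and solving for xx gives the fixed point (1-qq)*pp / (1 - (1-qq)*(1-pp)). The sketch below checks this against repeated application of the rule. The helper names needle.step and needle.limit are ours, introduced for illustration, and are not part of the original program.

```r
# One application of the update rule used inside needle (hypothetical helper).
needle.step <- function(xx, pp, qq) {
   (1 - qq) * (xx + (1 - xx) * pp)
}

# Fixed point of the rule, found by solving xx == needle.step(xx, pp, qq).
# Assumes pp and qq are not both zero, so the denominator is nonzero.
needle.limit <- function(pp, qq) {
   (1 - qq) * pp / (1 - (1 - qq) * (1 - pp))
}

# Iterate from a new (uncontaminated) needle and compare with the fixed point.
xx <- 0
for (use in 1:200) {
   xx <- needle.step(xx, pp = 0.2, qq = 0.1)
}
print(xx)
print(needle.limit(pp = 0.2, qq = 0.1))
```

With pp = 0.2 and qq = 0.1 the fixed point is 0.18/0.28, which is the value the simulated probabilities climb toward.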

Before going further, did it irritate you to have to remember what pp and qq were? Which is contamination, and which is decontamination? The variable names aren't very meaningful, are they? So stylistically, we should have chosen better names. We should also document the program with suitable comments. Finally, just to make a point, I changed curtime to time.start, and endtime to time.end:

# Needle reuse simulation, simple Markov chain, R 1.6.1
# answers should begin as a vector with the initial contamination probability
# time.start is the first use (usually 1)
# time.end is the number of uses to simulate
# contam.prob is the chance the needle will become contaminated each use
# decon.prob is the chance the needle will be decontaminated before each use
needle <- function(answers,time.start,time.end,contam.prob,decon.prob) {
   if (time.start >= time.end) {
      answers
   } else {
      xx <- answers[time.start]
      newx <- (1-decon.prob)*(xx+(1-xx)*contam.prob)
      needle(c(answers,newx),time.start+1,time.end,contam.prob,decon.prob)
   }
}

While we're at it, we might as well start keeping our programs in files and loading them in. This will be demonstrated in lecture for Windows. We normally use the function source when working from the command line; the GUI versions of R have drop-down items for this. Assume we have sourced in the file above using whatever its path name is on your system:

> source("needle.r")
> needle(0,1,10,0.2,0.1)
[1] 0.0000000 0.1800000 0.3096000 0.4029120 0.4700966 0.5184696 0.5532981
[8] 0.5783746 0.5964297 0.6094294
So this works fine.

Now, let's try it with a named argument:

> source("needle.r")
> needle(answers=0,1,10,0.2,0.1)
[1] 0.0000000 0.1800000 0.3096000 0.4029120 0.4700966 0.5184696 0.5532981
[8] 0.5783746 0.5964297 0.6094294
This worked fine also. To use a named argument, enter the name of the argument as it appears in the function definition, then a single equal symbol =, and then the value you want bound to the argument when the function body is evaluated. In the example we just did, we bound answers to the value 0 when the function was evaluated.

Let's do more of these calls:

> source("needle.r")
> needle(answers=0,time.start=1,time.end=10,contam.prob=0.2,decon.prob=0.1)
[1] 0.0000000 0.1800000 0.3096000 0.4029120 0.4700966 0.5184696 0.5532981
[8] 0.5783746 0.5964297 0.6094294
> needle(0,time.start=1,time.end=10,contam.prob=0.2,decon.prob=0.1)
[1] 0.0000000 0.1800000 0.3096000 0.4029120 0.4700966 0.5184696 0.5532981
[8] 0.5783746 0.5964297 0.6094294
So you may name any or all of the arguments. Not only that, but with named arguments, you can place the arguments in any order, and the right value will be associated with the right argument:

> #continuing...
> needle(time.start=1,answers=0,decon.prob=0.1,time.end=10,contam.prob=0.2)
[1] 0.0000000 0.1800000 0.3096000 0.4029120 0.4700966 0.5184696 0.5532981
[8] 0.5783746 0.5964297 0.6094294
So this is very nice: it makes the function calls easier to read, and it keeps you from mixing the arguments up and matching the wrong value with the wrong position. You should take advantage of this feature whenever possible.

It is an error to list a name you don't match:

> #continuing...
> needle(curtime=1,answers=0,decon.prob=0.1,time.end=10,contam.prob=0.2)
Error in needle(curtime = 1, answers = 0, decon.prob = 0.1, time.end = 10, contam.prob = 0.2) : unused argument(s) (curtime ...)
Here, we don't have a curtime argument any more.

It's also an error to list a name twice:

> #continuing...
> needle(time.start=1,answers=0,time.start=1,decon.prob=0.1,time.end=10,contam.prob=0.2)
Error in needle(time.start = 1, answers = 0, time.start = 1, decon.prob = 0.1, time.end = 10, contam.prob = 0.2) : formal argument "time.start" matched by multiple actual arguments
Notice that it didn't matter that we used the same value (1) each time for time.start. You just can't use a name twice.

Another interesting feature of named argument matching is that it can handle partial matches. Some people find this useful:

> #continuing...
> needle(time.start=1,ans=0,decon.prob=0.1,time.end=10,contam.prob=0.2)
[1] 0.0000000 0.1800000 0.3096000 0.4029120 0.4700966 0.5184696 0.5532981
[8] 0.5783746 0.5964297 0.6094294
Here, we used ans instead of answers. But because ans matches the first three letters of answers, and no other argument name begins that way, R knows what you mean and the value of 0 is matched to the argument answers. It is an error to have an argument partially match two different things:

> #continuing...
> needle(time.star=1,ans=0,decon.prob=0.1,time=10,contam.prob=0.2)
Error in needle(time.star = 1, ans = 0, decon.prob = 0.1, time = 10, contam.prob = 0.2) : formal argument "time.start" matched by multiple actual arguments
Here, time.star and time were both partial matches for time.start. But had we used a complete match for time.start, this would have worked:

> #continuing...
> needle(time.start=1,ans=0,decon.prob=0.1,time=10,contam.prob=0.2)
[1] 0.0000000 0.1800000 0.3096000 0.4029120 0.4700966 0.5184696 0.5532981
[8] 0.5783746 0.5964297 0.6094294
What happens is that all the full matches are handled first, and then all the partial matches. In the last example, time.start had already been matched by the time the partial matches for time were processed. And only then are unnamed arguments matched to the remaining arguments by position.

> #continuing...
> needle(time.start=1,ans=0,0.1,time=10,contam.prob=0.2)
[1] 0.0000000 0.1800000 0.3096000 0.4029120 0.4700966 0.5184696 0.5532981
[8] 0.5783746 0.5964297 0.6094294
Here, it worked because first, time.start and contam.prob were exact matches, and were processed. Then the partial matches ans and time bound their values to answers and time.end respectively. This left only the unnamed argument 0.1 to be bound to the remaining argument, decon.prob. Now if we do this:

> #continuing...
> needle(time.start=1,ans=0,0.1,time=10,0.2)
[1] 0.0000000 0.0800000 0.1376000 0.1790720 0.2089318 0.2304309 0.2459103
[8] 0.2570554 0.2650799 0.2708575
we get a different answer, because the first unnamed argument (0.1) must match the first remaining unmatched argument, which is contam.prob, and the second unnamed argument (0.2) must match the second remaining unmatched argument, which is decon.prob. So 0.1 was the contamination probability and 0.2 was the decontamination probability.
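One way to see how R resolved a call is to have a function report its own matched call. The sketch below uses a stand-in function, show.match (our name, not part of the needle program), with the same argument list as needle. Its body only returns match.call(), which writes out every supplied argument under its full name.

```r
# Stand-in with needle's argument list; it only reports how the call was matched.
show.match <- function(answers, time.start, time.end, contam.prob, decon.prob) {
   match.call()
}

# Exact matches come first (time.start), then partial matches (ans, time),
# and then the unnamed values 0.1 and 0.2 fill the remaining arguments in order.
matched <- show.match(time.start = 1, ans = 0, 0.1, time = 10, 0.2)
print(matched)
```

Printing the matched call before trusting a result is a cheap check whenever a call mixes abbreviated names with unnamed values.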

There are remaining subtleties in function calls when the entity ... is used to match multiple arguments, creating a function that accepts a variable number of arguments. We'll take a look at this later in this course or in the next course.

Default argument values

When you define a function, you may specify default argument values. This way, if you leave the argument unspecified, the system will supply the default automatically. Positional evaluation and default values can lead to puzzling behavior if not used carefully. Let's redefine the needle reuse example to give an automatic default to the answers argument, and to time.start:

# Needle reuse simulation, simple Markov chain, R 1.6.1
# answers should begin as a vector with the initial contamination probability
#    default value 0 (needle can't be contaminated if it is new)
# time.start is the first use (usually 1)
#    default value 1 (start at the first use)
# time.end is the number of uses to simulate
# contam.prob is the chance the needle will become contaminated each use
# decon.prob is the chance the needle will be decontaminated before each use
needle <- function(answers=0,time.start=1,time.end,contam.prob,decon.prob) {
   if (time.start >= time.end) {
      answers
   } else {
      xx <- answers[time.start]
      newx <- (1-decon.prob)*(xx+(1-xx)*contam.prob)
      needle(c(answers,newx),time.start+1,time.end,contam.prob,decon.prob)
   }
}
Now we can call it like this:

> #continuing...
> needle(contam.prob=0.2,decon.prob=0.1,time.end=10)
[1] 0.0000000 0.1800000 0.3096000 0.4029120 0.4700966 0.5184696 0.5532981
[8] 0.5783746 0.5964297 0.6094294
My advice is to try to avoid mixing default values and positional calls; if you use default values, try to call the function with named arguments.
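To see the kind of puzzle this advice guards against, consider calling the version of needle with defaults positionally, as though the defaulted arguments could simply be left off the front. The sketch below repeats the definition so that it stands alone; the particular mistaken call is our own illustration.

```r
needle <- function(answers=0,time.start=1,time.end,contam.prob,decon.prob) {
   if (time.start >= time.end) {
      answers
   } else {
      xx <- answers[time.start]
      newx <- (1-decon.prob)*(xx+(1-xx)*contam.prob)
      needle(c(answers,newx),time.start+1,time.end,contam.prob,decon.prob)
   }
}

# Intended: time.end=10, contam.prob=0.2, decon.prob=0.1.
# Actual positional binding: answers=10, time.start=0.2, time.end=0.1.
# Since 0.2 >= 0.1, the function returns answers at once, with no error.
wrong <- needle(10, 0.2, 0.1)
print(wrong)

# Naming the arguments gives the intended simulation.
right <- needle(time.end=10, contam.prob=0.2, decon.prob=0.1)
print(right)
```

The positional call returns 10 without complaint, because contam.prob and decon.prob are never needed on that path and so their absence is never noticed.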

Surprisingly, default arguments are evaluated inside the function ("in the evaluation frame of the function"). So, for example:

> example.fn <- function(aa=2,bb=aa) {
+    cat("aa=",aa,"and bb=",bb,"\n")
+    aa
+ }
> aa <- 3
> example.fn(4)
aa= 4 and bb= 4
[1] 4
> example.fn()
aa= 2 and bb= 2
[1] 2
> example.fn(aa=aa)
aa= 3 and bb= 3
[1] 3
What is important to note here is that the argument bb defaults to the value aa has inside the function, once the call has bound it, and not to the value aa has in the calling environment. So even though aa is bound to 3 in the calling environment, in the first call aa is bound to 4 inside the function, and so bb is also bound to 4.

When defining functions with default values, don't try to refer to symbols with the same name:

> example.fn <- function(aa=aa,bb=999) {
+    cat("aa=",aa,"and bb=",bb,"\n")
+    aa
+ }
> aa <- 3
> example.fn(4)
aa= 4 and bb= 999
[1] 4
> example.fn()
Error in cat("aa=", aa, "and bb=", bb, "\n") : recursive default argument reference
Remember, default argument values are evaluated in the function (in the evaluation frame of the function). And in this example, aa is the thing you're defining; you're assigning its value inside the function in terms of its value inside the function. It doesn't make sense, and the system reports an error. Notice that the error does not happen at the time you define the function, but at the time you attempt to call it using the default.

Contrast the previous example of using the = symbol to define a default value in the function definition with the following use of = in a function call:

> example.fn <- function(aa,bb) {
+    cat("aa=",aa,"; bb=",bb,"\n")
+    aa
+ }
> aa <- 4
> example.fn(aa=aa,bb=5)
aa= 4 ; bb= 5
[1] 4
This is fine since you are not defining a value, but calling the function.

It is also important to realize that the default value is actually created the first time the symbol is used. This is a useful feature, but it can surprise you if you're not ready for it:

> example.fn <- function(aa={cat("used a's default!\n"); 99},bb=2) {
+    cat("aa=",aa,"; bb=",bb,"\n")
+    aa
+ }
> example.fn(4,5)
aa= 4 ; bb= 5
[1] 4
> example.fn(bb=8,aa=2)
aa= 2 ; bb= 8
[1] 2
> example.fn(aa=4)
aa= 4 ; bb= 2
[1] 4
> example.fn(bb=8)
used a's default!
aa= 99 ; bb= 8
[1] 99
What happened? The default value for aa was the compound statement {cat("used a's default!\n"); 99}, which first prints the message "used a's default!" and then evaluates to 99. The value of the compound statement is 99, so when the default value of aa is needed, 99 will be the value. But before the 99 is used, the message is printed. In the first three calls a value for aa was supplied, so the default was never evaluated and no message appeared.
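A consequence worth seeing is that a default which is never needed is never evaluated at all. The sketch below is a hypothetical variant of the example above; the function name lazy.fn and the argument use.aa are ours, added only to control whether aa gets used.

```r
# Hypothetical variant: aa is only touched when use.aa is TRUE.
lazy.fn <- function(aa={cat("used aa's default!\n"); 99}, use.aa=TRUE) {
   if (use.aa) {
      aa
   } else {
      "aa never needed"
   }
}

# The default is needed here, so the message is printed before 99 is returned.
with.aa <- lazy.fn()

# The default is never needed here, so its code never runs and nothing is printed.
without.aa <- lazy.fn(use.aa=FALSE)

print(with.aa)
print(without.aa)
```

This is why a default with side effects, such as printing, can appear to fire in some calls and not in others.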

Let's do another example. Remember: the default value is evaluated in the frame of the function, and at the time the variable's value is first needed. So watch this:

> ww <- 100
> example.fn <- function(aa=ww) {
+    cat("Entering example function: ww=",ww,"\n");
+    ww <- 50;
+    cat("Now printing ww again:", ww,"\n");
+    cat("Now printing aa:",aa,"\n")
+    aa
+ }
> example.fn(4)
Entering example function: ww= 100
Now printing ww again: 50
Now printing aa: 4
[1] 4
> example.fn()
Entering example function: ww= 100
Now printing ww again: 50
Now printing aa: 50
[1] 50
> ww
[1] 100
What happened? The first time, aa got the value 4 because 4 was the argument of the function that matched aa. What happened to ww did not matter to aa; we first printed ww, and got the value 100 from the outside of the function. After we assign a local ww for inside the function the value of 50, we get 50 when we print the value of ww inside the function. After we are finished calling the function, when we evaluate ww outside the function we get the value 100 again. The value of 50 is lost once we end the function call. But through all this, aa just keeps the value 4 from the function call. The variable aa gets evaluated for the first time on the fourth line of the function body, and its value is 4.

But now look at the next call in the above example. Here, we did not give an argument to the function, so the default must be used. And the default value for aa is ww. Note that the first thing we do is to evaluate ww, and as before we first get the global value of 100. Then we assign 50 to our own local ww, and print that. Finally, we evaluate aa for the first time on the fourth line of the function body. (Of course, we mean for the first time during this call; what happened last time we called the function is gone and doesn't count now.) And it is right then, at the first evaluation of aa, that aa is assigned the value of ww, evaluated inside the frame of the function. At that time, the result of evaluating ww inside the function is 50, and so aa gets the value 50. Of course, once we exit the function, ww is 100 again, because the local value of 50 is lost after exiting the function.
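Because the default is evaluated at the first use of aa, moving that first use changes the result. The sketch below contrasts two hypothetical variants of the example (early.fn and late.fn are our names). The first touches aa with the base function force before ww is reassigned; the second touches it only afterwards.

```r
ww <- 100

# Hypothetical variant: aa is used before ww is reassigned, so the default
# aa=ww is evaluated while ww still refers to the global value 100.
early.fn <- function(aa=ww) {
   force(aa)      # first use of aa; its default is evaluated here
   ww <- 50
   aa
}

# Same body without the early use: aa is first needed after the local ww is 50.
late.fn <- function(aa=ww) {
   ww <- 50
   aa
}

print(early.fn())
print(late.fn())
```

Once aa has been evaluated it keeps its value for the rest of the call, so the later assignment to ww in early.fn does not affect it.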

Exercise. Without running this code, what result is printed where the question marks appear? Why?

> example.fn <- function(aa=2*bb,bb=3*cc,cc=4*xyz) {
+    xyz <- 1
+    aa
+ }
> example.fn(4)
[1] ???
> example.fn()
[1] ???

At this point we have covered the essential features of the language, but we will discuss more in the next course.

Other useful functions

We will also discuss the ifelse function, as well as the sorting and ranking functions. We will take a further look at random number generation.