under construction; last updated November 4, 2003
Computational Epidemiology Course, Lecture 10 (in progress)
More on Function Calls
We have seen several examples of function calls with named arguments.
We'll now look at this in more detail.
For example, you have seen the named argument used in calls to paste:
> paste("San","Francisco")
[1] "San Francisco"
> paste("San","Francisco",sep="-")
[1] "San-Francisco"
> paste("Doe"," John",sep=",")
[1] "Doe, John"
> paste("stick the","se directly together with no separator",sep="")
[1] "stick these directly together with no separator"
Here, we used the named argument sep to specify the value of
the separator used in pasting strings together. Observe that the separator
can be an empty string, in which case the strings are directly concatenated.
We also used the named argument with seq:
> seq(0,1,by=0.1)
[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
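The seq function also takes a length argument if you wish (in current versions of R its full name is length.out; length works as a partial match, a feature we discuss later in this lecture):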
> seq(0,1,length=10)
[1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556 0.6666667
[8] 0.7777778 0.8888889 1.0000000
Here, to get 10 evenly spaced values from 0 to 1 inclusive, seq had to step
by 1/9. If we actually wanted steps of 0.1, we made a fencepost error,
because that takes 11 values, not 10.
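To see the fencepost arithmetic directly, here is a small check, assuming a current version of R, where the full name of seq's length argument is length.out. The point is that n evenly spaced values from 0 to 1 are 1/(n - 1) apart, so steps of 0.1 need 11 values.

```r
# n evenly spaced values from 0 to 1 are 1/(n - 1) apart.
ten <- seq(0, 1, length.out = 10)     # 10 values, steps of 1/9
eleven <- seq(0, 1, length.out = 11)  # 11 values, steps of 0.1

diff(ten)[1]     # step size with 10 values (1/9)
diff(eleven)[1]  # step size with 11 values (0.1)

# all.equal() tolerates tiny floating-point differences between the
# two ways of building the sequence.
all.equal(eleven, seq(0, 1, by = 0.1))  # TRUE
```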
Another example of the named argument was in the random number generating
functions, which we'll take a closer look at later in this lecture.
Now it's time to go into the matter of named arguments a little more closely.
We're going to use as an example our needle reuse function from before:
> needle <- function(answers,curtime,endtime,pp,qq) {
+   if (curtime >= endtime) {
+     answers
+   } else {
+     xx <- answers[curtime]
+     newx <- (1-qq)*(xx+(1-xx)*pp)
+     needle(c(answers,newx),curtime+1,endtime,pp,qq)
+   }
+ }
Remember that the first argument we passed was zero, the initial contamination
probability when the needle is new. The second argument is when we start
computing (the first use). The third argument is how many data points
(needle reuses) we want to look at, the fourth argument pp is the
contamination probability when a needle is used (perhaps the prevalence of
infection among the population the needle is being reused on), and the
fifth argument qq is the chance the needle gets decontaminated before reuse.
We assumed independent reuses, so in the long run the chance the needle is
contaminated is the result of the pressure (so to speak) of the opposing
forces of contamination pp and decontamination qq. We also assumed the needle
would last forever, and the very simple model we developed from this
mathematics is an example of a Markov chain.
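The tug of war between pp and qq can be made concrete. Each use maps the current contamination probability x to (1 - qq)(x + (1 - x)pp), the same update needle computes, so in the long run x settles at the fixed point x = (1 - qq)pp / (1 - (1 - qq)(1 - pp)). The sketch below checks this numerically; step.prob and steady.state are hypothetical helper names introduced only for this illustration.

```r
# One use of the needle: contamination probability x becomes
# (1 - q) * (x + (1 - x) * p), the same update used inside needle().
step.prob <- function(x, contam.prob, decon.prob) {
  (1 - decon.prob) * (x + (1 - x) * contam.prob)
}

# Fixed point of step.prob(), solved by hand:
# x = (1-q)p + (1-q)(1-p)x  implies  x = (1-q)p / (1 - (1-q)(1-p))
steady.state <- function(contam.prob, decon.prob) {
  (1 - decon.prob) * contam.prob /
    (1 - (1 - decon.prob) * (1 - contam.prob))
}

# Start from a new (uncontaminated) needle and apply many uses.
x <- 0
for (i in 1:200) x <- step.prob(x, 0.2, 0.1)

x                       # after 200 uses
steady.state(0.2, 0.1)  # 0.18 / 0.28, about 0.643
```

With pp = 0.2 and qq = 0.1 the limit is about 0.643; the tenth value from needle(0,1,10,0.2,0.1), 0.6094294, is already approaching it.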
Before going further, did it irritate you to have to remember what pp
and qq were? Which is contamination, and which is decontamination?
The variable names aren't very meaningful, are they? Stylistically, we
should have chosen better names, and we should also document the program
with suitable comments. So pp and qq become contam.prob and decon.prob; finally, just to make a point, I changed curtime to time.start, and endtime to time.end:
# Needle reuse simulation, simple Markov chain, R 1.6.1
# answers should begin as a vector with the initial contamination probability
# time.start is the first use (usually 1)
# time.end is the number of uses to simulate
# contam.prob is the chance the needle will become contaminated each use
# decon.prob is the chance the needle will be decontaminated before each use
needle <- function(answers,time.start,time.end,contam.prob,decon.prob) {
  if (time.start >= time.end) {
    answers
  } else {
    xx <- answers[time.start]
    newx <- (1-decon.prob)*(xx+(1-xx)*contam.prob)
    needle(c(answers,newx),time.start+1,time.end,contam.prob,decon.prob)
  }
}
While we're at it, we might as well start keeping our programs in files and
loading them in. This will be demonstrated in lecture for Windows. We
normally use the function source when working from the command
line; the GUI versions of R have menu items for this. Assume we have saved
the code above in a file named needle.r and sourced it, using whatever its
path name is on your system:
> source("needle.r")
> needle(0,1,10,0.2,0.1)
[1] 0.0000000 0.1800000 0.3096000 0.4029120 0.4700966 0.5184696 0.5532981
[8] 0.5783746 0.5964297 0.6094294
So this works fine.
Now, let's try it with a named argument:
> source("needle.r")
> needle(answers=0,1,10,0.2,0.1)
[1] 0.0000000 0.1800000 0.3096000 0.4029120 0.4700966 0.5184696 0.5532981
[8] 0.5783746 0.5964297 0.6094294
This worked fine also. To use a named argument, you enter the name of the
argument as it appears in the function definition, a single equals sign
=, and then the value you want bound to that argument when the function body
is evaluated. In the example we just did, we bound answers to the value 0.
Let's do more of these calls:
> source("needle.r")
> needle(answers=0,time.start=1,time.end=10,contam.prob=0.2,decon.prob=0.1)
[1] 0.0000000 0.1800000 0.3096000 0.4029120 0.4700966 0.5184696 0.5532981
[8] 0.5783746 0.5964297 0.6094294
> needle(0,time.start=1,time.end=10,contam.prob=0.2,decon.prob=0.1)
[1] 0.0000000 0.1800000 0.3096000 0.4029120 0.4700966 0.5184696 0.5532981
[8] 0.5783746 0.5964297 0.6094294
So you may name any or all of the arguments. Not only that, but with
named arguments, you can place the arguments in any order, and the right
value will be associated with the right argument:
> #continuing...
> needle(time.start=1,answers=0,decon.prob=0.1,time.end=10,contam.prob=0.2)
[1] 0.0000000 0.1800000 0.3096000 0.4029120 0.4700966 0.5184696 0.5532981
[8] 0.5783746 0.5964297 0.6094294
So this is very nice: it makes function calls easier to read, and it
keeps you from mixing up the arguments and passing a value in the
wrong position. You should take advantage of this feature whenever possible.
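A quick way to convince yourself is to compare results directly. This sketch, assuming the renamed needle function above, checks that a fully named call in scrambled order gives exactly the same vector as the positional call, while swapping just two positional values silently changes the result.

```r
# The renamed needle() from above.
needle <- function(answers, time.start, time.end, contam.prob, decon.prob) {
  if (time.start >= time.end) {
    answers
  } else {
    xx <- answers[time.start]
    newx <- (1 - decon.prob) * (xx + (1 - xx) * contam.prob)
    needle(c(answers, newx), time.start + 1, time.end, contam.prob, decon.prob)
  }
}

by.position <- needle(0, 1, 10, 0.2, 0.1)
by.name <- needle(time.start = 1, answers = 0, decon.prob = 0.1,
                  time.end = 10, contam.prob = 0.2)
identical(by.position, by.name)  # TRUE: same bindings, same result

# Swapping the last two positional values runs a different model,
# with no warning from R.
swapped <- needle(0, 1, 10, 0.1, 0.2)
identical(by.position, swapped)  # FALSE
```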
It is an error to list a name you don't match:
> #continuing...
> needle(curtime=1,answers=0,decon.prob=0.1,time.end=10,contam.prob=0.2)
Error in needle(curtime = 1, answers = 0, decon.prob = 0.1, time.end = 10, contam.prob = 0.2) :
unused argument(s) (curtime ...)
Here, we don't have a curtime argument any more.
It's also an error to list a name twice:
> #continuing...
> needle(time.start=1,answers=0,time.start=1,decon.prob=0.1,time.end=10,contam.prob=0.2)
Error in needle(time.start = 1, answers = 0, time.start = 1, decon.prob = 0.1, time.end = 10, contam.prob = 0.2) :
formal argument "time.start" matched by multiple actual arguments
Notice that it didn't matter that we used the same value (1) each time
for time.start. You just can't use a name twice.
Another interesting feature of named argument matching is that it can
handle partial matches. Some people find this useful:
> #continuing...
> needle(time.start=1,ans=0,decon.prob=0.1,time.end=10,contam.prob=0.2)
[1] 0.0000000 0.1800000 0.3096000 0.4029120 0.4700966 0.5184696 0.5532981
[8] 0.5783746 0.5964297 0.6094294
Here, we used ans instead of answers. Because ans matches
the first three letters of answers, R knows what you mean, and the
value 0 is bound to the argument answers. It is an error for two
supplied names to partially match the same argument:
> #continuing...
> needle(time.star=1,ans=0,decon.prob=0.1,time=10,contam.prob=0.2)
Error in needle(time.star = 1, ans = 0, decon.prob = 0.1, time = 10, contam.prob = 0.2) :
formal argument "time.start" matched by multiple actual arguments
Here, time.star and time were both partial matches for
time.start. But had we used a complete match for time.start,
this would have worked:
> #continuing...
> needle(time.start=1,ans=0,decon.prob=0.1,time=10,contam.prob=0.2)
[1] 0.0000000 0.1800000 0.3096000 0.4029120 0.4700966 0.5184696 0.5532981
[8] 0.5783746 0.5964297 0.6094294
What happens is that all the exact matches are handled first, and then all the
partial matches. In the last example, time.start had already been
matched exactly by the time the partial match for time was processed, so time
could only match time.end. Only then are unnamed arguments matched to the
remaining arguments, by position:
> #continuing...
> needle(time.start=1,ans=0,0.1,time=10,contam.prob=0.2)
[1] 0.0000000 0.1800000 0.3096000 0.4029120 0.4700966 0.5184696 0.5532981
[8] 0.5783746 0.5964297 0.6094294
Here, it worked because first, time.start and contam.prob
were exact matches, and were processed. Then the partial matches ans
and time bound their values to answers and time.end
respectively. This left only the unnamed argument 0.1 to be bound to the
remaining argument, decon.prob. Now if we do this:
> #continuing...
> needle(time.start=1,ans=0,0.1,time=10,0.2)
[1] 0.0000000 0.0800000 0.1376000 0.1790720 0.2089318 0.2304309 0.2459103
[8] 0.2570554 0.2650799 0.2708575
we get a different result, because the first unnamed argument (0.1) must
match the first remaining unmatched argument, which is contam.prob,
and the second unnamed argument (0.2) must match the second remaining unmatched
argument, which is decon.prob. So 0.1 was the contamination probability
and 0.2 was the decontamination probability.
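The base R function match.call can confirm how these two mixed calls were matched: given a function and an unevaluated call, it returns the call with each value written under the full name of the formal argument it was bound to. In this sketch needle is only a stub with the same argument list (match.call looks only at the formals), and the exact printed layout may differ between R versions.

```r
# Stub with needle()'s argument list; the body is irrelevant here.
needle <- function(answers, time.start, time.end, contam.prob, decon.prob) NULL

# First mixed call: exact matches, then partial (ans, time),
# then the single unnamed 0.1 goes to the one remaining formal.
call1 <- match.call(needle,
  quote(needle(time.start = 1, ans = 0, 0.1, time = 10, contam.prob = 0.2)))

# Second mixed call: the two unnamed values fill the two remaining
# formals in order, contam.prob first, then decon.prob.
call2 <- match.call(needle,
  quote(needle(time.start = 1, ans = 0, 0.1, time = 10, 0.2)))

print(call1)
print(call2)
call1$decon.prob   # 0.1
call2$contam.prob  # 0.1
call2$decon.prob   # 0.2
```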
There are remaining subtleties in function calls when the special argument ...
is used to create a function that accepts a variable number of arguments.
We'll take a look at this later in this course or in the next course.
Default argument values
When you define a function, you may specify default argument values. This
way, if you leave an argument unspecified, R will use its default value.
Positional matching and default values can lead to puzzling behavior if
not used carefully. Let's redefine the needle reuse example to give
defaults to the answers argument and to time.start:
# Needle reuse simulation, simple Markov chain, R 1.6.1
# answers should begin as a vector with the initial contamination probability
#   default value 0 (needle can't be contaminated if it is new)
# time.start is the first use (usually 1)
#   default value 1 (start at the first use)
# time.end is the number of uses to simulate
# contam.prob is the chance the needle will become contaminated each use
# decon.prob is the chance the needle will be decontaminated before each use
needle <- function(answers=0,time.start=1,time.end,contam.prob,decon.prob) {
  if (time.start >= time.end) {
    answers
  } else {
    xx <- answers[time.start]
    newx <- (1-decon.prob)*(xx+(1-xx)*contam.prob)
    needle(c(answers,newx),time.start+1,time.end,contam.prob,decon.prob)
  }
}
Now we can call it like this:
> #continuing...
> needle(contam.prob=0.2,decon.prob=0.1,time.end=10)
[1] 0.0000000 0.1800000 0.3096000 0.4029120 0.4700966 0.5184696 0.5532981
[8] 0.5783746 0.5964297 0.6094294
My advice is to try to avoid mixing default values and positional calls; if
you use default values, try to call the function with named arguments.
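Here is one kind of puzzle this advice avoids, as a sketch using the needle function with defaults defined above. The intent is 10 uses with contamination probability 0.2 and decontamination probability 0.1, but unnamed values fill the formal arguments from the left. Because the function returns before it ever needs contam.prob or decon.prob, R does not even complain that they are missing.

```r
# needle() with defaults, as redefined above.
needle <- function(answers = 0, time.start = 1, time.end, contam.prob, decon.prob) {
  if (time.start >= time.end) {
    answers
  } else {
    xx <- answers[time.start]
    newx <- (1 - decon.prob) * (xx + (1 - xx) * contam.prob)
    needle(c(answers, newx), time.start + 1, time.end, contam.prob, decon.prob)
  }
}

# Meant as "10 uses, contam.prob 0.2, decon.prob 0.1, defaults for the rest".
# Positional matching actually binds answers = 10, time.start = 0.2,
# time.end = 0.1, and leaves contam.prob and decon.prob missing.
oops <- needle(10, 0.2, 0.1)
oops  # 10: since 0.2 >= 0.1, answers comes straight back, with no error

# Naming the arguments gives the intended run.
ok <- needle(time.end = 10, contam.prob = 0.2, decon.prob = 0.1)
ok
```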
Surprisingly, default arguments are evaluated inside the function ("in the
evaluation frame of the function"). So, for example:
> example.fn <- function(aa=2,bb=aa) {
+ cat("aa=",aa,"and bb=",bb,"\n")
+ aa
+ }
> aa <- 3
> example.fn(4)
aa= 4 and bb= 4
[1] 4
> example.fn()
aa= 2 and bb= 2
[1] 2
> example.fn(aa=aa)
aa= 3 and bb= 3
[1] 3
What is important to note here is that the argument bb defaults to
whatever value aa has inside the function once it has been called. Whatever
value aa gets for the function call, bb defaults to that
value, and not to the value of aa in the environment the function is
called from. So even though aa is bound to 3 in the calling environment,
when the function is called the first time, aa is bound to 4 and
so bb is also bound to 4. In the last call, aa=aa passes the calling
environment's value 3 explicitly, so both aa and bb are 3.
When defining functions with default values, don't make a default refer to
a symbol with the same name as the argument itself:
> example.fn <- function(aa=aa,bb=999) {
+ cat("aa=",aa,"and bb=",bb,"\n")
+ aa
+ }
> aa <- 3
> example.fn(4)
aa= 4 and bb= 999
[1] 4
> example.fn()
Error in cat("aa=", aa, "and bb=", bb, "\n") :
recursive default argument reference
Remember, default argument values are evaluated in the function (in
the evaluation frame of the function). And in this example, aa is
the thing you're defining: you're defining its value inside the function in
terms of its own value inside the function. That doesn't make sense, so the
system reports an error. Notice that the error does not happen at the time
you define the function, or when you call it with aa supplied, but only
when you call it in a way that needs the default.
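A small sketch of that last point: defining the function succeeds, a call that supplies aa works, and only a call that needs the default fails. Here tryCatch is used just to capture the error so the script can continue; the exact error message differs between R versions.

```r
# Defining the function succeeds: defaults are not evaluated yet.
example.fn <- function(aa = aa, bb = 999) {
  cat("aa=", aa, "and bb=", bb, "\n")
  aa
}
is.function(example.fn)  # TRUE

# Supplying aa means its default is never looked at.
example.fn(4)

# Only a call that needs the default fails; capture the error object.
res <- tryCatch(example.fn(), error = function(e) e)
inherits(res, "error")   # TRUE
```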
Contrast the previous example of using the = symbol to define
a default value in the function definition with the following use of
= in a function call:
> example.fn <- function(aa,bb) {
+ cat("aa=",aa,"; bb=",bb,"\n")
+ aa
+ }
> aa <- 4
> example.fn(aa=aa,bb=5)
aa= 4 ; bb= 5
[1] 4
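This is fine, because here you are calling the function, not defining a default value. The aa on the left of = names the argument of example.fn, and the aa on the right is evaluated in the calling environment, where it is 4.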
It is also important to realize that the default value is actually
evaluated only when the argument is first used, not when the function is
called. This is a useful feature, but it can surprise you if you're not ready for it:
> example.fn <- function(aa={cat("used a's default!\n"); 99},bb=2) {
+ cat("aa=",aa,"; bb=",bb,"\n")
+ aa
+ }
> example.fn(4,5)
aa= 4 ; bb= 5
[1] 4
> example.fn(bb=8,aa=2)
aa= 2 ; bb= 8
[1] 2
> example.fn(aa=4)
aa= 4 ; bb= 2
[1] 4
> example.fn(bb=8)
used a's default!
aa= 99 ; bb= 8
[1] 99
What happened? The default value for aa was the compound expression {cat("used a's default!\n"); 99}, which first prints the message
"used a's default!" and then evaluates 99. The value of
the compound expression is 99, so when the default value of aa
is needed, 99 will be the value. But before the 99 is used,
the message is printed. That is why the message appears only in the last call, the only one in which aa was not supplied.
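The flip side of this behavior is that a default that is never needed is never evaluated at all. A sketch, with hypothetical function names only.bb and uses.aa, that uses capture.output to record what gets printed:

```r
# The default for aa prints a message when (and only when) it is evaluated.
only.bb <- function(aa = {cat("used a's default!\n"); 99}, bb = 2) {
  bb  # aa is never touched in the body
}
uses.aa <- function(aa = {cat("used a's default!\n"); 99}) {
  aa  # first use of aa forces its default
}

out1 <- capture.output(val1 <- only.bb(bb = 8))
out1  # character(0): the default for aa was never evaluated
val1  # 8

out2 <- capture.output(val2 <- uses.aa())
out2  # "used a's default!"
val2  # 99
```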
Let's do another example. Remember: the default value is evaluated in the
frame of the function, and at the time the variable's value is first needed.
So watch this (we first set ww to 100 in the workspace):
> ww <- 100
> example.fn <- function(aa=ww) {
+ cat("Entering example function: ww=",ww,"\n");
+ ww <- 50;
+ cat("Now printing ww again:", ww,"\n");
+ cat("Now printing aa:",aa,"\n")
+ aa
+ }
> example.fn(4)
Entering example function: ww= 100
Now printing ww again: 50
Now printing aa: 4
[1] 4
> example.fn()
Entering example function: ww= 100
Now printing ww again: 50
Now printing aa: 50
[1] 50
> ww
[1] 100
What happened? The first time, aa got the value 4 because 4 was the
argument in the call that matched aa. What happened to ww
did not matter to aa. We first printed ww, and got the value
100 from outside the function. After we assigned 50 to a local ww
inside the function, we got 50 when we printed ww inside the function.
After the call finished, evaluating ww outside the function
gave 100 again: the local value 50 is lost once the function call ends.
Through all this, aa just kept the value 4 from the function call.
The variable aa is evaluated for the first time on the fourth line of the function body, and its value is 4.
But now look at the next call in the above example. Here, we did not give
an argument to the function, so the default must be used, and the default
value for aa is ww. The first thing we do is
evaluate ww, and as before we get the global value of 100.
Then we assign 50 to our own local ww, and print that. Finally, we evaluate aa for the first time on
the fourth line of the function body. (Of course, we mean for the first time
during this call; what happened last time we called the function is gone and
doesn't count now.) It is right then, at the first evaluation of
aa, that aa is assigned the value of ww, evaluated
inside the frame of the function. At that time, the result of evaluating
ww inside the function is 50, and so aa gets the value 50.
Of course, once we exit the function, ww is 100 again, because the
local value of 50 is lost after exiting the function.
Exercise. Without running this code, what result is printed where
the question marks appear? Why?
> example.fn <- function(aa=2*bb,bb=3*cc,cc=4*xyz) {
+ xyz <- 1
+ aa
+ }
> example.fn(4)
[1] ???
> example.fn()
[1] ???
At this point we have covered the essential features of the language, but we
will discuss more in the next course.
Other useful functions
We will also discuss the ifelse function, as well as the sorting
and ranking functions. We will take a further look at random number
generation.