In the following we will be using R’s “restarts” feature to implement the state machine that drives generators in languages such as Python. Generators allow lazily generating values on demand: a consumer invokes a generator, and consumes values as they are produced. A new value is only produced once the previous one has been consumed.
Decoupling consumer and producer in this fashion is a powerful technique, because a consumer often has a better idea of how many values are needed than the generator. This means that a consumer can interrupt the generation of more values.
One consequence of this is that generators can be infinite; for instance, here is a potential implementation of a Fibonacci generator in Python:
def fib():
a, b = 0, 1
while True:
yield b
a, b = b, a + bNow if we want to print the first ten Fibonacci numbers we can use the generator as follows:
i = 0
for n in fib():
i += 1
if i > 10: break
print nNeat! So what’s stopping us from doing this in R? Well, a first answer is:
the lack of yield, which yields execution back to the caller but then
resumes the function execution until the function either reaches its end,
or the caller aborts it.
A second answer is: nothing!
In fact, we can implement yield in less than ten lines of code if we’re
willing to cheat (heck, are we!).
yield = function (item) {
do_yield = function (item, expr, index_name, parent) {
item = setNames(list(item), index_name)
expr = do.call(substitute, list(expr, item))
eval(expr, parent)
}
yield_condition = structure(list(value = item), class = c('yield', 'condition'))
withRestarts(stop(yield_condition), do_yield = do_yield)
}?!?
This function essentially needs to be read bottom-to-top: it stops
execution with a specific condition (now’s a good time to read up on
condition handling). However, this happens within a withRestarts.
Don’t bother reading its documentation; it’s
incomprehensible. In a nutshell, this function allows us to specify
“restarts” — that is, get-out-of-jail-free cards that are used higher up by
the user to resume execution after a condition was signalled. Such a
restart is defined here as do_yield (for lack of a better name …).
What does this restart-function do, and where is it called from? Glad you
asked …. But first we need to define a bit of syntactic sugar. The end goal
is to use yield just as we would use it in Python; for example, in a
for loop. We’ll keep it simpler for now and define our own loop-like
construct:
foreach (i, generator(), message('Hello ', i))It’s this foreach which will have to ensure that execution is restarted
after the yield condition was signalled. Without further ado:
foreach = function (index, generator, expr) {
# Don’t evaluate anything (just yet):
index_name = as.character(substitute(index))
generator = substitute(generator)
expr = substitute(expr)
# And remember where we’re calling from:
parent = parent.frame()
# Here the magic happens.
yielder = function (e)
invokeRestart('do_yield', e$value, expr, index_name, parent)
withCallingHandlers(eval(generator, parent), yield = yielder)
}And let’s test it real quick:
generator = function () {
for (i in 1 : 5)
yield(i)
}
foreach (i, generator(), cat('Hello', i, '\n'))## Hello 1
## Hello 2
## Hello 3
## Hello 4
## Hello 5
It works!
So: what’s going on here? foreach essentially evaluates the generator
expression (which, in our example, is a call to the generator
function). But rather than evaluating it directly, it does so “with
calling handlers”:
“Here’s an expression that we’d like to execute,” we tell R. “And just in
case this raises a yield condition,” we continue, “please execute our
yielder function.” A lot of jumping around ensues: the generator
function gets called, which in turn calls yield. yield signals a
yield condition by calling stop. And foreach reacts to this yield
condition by dispatching to the yielder. The yielder now invokes the
restart that we had previously defined just before calling stop
(remember?). That restart, called, do_yield, gets passed a bunch of
parameters:
- the current value, which we have stored inside the condition;
- the expression (the “loop body”) to execute for each value;
- the name of the index variable in our loop; and finally,
- the caller’s environment, in which to execute the loop body.
We have now come full circle: do_yield takes the loop body expression and
substitutes the name of the index variable with its current value. In our
example, it takes the unevaluated expression cat('Hello', i, '\n') and
transforms it into cat('Hello', 1, '\n') (obviously that’s for the first
iteration; subsequently the values 2, … are substituted). And finally, it
evaluates the so formed expression in the caller’s environment. The end.
There remains one thing to do: make this work with R’s own for loop.
for, like almost everything else in R, is simply a function. And we can
totally redefine it. Oh yes.
`for` = function (var, seq, expr) {
tryCatch(eval.parent(substitute(foreach(var, seq, expr),
list(var = substitute(var),
seq = substitute(seq),
expr = substitute(expr)))),
`break` = function (e) invisible())
invisible()
}
# Another small hack to make `break` work as expected.
makeActiveBinding('break',
function () stop(structure(list(), class = c('break', 'condition'))),
environment())for (i in generator())
cat('Hello for', i, '\n')## Hello for 1
## Hello for 2
## Hello for 3
## Hello for 4
## Hello for 5
(Granted, I cheated again: redefining for as above makes it unusable in
the conventional way, so I had to redefine the generator function to use
lapply instead of for. It’s still yielding though.)
And finally, here’s the fib example we started with — an infinite
generator of Fibonacci numbers:
fib = function () {
x = c(0, 1)
repeat {
yield(x[2])
x = c(x[2], sum(x))
}
}
i = 0
for (n in fib()) {
if ((i = i + 1) > 10) break
cat(n, '\n')
}## 1
## 1
## 2
## 3
## 5
## 8
## 13
## 21
## 34
## 55
I mentioned initially that our implementation is cheating. In fact, a Python generator can be invoked and stored in a variable for later consumption, like so:
x = fib()Alas, attempting this with our iterator won’t work:
x = fib()## Error in stop(yield_condition): bad error message
As we haven’t installed a handler for yield here, the code breaks. Fixing
this is beyond the scope of this article (it requires a different
approach).
To understand more about how restarts in R work, head over to the Advanced R chapter Beyond exception handling: conditions and restarts.
And if you find the code above entertaining, here are some more projects:
- Look at the source code of
base::callCCand work through that. It’s short — less than ouryield— put packs a punch. - List comprehension syntax like in Haskell
- A concise lambda syntax:
x -> 2 * xdefines a function equivalent tofunction (x) 2 * x. - Function decorators like in Python (still work in progress).

This is an awesome gist.