Summary of meeting between Tidyverse members and Luke Tierney at useR! 2024.
Frontends and low level tools need to know what kind of bindings they are dealing with. Objectives include:
- Avoiding side effects such as triggering a promise or causing a missing argument error. Low level tools often can't afford to protect against those for every variable lookup. Figuring out what happened by inspecting errors is also ambiguous, and sometimes impossible (promises may cause longjumps in a variety of ways).
- Transparency in debugging/development settings. Providing context to the user about what's going to happen if they attempt to retrieve the value of a binding (e.g. an active binding invocation, or a promise forcing that leads to the evaluation of a particular expression).
- Completeness of the API to inspect and manipulate bindings. It should be possible to write an environment cloner using these tools: iterate over bindings, retrieve the type, given the type, retrieve the components (prexpr, prenv, active binding function, etc.), and given the components, create a duplicate binding in a new environment.
- tidyeval (the NSE framework for the tidyverse) needs to obtain both the expression and the original frame environment of substituted dots.
Existing API:
Rboolean R_existsVarInFrame(SEXP env, SEXP sym); // Unfortunate inconsistency in param order
Rboolean R_BindingIsActive(SEXP sym, SEXP env);
New API:
typedef enum {
    R_BindingTypeUnbound = 0,        /* Unbound in this environment */
    R_BindingTypeValue = 1,          /* Direct value binding */
    R_BindingTypeMissing = 2,        /* Missing argument */
    R_BindingTypeDelayedPromise = 3, /* Delayed promise */
    R_BindingTypeForcedPromise = 4,  /* Forced promise */
    R_BindingTypeActive = 5,         /* Active binding */
} R_BindingType;
R_BindingType R_GetBindingType(SEXP sym, SEXP env);
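As an illustration of how a low level tool might use such a type query, here is a minimal self-contained sketch that does not use R's headers. Only the enum values mirror the proposal; `ToyBinding`, `toy_frame`, `toy_get_binding_type()` and `toy_is_safe_to_read()` are hypothetical stand-ins invented for this example. It shows a lookup guard that reports whether reading a binding could run code or signal an error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the proposed R_BindingType; everything else here is a toy model. */
typedef enum {
    ToyBindingTypeUnbound = 0,
    ToyBindingTypeValue = 1,
    ToyBindingTypeMissing = 2,
    ToyBindingTypeDelayedPromise = 3,
    ToyBindingTypeForcedPromise = 4,
    ToyBindingTypeActive = 5
} ToyBindingType;

typedef struct {
    const char *name;
    ToyBindingType type;
} ToyBinding;

/* Hypothetical frame with one binding of each bound kind. */
const ToyBinding toy_frame[] = {
    { "x", ToyBindingTypeValue },
    { "arg", ToyBindingTypeMissing },
    { "lazy", ToyBindingTypeDelayedPromise },
    { "done", ToyBindingTypeForcedPromise },
    { "now", ToyBindingTypeActive }
};
const size_t toy_frame_size = sizeof toy_frame / sizeof toy_frame[0];

/* Stand-in for R_GetBindingType(): a pure lookup that evaluates nothing. */
ToyBindingType toy_get_binding_type(const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < toy_frame_size; ++i) {
        if (strcmp(toy_frame[i].name, name) == 0)
            return toy_frame[i].type;
    }
    return ToyBindingTypeUnbound;
}

/* Returns 1 when reading the binding cannot run code or signal an error. */
int toy_is_safe_to_read(const char *name) {
    switch (toy_get_binding_type(name)) {
    case ToyBindingTypeValue:
    case ToyBindingTypeForcedPromise:
        return 1;                        /* the value is already available */
    case ToyBindingTypeUnbound:          /* lookup would fail */
    case ToyBindingTypeMissing:          /* would signal a missing argument error */
    case ToyBindingTypeDelayedPromise:   /* would evaluate the promise expression */
    case ToyBindingTypeActive:           /* would call the active binding function */
        return 0;
    }
    return 0;
}
```

With the real API, the same switch would presumably be written over R_GetBindingType(sym, env), so the decision is made once per lookup without installing an error handler.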
Existing:
SEXP R_ActiveBindingFunction(SEXP sym, SEXP env);
New:
SEXP R_DelayedPromiseBindingExpression(SEXP sym, SEXP env);
SEXP R_DelayedPromiseBindingEnvironment(SEXP sym, SEXP env);
SEXP R_ForcedPromiseBindingExpression(SEXP sym, SEXP env);
Existing:
void R_MakeActiveBinding(SEXP sym, SEXP fun, SEXP env);
void Rf_setVar(SEXP sym, SEXP value, SEXP env); // Value
void R_removeVarFromFrame(SEXP sym, SEXP env); // Unbound
New:
void R_MakeDelayedPromiseBinding(SEXP sym, SEXP promiseExpr, SEXP promiseEnv, SEXP env);
void R_MakeForcedPromiseBinding(SEXP sym, SEXP promiseExpr, SEXP env);
void R_MakeMissingBinding(SEXP sym, SEXP env);
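The environment cloner mentioned in the objectives can be sketched with a toy model of this API. The block below is self-contained and does not use R's headers: `CloneBinding`, `CloneFrame`, `clone_example`, `clone_binding()` and `clone_frame()` are hypothetical names, and strings stand in for the SEXP components. Storing the value alongside a forced promise is an assumption of this model, not something the proposal specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    CloneUnbound = 0, CloneValue = 1, CloneMissing = 2,
    CloneDelayedPromise = 3, CloneForcedPromise = 4, CloneActive = 5
} CloneBindingType;

/* Toy binding: strings stand in for the SEXP components. */
typedef struct {
    const char *name;
    CloneBindingType type;
    const char *value;   /* value binding, or value of a forced promise (assumption) */
    const char *expr;    /* promise expression (prexpr) */
    const char *env;     /* promise environment (prenv), delayed promises only */
    const char *fun;     /* active binding function */
} CloneBinding;

#define CLONE_MAX 8
typedef struct {
    CloneBinding slots[CLONE_MAX];
    size_t n;
} CloneFrame;

/* Hypothetical source frame used for illustration. */
const CloneFrame clone_example = {
    {
        { "x",    CloneValue,          "1",  NULL,    NULL,     NULL },
        { "arg",  CloneMissing,        NULL, NULL,    NULL,     NULL },
        { "lazy", CloneDelayedPromise, NULL, "f(y)",  "caller", NULL },
        { "done", CloneForcedPromise,  "2",  "1 + 1", NULL,     NULL },
        { "now",  CloneActive,         NULL, NULL,    NULL,     "function() Sys.time()" }
    },
    5
};

/* Given the type, copy only the components that type defines. */
CloneBinding clone_binding(const CloneBinding *src) {
    assert(src != NULL);
    CloneBinding out = { src->name, src->type, NULL, NULL, NULL, NULL };
    switch (src->type) {
    case CloneValue:          out.value = src->value; break;
    case CloneDelayedPromise: out.expr = src->expr; out.env = src->env; break;
    case CloneForcedPromise:  out.expr = src->expr; out.value = src->value; break;
    case CloneActive:         out.fun = src->fun; break;
    case CloneMissing:
    case CloneUnbound:        break;   /* no components to copy */
    }
    return out;
}

/* Iterate over the source bindings and recreate each one in a new frame. */
CloneFrame clone_frame(const CloneFrame *src) {
    assert(src != NULL && src->n <= CLONE_MAX);
    CloneFrame dst = { { { 0 } }, 0 };
    for (size_t i = 0; i < src->n; ++i) {
        if (src->slots[i].type == CloneUnbound)
            continue;   /* nothing to recreate */
        dst.slots[dst.n++] = clone_binding(&src->slots[i]);
    }
    return dst;
}
```

Each case of the switch corresponds to one accessor/constructor pair in the lists above, which is a quick way to check that the API is complete: a binding type with no case cannot be cloned.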
We need a way to create forced promises that work with substitute(). This could be achieved by passing a NULL environment or by splitting the constructor into two variants.
If we use a NULL environment as an indicator for forced promises, we can simplify the API by sharing the type, accessors, and constructor:
typedef enum {
    R_BindingTypeUnbound = 0, /* Unbound in this environment */
    R_BindingTypeValue = 1,   /* Direct value binding */
    R_BindingTypeMissing = 2, /* Missing argument */
    R_BindingTypePromise = 3, /* Delayed or forced promise */
    R_BindingTypeActive = 4,  /* Active binding */
} R_BindingType;
SEXP R_PromiseBindingExpression(SEXP sym, SEXP env);
SEXP R_PromiseBindingEnvironment(SEXP sym, SEXP env);
void R_MakePromiseBinding(SEXP sym, SEXP promiseExpr, SEXP promiseEnv, SEXP env);
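A small sketch of how the NULL environment convention would let callers recover the delayed/forced distinction from a single promise type. This is a self-contained toy model, not R's API: `ToyPromiseBinding`, `toy_make_promise_binding()` and `toy_promise_is_forced()` are hypothetical, and strings stand in for the expression and environment.

```c
#include <assert.h>
#include <stddef.h>

/* Toy promise binding: strings stand in for the SEXP expression and environment. */
typedef struct {
    const char *expr;
    const char *env;   /* NULL marks a forced promise */
} ToyPromiseBinding;

/* Stand-in for the shared constructor R_MakePromiseBinding(). */
ToyPromiseBinding toy_make_promise_binding(const char *expr, const char *env) {
    assert(expr != NULL);
    ToyPromiseBinding p = { expr, env };
    return p;
}

/* With a single promise type, the environment tells the two states apart. */
int toy_promise_is_forced(ToyPromiseBinding p) {
    return p.env == NULL;
}
```

The trade-off is that one fewer enum value and one fewer constructor are needed, at the cost of callers having to test the environment accessor's result before treating a promise as delayed.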
Iterating over the contents of dots is useful to do at C level for two things:
- Fast dots checkers, e.g. https://rlang.r-lib.org/reference/check_dots_unnamed.html and https://rlang.r-lib.org/reference/check_dots_used.html
- Capturing environments of unforced arguments passed through multiple levels of dots. Necessary for hygienic evaluations of captured expressions.
typedef enum {
    R_DotsBindingTypeValue = 0,   /* Direct value binding */
    R_DotsBindingTypePromise = 1, /* Delayed or forced promise */
} R_DotsBindingType;
typedef struct {
    R_DotsBindingType type;
    SEXP name;
} R_DotsIteratorItem;
/* Returns a private LISTSXP containing: the iterator state as a RAWSXP in the
CAR, a protecting container in the CDR (for extra safety we might want to
protect the current binding), and a type identifier in the TAG (for runtime
error checking). The caller must protect this object and consider it opaque.
The behaviour in case `env` does not contain a DOTSXP could be an error
(check the binding type for `...` beforehand) or an empty iterator. */
SEXP R_MakeDotsIterator(SEXP env);
/* Returns true if advanced, in which case `item` is safely readable. */
Rboolean R_DotsNext(SEXP dotsIterator, R_DotsIteratorItem *item);
SEXP R_DotsPromiseBindingExpression(SEXP dotsIterator);
SEXP R_DotsPromiseBindingEnvironment(SEXP dotsIterator);
SEXP R_DotsValueBinding(SEXP dotsIterator);
SEXP iter = PROTECT(R_MakeDotsIterator(env)); /* The caller must protect the iterator */
R_DotsIteratorItem item;
while (R_DotsNext(iter, &item)) {
    switch (item.type) {
    case R_DotsBindingTypeValue:
        Rf_PrintValue(R_DotsValueBinding(iter));
        break;
    case R_DotsBindingTypePromise:
        Rf_PrintValue(R_DotsPromiseBindingExpression(iter));
        break;
    }
}
UNPROTECT(1);
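The same iteration pattern could support a fast dots checker. The sketch below is a self-contained toy model that does not use R's headers: `ToyDotsItem`, `ToyDotsIterator`, `toy_dots_example`, `toy_dots_next()`, `toy_count_named_dots()` and `toy_dots_all_unnamed()` are hypothetical, with a NULL name standing in for an unnamed argument. It shows a check in the spirit of check_dots_unnamed() that only reads names and never touches a promise.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    ToyDotsBindingTypeValue = 0,
    ToyDotsBindingTypePromise = 1
} ToyDotsBindingType;

/* One element of dots; a NULL name means the argument is unnamed. */
typedef struct {
    ToyDotsBindingType type;
    const char *name;
} ToyDotsItem;

/* Toy iterator state; the proposal keeps the real state opaque. */
typedef struct {
    const ToyDotsItem *items;
    size_t n;
    size_t pos;   /* index of the next element to visit */
} ToyDotsIterator;

/* Hypothetical dots contents: a value, a named promise, an unnamed promise. */
const ToyDotsItem toy_dots_example[] = {
    { ToyDotsBindingTypeValue, NULL },
    { ToyDotsBindingTypePromise, "b" },
    { ToyDotsBindingTypePromise, NULL }
};

/* Stand-in for R_DotsNext(): returns 1 if advanced and fills `item`. */
int toy_dots_next(ToyDotsIterator *it, ToyDotsItem *item) {
    assert(it != NULL && item != NULL);
    if (it->pos >= it->n)
        return 0;
    *item = it->items[it->pos++];
    return 1;
}

/* Counts named arguments without forcing any promise. */
size_t toy_count_named_dots(const ToyDotsItem *items, size_t n) {
    ToyDotsIterator it = { items, n, 0 };
    ToyDotsItem item;
    size_t named = 0;
    while (toy_dots_next(&it, &item)) {
        if (item.name != NULL)
            ++named;
    }
    return named;
}

/* A check in the spirit of check_dots_unnamed(): 1 if no argument is named. */
int toy_dots_all_unnamed(const ToyDotsItem *items, size_t n) {
    return toy_count_named_dots(items, n) == 0;
}
```

Because the item carries the name and type directly, a checker like this never needs the value or expression accessors, which is what makes it cheap.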
Currently our main concern is to avoid materialising row names. In the future, getAttrib() should return an ALTREP string sequence for automatic row names. In the meantime, if an object already has ALTREP row names, getAttrib() should not materialise them, which it currently does via INTEGER().
It might be useful to have a way of getting and setting a list of attributes, but we'll first try to manage without that.