@lionel-
Last active September 9, 2025 08:48
Compatibility of tidyverse with the public C API of R

Summary of meeting between Tidyverse members and Luke Tierney at useR! 2024.

Frame/Environment inspection

Frontends and low level tools need to know what kind of bindings they are dealing with. Objectives include:

  • Avoiding side effects such as triggering a promise or causing a missing argument error. Low level tools often can't afford to protect against those for every variable lookup. Figuring out what happened by inspecting errors is also ambiguous, and sometimes impossible (promises may cause longjumps in a variety of ways).

  • Transparency in debugging/development settings. Providing context to the user about what's going to happen if they attempt to retrieve the value of a binding (e.g. an active binding invocation, a promise being forced and evaluating such and such expression, etc).

  • Completeness of the API to inspect and manipulate bindings. It should be possible to write an environment cloner using these tools: iterate over the bindings, retrieve the type of each one, retrieve the components that exist for that type (prexpr, prenv, active binding function, etc), and create a duplicate binding from those components in a new environment.

  • tidyeval (the NSE framework for the tidyverse) needs to obtain both the expression and the original frame environment of substituted dots.

API considerations

Binding type

Existing API:

Rboolean R_existsVarInFrame(SEXP env, SEXP sym);  // Unfortunate inconsistency in param order
Rboolean R_BindingIsActive(SEXP sym, SEXP env);

New API:

typedef enum {
    R_BindingTypeUnbound = 0,          /* Unbound in this environment */
    R_BindingTypeValue = 1,            /* Direct value binding */
    R_BindingTypeMissing = 2,          /* Missing argument */
    R_BindingTypeDelayed = 3,          /* Delayed promise */
    R_BindingTypeForced = 4,           /* Forced promise */
    R_BindingTypeActive = 5,           /* Active binding */
} R_BindingType;

R_BindingType R_GetBindingType(SEXP sym, SEXP env);
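
As a rough illustration of how a frontend might use the binding type, here is a minimal sketch that maps each type to what a lookup would do. Only the enum comes from the proposal; `describe_binding()` and `binding_is_safe_to_read()` are hypothetical helpers, and the block deliberately avoids R headers so it compiles on its own.

```c
#include <assert.h>

/* Enum copied from the proposal above. */
typedef enum {
    R_BindingTypeUnbound = 0,
    R_BindingTypeValue = 1,
    R_BindingTypeMissing = 2,
    R_BindingTypeDelayed = 3,
    R_BindingTypeForced = 4,
    R_BindingTypeActive = 5,
} R_BindingType;

/* Hypothetical helper: tells the user what evaluating the symbol would do. */
const char *describe_binding(R_BindingType type) {
    assert(type >= R_BindingTypeUnbound && type <= R_BindingTypeActive);
    switch (type) {
    case R_BindingTypeUnbound: return "unbound: nothing to look up in this frame";
    case R_BindingTypeValue:   return "value: safe to read";
    case R_BindingTypeMissing: return "missing argument: evaluating it would signal an error";
    case R_BindingTypeDelayed: return "delayed promise: reading it would evaluate its expression";
    case R_BindingTypeForced:  return "forced promise: value already computed";
    case R_BindingTypeActive:  return "active binding: reading it would call its function";
    }
    return "unknown binding type";
}

/* Hypothetical predicate: non-zero when fetching the value cannot run
   user code or signal an error. */
int binding_is_safe_to_read(R_BindingType type) {
    return type == R_BindingTypeValue || type == R_BindingTypeForced;
}
```

In a real tool the argument would come from `R_GetBindingType(sym, env)`, and the value would only be fetched when the predicate holds.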

Binding components

Existing:

SEXP R_ActiveBindingFunction(SEXP sym, SEXP env);

New:

SEXP R_DelayedBindingExpression(SEXP sym, SEXP env);
SEXP R_DelayedBindingEnvironment(SEXP sym, SEXP env);

SEXP R_ForcedBindingExpression(SEXP sym, SEXP env);

Binding creation

Existing:

void R_MakeActiveBinding(SEXP sym, SEXP fun, SEXP env);
void Rf_setVar(SEXP sym, SEXP value, SEXP env); // Value
void R_removeVarFromFrame(SEXP sym, SEXP env);  // Unbound

New:

void R_MakeDelayedBinding(SEXP sym, SEXP expr, SEXP evalEnv, SEXP env);
void R_MakeForcedBinding(SEXP sym, SEXP expr, SEXP value, SEXP env);
void R_MakeMissingBinding(SEXP sym, SEXP env);

We need a way to create forced promises that work with substitute(). This could be achieved by passing a NULL environment or by splitting the constructor into two variants.
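
The environment cloner mentioned in the objectives could dispatch on the binding type roughly as follows. This is a toy model, not R code: `ToyBinding`, `toy_clone_binding()` and `toy_clone_env()` are hypothetical, strings stand in for SEXPs, and the comments name the proposed constructor each branch would correspond to.

```c
#include <assert.h>
#include <stddef.h>

/* Enum copied from the proposal above. */
typedef enum {
    R_BindingTypeUnbound = 0,
    R_BindingTypeValue = 1,
    R_BindingTypeMissing = 2,
    R_BindingTypeDelayed = 3,
    R_BindingTypeForced = 4,
    R_BindingTypeActive = 5,
} R_BindingType;

/* Toy stand-in for one binding; every field is an assumption. */
typedef struct {
    const char *name;
    R_BindingType type;
    const char *expr;      /* promise expression (delayed and forced) */
    const char *eval_env;  /* promise environment (delayed only) */
    const char *value;     /* value (value and forced bindings) */
    const char *fun;       /* active binding function */
} ToyBinding;

/* Copy one binding, keeping only the components that exist for its type. */
ToyBinding toy_clone_binding(const ToyBinding *src) {
    assert(src != NULL);
    ToyBinding dst = { src->name, src->type, NULL, NULL, NULL, NULL };
    switch (src->type) {
    case R_BindingTypeUnbound:
    case R_BindingTypeMissing:   /* R_MakeMissingBinding(sym, env) */
        break;
    case R_BindingTypeValue:     /* plain value assignment */
        dst.value = src->value;
        break;
    case R_BindingTypeDelayed:   /* R_MakeDelayedBinding(sym, expr, evalEnv, env) */
        dst.expr = src->expr;
        dst.eval_env = src->eval_env;
        break;
    case R_BindingTypeForced:    /* R_MakeForcedBinding(sym, expr, value, env) */
        dst.expr = src->expr;
        dst.value = src->value;
        break;
    case R_BindingTypeActive:    /* R_MakeActiveBinding(sym, fun, env) */
        dst.fun = src->fun;
        break;
    }
    return dst;
}

/* Clone `n` bindings into `dst`, skipping unbound entries.
   Returns the number of bindings written. */
size_t toy_clone_env(const ToyBinding *src, size_t n, ToyBinding *dst) {
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (src[i].type == R_BindingTypeUnbound) continue;
        dst[out++] = toy_clone_binding(&src[i]);
    }
    return out;
}
```

The point of the sketch is that no branch ever reads a component that does not exist for its type, so cloning never forces a promise or calls an active binding function.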

Simpler promise API

Edit: We decided against this and went for the explicit predicates and accessors.

If we use a NULL environment as an indicator for forced promises, we can simplify the API by sharing the type, accessors, and constructor:

typedef enum {
    R_BindingTypeUnbound = 0,    /* Unbound in this environment */
    R_BindingTypeValue = 1,      /* Direct value binding */
    R_BindingTypeMissing = 2,    /* Missing argument */
    R_BindingTypePromise = 3,    /* Delayed or forced promise */
    R_BindingTypeActive = 4,     /* Active binding */
} R_BindingType;

SEXP R_PromiseBindingExpression(SEXP sym, SEXP env);
SEXP R_PromiseBindingEnvironment(SEXP sym, SEXP env);

void R_MakePromiseBinding(SEXP sym, SEXP promiseExpr, SEXP promiseEnv, SEXP env);

Iterating over dots

Edit: We've discussed another API for this with Luke at the R dev day, see Davis' comment below.

Useful to do at C level for two things:

typedef enum {
    R_DotsBindingTypeValue = 0,      /* Direct value binding */
    R_DotsBindingTypePromise = 1,    /* Delayed or forced promise */
} R_DotsBindingType;

typedef struct {
    R_DotsBindingType type;    
    SEXP name;
} R_DotsIteratorItem;

/* Returns a private LISTSXP containing: the iterator state as a RAWSXP in the
   CAR, a protecting container in the CDR (for extra safety we might want to
   protect the current binding), and a type identifier in the TAG (for runtime
   error checking). The caller must protect this object and consider it opaque. 
   
   The behaviour in case `env` does not contain a DOTSEXP could be an error
   (check the binding type for `...` beforehand) or an empty iterator. */
SEXP R_MakeDotsIterator(SEXP env);

/* Returns true if advanced, in which case `item` is safely readable. */
Rboolean R_DotsNext(SEXP dotsIterator, R_DotsIteratorItem *item);

SEXP R_DotsPromiseBindingExpression(SEXP dotsIterator);
SEXP R_DotsPromiseBindingEnvironment(SEXP dotsIterator);
SEXP R_DotsValueBinding(SEXP dotsIterator);

SEXP iter = R_MakeDotsIterator(env);
R_DotsIteratorItem item;

while (R_DotsNext(iter, &item)) {
    switch (item.type) {
        case R_DotsBindingTypeValue: Rf_PrintValue(R_DotsValueBinding(iter)); break;
        case R_DotsBindingTypePromise: Rf_PrintValue(R_DotsPromiseBindingExpression(iter)); break;
    }
}

Attributes

Currently our main concern is to avoid materialising row names. In the future, getAttrib() should return an altrep string sequence for automatic row names. In the meantime, if an object already has altrep row names, getAttrib() should not materialise them, which it currently does via INTEGER().
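
For context on why materialisation can be avoided at all: automatic row names are stored internally in a compact two-integer form, `c(NA, -n)` (or `c(NA, n)`), so the row count is available without expanding the sequence. The sketch below reads that form from a plain `int` array. `toy_compact_row_count()` and `TOY_NA_INTEGER` are hypothetical names, and the block does not use R headers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumption: same integer that R uses for NA_integer_. */
#define TOY_NA_INTEGER INT_MIN

/* Returns the row count encoded by compact row names, or -1 when the
   vector is not in compact form (i.e. it holds expanded row names). */
long long toy_compact_row_count(const int *row_names, size_t len) {
    assert(row_names != NULL || len == 0);
    if (len == 2 && row_names[0] == TOY_NA_INTEGER) {
        long long n = row_names[1];
        return n < 0 ? -n : n;
    }
    return -1;
}
```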

It might be useful to have a way of getting and setting a list of attributes, but we'll first try to manage without that.

@DavisVaughan

DavisVaughan commented Aug 11, 2025

Discussed the Iterating over dots section at r-dev-day on Aug 11, 2025 at useR! in Durham, NC with Luke.

We settled on a simpler scheme that:

  • Doesn't require iterators, which currently don't exist in the R C API
  • Doesn't expose DOTSXP in any way
  • Provides C level access to ...names() and ...length()
  • Nicely mirrors the environment helpers created above

// ------
// Dots helpers

// Check if dots exist
Rboolean R_DotsExist(SEXP env);

// ...length(), error if dots don't exist
R_xlen_t R_DotsLength(SEXP env);

// ...names(), error if dots don't exist
SEXP R_DotsNames(SEXP env); 

// ...elt(), forces promises, errors on `R_DotTypeMissing`
SEXP R_DotsElt(R_xlen_t i, SEXP env);

// ------
// Dot helpers

// For all helpers:
// - If dots don't exist, an error should be thrown, because you should use `R_DotsExist()` first
// - OOB indexing should throw an error, because you should use `R_DotsLength()` first

typedef enum {
    R_DotTypeValue = 0,     
    R_DotTypeMissing = 1, 
    R_DotTypeDelayed = 2,    
    R_DotTypeForced = 3    
} R_DotType;

R_DotType R_GetDotType(R_xlen_t i, SEXP env);

// For `R_DotTypeDelayed`
SEXP R_DotDelayedExpression(R_xlen_t i, SEXP env);
SEXP R_DotDelayedEnvironment(R_xlen_t i, SEXP env);

// For `R_DotTypeForced`
SEXP R_DotForcedExpression(R_xlen_t i, SEXP env);

To implement these we should tap into and refactor the existing dots tooling here:
https://github.com/wch/r-source/blob/503d9e0e8af0b394fb483fa604310ed077ff73b9/src/main/envir.c#L1426-L1507
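
To show the shape of an index-based walk over dots with these helpers, here is a toy model of capturing each element as an expression plus an evaluation environment, which is what tidyeval needs. Only the enum comes from the proposal. `ToyDot`, `ToyCapture` and `toy_capture_dots()` are hypothetical, strings stand in for SEXPs, and the capture policy for forced and value dots is an assumption for illustration rather than rlang's actual behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Enum copied from the proposal above. */
typedef enum {
    R_DotTypeValue = 0,
    R_DotTypeMissing = 1,
    R_DotTypeDelayed = 2,
    R_DotTypeForced = 3
} R_DotType;

/* Toy stand-in for one element of `...`. */
typedef struct {
    R_DotType type;
    const char *expr;  /* expression, or the value itself for value dots */
    const char *env;   /* promise environment, meaningful for delayed dots only */
} ToyDot;

/* Toy capture result: expression plus the environment to evaluate it in. */
typedef struct {
    const char *expr;
    const char *env;   /* NULL when no evaluation environment applies */
} ToyCapture;

/* Walks the dots by index, as a caller of R_DotsLength() and
   R_GetDotType() would, writing one capture per dot into `out`.
   Returns the number of dots that are still delayed promises. */
size_t toy_capture_dots(const ToyDot *dots, size_t n, ToyCapture *out) {
    assert((dots != NULL && out != NULL) || n == 0);
    size_t delayed = 0;
    for (size_t i = 0; i < n; i++) {
        out[i].expr = NULL;
        out[i].env = NULL;
        switch (dots[i].type) {
        case R_DotTypeDelayed:   /* R_DotDelayedExpression() + R_DotDelayedEnvironment() */
            out[i].expr = dots[i].expr;
            out[i].env = dots[i].env;
            delayed++;
            break;
        case R_DotTypeForced:    /* R_DotForcedExpression(); no environment left */
        case R_DotTypeValue:     /* the value acts as its own expression */
            out[i].expr = dots[i].expr;
            break;
        case R_DotTypeMissing:   /* nothing to capture */
            break;
        }
    }
    return delayed;
}
```

Because the type is checked before any accessor is chosen, the walk never forces a delayed promise, unlike `R_DotsElt()`.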

@lionel-
Author

lionel- commented Sep 9, 2025

To test the new dots accessors, change the capturedots and capturedot utils in https://github.com/r-lib/rlang/blob/main/src/capture.c to use them and run rlang checks. Tidyverse packages like dplyr and tidyr should still pass checks with these changes as well (but rlang tests should already be comprehensive enough to be confident).

Once confirmed that the dots utils work, it would also be worth updating rlang_capturearginfo() in that same file to use the new environment utils (promise accessors and constructors) from the first patch, and check rlang tests. This way all the low level pieces of tidyeval would be using the new public API of R for promises and dots.
