tests that we can "project" particular files out of a `TreeArtifact` for consumption in downstream rules
the intent is that by doing this projection:
- downstream rules that operate on files (not directories, i.e. not `TreeArtifact`-aware) can consume our artifact
- sensitivity in downstream targets is narrowed to only the files in the `TreeArtifact` that are projected out
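a minimal sketch of the projection idea outside of Bazel, using plain filesystem symlinks: a directory stands in for the `TreeArtifact`, and `project_files` (a hypothetical helper, not part of this gist's rules) exposes selected files as individual links that a file-oriented consumer can open directly. The names `gen_tree` and `projected` are made up for illustration.

```python
import os
import tempfile


def project_files(tree_dir, names, out_dir):
    """Expose selected files from tree_dir as individual symlinks in out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    projected = {}
    for name in names:
        target = os.path.join(tree_dir, name)
        if not os.path.isfile(target):
            raise FileNotFoundError(f"{name} is not in the directory output")
        link = os.path.join(out_dir, os.path.basename(name))
        # Relative link text, resolved against the directory holding the link.
        os.symlink(os.path.relpath(target, out_dir), link)
        projected[name] = link
    return projected


def demo():
    with tempfile.TemporaryDirectory() as root:
        tree = os.path.join(root, "gen_tree")  # stand-in for a TreeArtifact
        os.makedirs(tree)
        for name, body in [("a.c", "int a;"), ("b.c", "int b;"), ("x.h", "// hdr")]:
            with open(os.path.join(tree, name), "w") as f:
                f.write(body)

        out_dir = os.path.join(root, "projected")
        links = project_files(tree, ["a.c", "b.c"], out_dir)

        # A file-oriented consumer just opens each path; x.h is never exposed.
        contents = {}
        for name, link in links.items():
            with open(link) as f:
                contents[name] = f.read()
        return sorted(os.listdir(out_dir)), contents


PROJECTED_NAMES, PROJECTED_CONTENTS = demo()
```

the consumer only ever sees `a.c` and `b.c`; the rest of the directory stays out of its inputs, which is the narrowing the list above describes.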
Note
this example also tests out interactions w/path mapping
rules_directory/skylib's directory rules:
- these rules do not use `TreeArtifact`s; instead they are given the files representing a directory, on top of which they present an analysis-time, directory-like interface (i.e. subdirectory, glob)
- `directory_path` does ingest `TreeArtifact`s but produces `DirectoryPathInfo`, a tuple of a `TreeArtifact` and a relative path within it
  - consumers (i.e. downstream rules) must be "aware" of `DirectoryPathInfo` in order to handle `directory_path` inputs correctly
  - to expose a `DirectoryPathInfo` as a file, `bazel-lib` has `copy_file`
    - this is more or less what the rules in this example do with one caveat: `allow_symlink` is not allowed with `DirectoryPathInfo` (`ctx.files.src` is empty)
      - this means that you have to fall back to creating an actual copy...
this scheme (narrowing down a TreeArtifact to specific files cheaply via symlinks), coupled with ECO, allows you to get incrementality in your graph even in the face of monolithic actions that you cannot break up further
as an example:
- say that you have some monolithic code generator that spits out a bunch of source and header files into a directory
- say that splitting up this code generator is intractable
- also, let's say that this code generator produces an indeterminate number of header files: deterministic, but dependent on the inputs in a way that's hard to model in analysis (not just a function of the names/number of input files, etc.)
- say that we do know the names of the source files
- we would like to model compilation of the source files such that we get incrementality — if only one of the source files (and none of the header files) changes, we'd like to only recompile that file
- note that the code generator's action is still being rerun
- narrowing what the downstream compile actions see as input allows ECO to keep us from having to rerun compile actions corresponding to source files that did not change
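a toy model of the example above, assuming ECO here means early cutoff (an action is skipped when the digest of its declared inputs is unchanged). `ToyBuild` is a hypothetical stand-in, not Bazel's actual action cache; it also assumes each compile action sees its one projected source plus all headers. The generator output is passed in as a dict because the generator itself is always rerun.

```python
import hashlib


def digest(*parts):
    """Stable digest over an ordered list of strings."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode())
        h.update(b"\0")
    return h.hexdigest()


class ToyBuild:
    """Reruns a compile action only when the digest of its inputs changes."""

    def __init__(self, narrowed):
        self.narrowed = narrowed
        self.cache = {}  # source name -> input digest from the last compile

    def build(self, tree, sources):
        """tree maps file name -> content, as freshly produced by the generator.

        Returns the sources whose compile action had to rerun.
        """
        rerun = []
        headers = sorted(n for n in tree if n.endswith(".h"))
        for src in sources:
            if self.narrowed:
                # Inputs: the single projected source plus the headers.
                inputs = [src] + headers
            else:
                # Inputs: every file in the generated directory.
                inputs = sorted(tree)
            key = digest(*(f"{name}={tree[name]}" for name in inputs))
            if self.cache.get(src) != key:
                self.cache[src] = key
                rerun.append(src)
        return rerun


v1 = {"a.c": "int a = 1;", "b.c": "int b = 1;", "gen.h": "#define N 1"}
v2 = dict(v1)
v2["b.c"] = "int b = 2;"  # only one source file changes
sources = ["a.c", "b.c"]

narrowed = ToyBuild(narrowed=True)
whole = ToyBuild(narrowed=False)

FIRST_NARROWED = narrowed.build(v1, sources)
SECOND_NARROWED = narrowed.build(v2, sources)
whole.build(v1, sources)
SECOND_WHOLE = whole.build(v2, sources)
```

with narrowed inputs the second build recompiles only `b.c`; when every compile action takes the whole directory as input, both sources are recompiled even though `a.c` did not change.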
this use case is (imo) not that unrealistic...
re: do we even need to do this narrowing manually?
- for starlark rules: definitely; we do not (yet? 🤞) have access to buck2 style dynamic actions
- for "native" Bazel rules: still yes; afaik even the builtin rules do not support `TreeArtifact` sources (java, for example)
If you know the contents of the `TreeArtifact` a priori such that you're able to project them like this, why model the action w/ `TreeArtifact` at all?
couple of reasons:
- sometimes you know some of the eventual outputs but not all of them... having the `TreeArtifact` is useful for capturing the "other" outputs
- for this scheme you actually don't exactly have to know the outputs of interest all the way up in analysis!
  - to expose things in the `TreeArtifact` as regular files you certainly need to know the number of files of interest but exact file names can be something you figure out dynamically
    - `DirectoryExpander` + `map_each` + `allow_closure` are a powerful escape hatch; they let you do fairly non-trivial things in that "after analysis but right before my action is executed" grey area
Also see (previously): "Bazel dynamic input subsetting with `TreeArtifact`s"

This gist is essentially using the same idea as ^ (narrow a `TreeArtifact` using symlinks for better incrementality) but with a couple of important differences:
- in ^, we're going from a known set of files to a `TreeArtifact` (subset that's not known until execution time)
- in this gist, we're going from a `TreeArtifact` to a subset of files (that's ~known [1] during analysis)
- in ^ we were doing the `TreeArtifact` business within the confines of one rule
- in this gist we're explicitly trying to expose the symlinks to other rules such that they can be used without having to be aware of our directory/directory narrowing scheme (i.e. unlike `bazel-skylib`'s `DirectoryInfo`)
Caution
the materialized symlink in bazel-out is different for local execution vs RBE... (uh-oh?)
for things not in the bazel execroot it's the same when materialized...
but for things that are in the execroot, running locally materializes as a relative symlink, running remotely materializes as an abs path symlink to the file
EDIT: never mind! this is just BWoB at play; with `download_toplevel`, heuristics are used to detect that the artifact is a symlink to another artifact (and thus that the underlying artifact needs to be materialized); I think this alternate materialization codepath is responsible for the discrepancy
this should probably still be fixed though... haven't verified for sure but I think this discrepancy might be able to influence downstream actions?
- yeah, the contents in the execroot actually are different...
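a small sketch of why the relative vs. absolute difference can matter, again using plain filesystem symlinks rather than Bazel itself (the paths and the `link_flavors` helper are made up for illustration): both links resolve to the same bytes, but the link text differs, so anything that reads or hashes the link itself rather than its target would see two different inputs.

```python
import os
import tempfile


def link_flavors():
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "tree", "a.c")
        os.makedirs(os.path.dirname(target))
        with open(target, "w") as f:
            f.write("int a;")

        out = os.path.join(root, "out")
        os.makedirs(out)
        rel_link = os.path.join(out, "rel_a.c")
        abs_link = os.path.join(out, "abs_a.c")
        os.symlink(os.path.relpath(target, out), rel_link)  # relative link text
        os.symlink(os.path.abspath(target), abs_link)  # absolute link text

        # Reading through either link yields the same file content.
        with open(rel_link) as r, open(abs_link) as a:
            same_content = r.read() == a.read()

        # The stored link text is what differs between the two flavors.
        rel_text = os.readlink(rel_link)
        abs_text = os.readlink(abs_link)
        return (
            same_content,
            rel_text == abs_text,
            os.path.isabs(rel_text),
            os.path.isabs(abs_text),
        )


SAME_CONTENT, SAME_LINK_TEXT, REL_IS_ABS, ABS_IS_ABS = link_flavors()
```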
Footnotes
1. `DirectoryExpander` escape hatch applies