Cene namespace notes


Stream of consciousness 1: Several ideas

See also the Gist "Cene packages": https://gist.github.com/rocketnia/cac8aa719470f3b6fa4b49e1131270f9

Given a "contribute function" and "contribute argument" based open-world-assumption extension framework, and given the ability to define the same thing more than once if specifying a dex at the same time, it's possible to define transitive closure relations where the entries are contributed as arguments: Contribute a function, have it contribute another function each time it gets an argument, and have that function contribute an argument if the two arguments imply something.

(Rather than just specifying a dex with each definition, we could specify something less general than a merge but more general than a (merge-by-dex ...). Not only would it be idempotent like a merge, but it would also ensure that the result was "backwards compatible" with both inputs -- behaving exactly alike in all non-error cases, and exhibiting no more errors than either one of them does. Let's call these "conflates." We'll end up splitting merge-by-dex into merge-by-conflate and conflate-by-dex.)
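As a rough illustration only (none of these are real Cene operations), a conflate can be modeled as a binary operation that either unifies two compatible values or signals a conflict; conflate-by-dex is then the degenerate conflate that only accepts values the dex considers identical, and merge-by-conflate folds a conflate over a set of contributions.

```python
# Purely illustrative Python, not Cene's API.
class Conflict(Exception):
    pass

def conflate_by_dex(dex_eq):
    """Make a conflate from a dex (modeled here as an equality predicate):
    two values unify only if the dex says they're the same value."""
    def conflate(a, b):
        if dex_eq(a, b):
            return a
        raise Conflict(f"incompatible definitions: {a!r} vs {b!r}")
    return conflate

def merge_by_conflate(conflate, contributions):
    """Fold a conflate over a nonempty set of contributions.  Since a
    conflate is idempotent and its result behaves like both inputs, the
    fold order doesn't matter."""
    result = None
    first = True
    for value in contributions:
        result = value if first else conflate(result, value)
        first = False
    return result

def conflate_partial_records(a, b):
    """A conflate more general than conflate-by-dex: two partial records
    unify if they agree wherever both are defined, and the result is at
    least as defined as either input."""
    out = dict(a)
    for key, value in b.items():
        if key in out and out[key] != value:
            raise Conflict(f"field {key!r}: {out[key]!r} vs {value!r}")
        out[key] = value
    return out

same_value = conflate_by_dex(lambda a, b: a == b)
assert merge_by_conflate(same_value, [3, 3, 3]) == 3
assert conflate_partial_records({"x": 1}, {"y": 2}) == {"x": 1, "y": 2}
```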

With that transitive closure technique, we can impose orders after the fact on arbitrary tokens. We can establish that the computations that can write to a set of contributions and the computations that can read the whole set are temporally separated from each other, without implying an ordering between that set of contributions and another set of contributions. An attempt to write a contribution from a certain computation will also attempt to write an ordering relation entry saying the computation's identity temporally precedes the collection's cutoff point identity. An attempt to read a full set of contributions (which is really an attempt to write a pending computation which will receive that set of contributions) will also attempt to write an ordering relation entry saying the collection's cutoff point precedes the pending computation.
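A sketch of that protocol, again with invented Python names: each contribution set gets a cutoff-point identity, writes assert that the writer precedes the cutoff, and full reads assert that the cutoff precedes the reader. A real implementation would consult the transitively closed ordering relation; here it is just a set of direct entries.

```python
class OrderViolation(Exception):
    pass

class ContributionBucket:
    def __init__(self, name, ordering):
        self.cutoff = f"{name}/cutoff"  # identity of the cutoff point
        self.ordering = ordering        # set of (earlier, later) entries
        self.entries = []

    def write(self, writer_id, value):
        # Refuse the write if this writer is already known to come after
        # the cutoff, i.e. after some full read of this bucket.
        if (self.cutoff, writer_id) in self.ordering:
            raise OrderViolation(f"{writer_id} would write after the cutoff")
        self.ordering.add((writer_id, self.cutoff))
        self.entries.append(value)

    def read_in_full(self, reader_id):
        # Really this would schedule a pending computation to receive the
        # set; here we just record the ordering entry and take a snapshot.
        self.ordering.add((self.cutoff, reader_id))
        return list(self.entries)

ordering = set()
bucket = ContributionBucket("macro-definitions", ordering)
bucket.write("file-a", "macro-1")
assert bucket.read_in_full("error-summarizer") == ["macro-1"]
```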

The computation ordering relation itself never needs to be read in full by a user-supplied computation -- but it may nevertheless be read in full by the language internals, not only because they can, but because error summarizing and compilation output can conceptually occur after the user-supplied part of the compilation process.

Certain things never need to be written to by a user-supplied computation, but in that case, usually the computation simply won't have access to the names it would need in order to perform the write. Cene code which has been compiled to a target platform may be able to call functions, but it will not have access to the same qualify function the Cene macroexpander was passing around, so it will not be able to write to the correct places.

Speaking of qualify functions, the macroexpander's original qualify function may itself look up values written to the global namespace as part of its computation. This allows us to represent Cene programs where some of the names in the program are only qualifiable after (import ...) declarations are processed. This technique will come in handy not only for module bodies but also for local definitions.

In modules, using qualify and the global namespace will typically result in accessing definitions that are not visible to any other module, not visible even to the compiled code. There are a few especially notable cases:

  • A module can obtain widely-visible references to the exports of other modules, looked up by public key or by a UUID interpreted by the Cene implementation. (This is the same kind of UUID used as a "magic number" for a Cene package to declare what version of Cene it's written for. The exports of UUID-named modules will tend to be consistent in a way that can be used for package code instantiation.)

  • A module can obtain widely-visible references to the source code of other packages, looked up either by hash or by a "private key" authorization resulting from a signature on the importing code.

  • We should probably take special care to ensure that Cene code that has been compiled cannot use names to create struct tags that let it create structs that let it access function behavior which has not been carried over into the compiled code. We could just avoid letting the compiled code use effects-expr-eval (although that's not really an option if one of our compilation targets is for making Cene packages... unless we do a Cene-to-Cene FFI...), or we could somehow run the tags through a qualify pass that the compiled code can't replicate without the same qualify function. Perhaps there are other approaches.

  • We should not allow definitions to names that have not been made sufficiently obscure -- for instance, defining directly to the name of a string. If these definitions are allowed, they pose a hazard, since multiple modules (running on different computers, perhaps) will be able to make incompatible definitions there without ever causing an error, since the incompatible definitions never even exist in the same runtime.

Because of the last point there, we need the thing we pass to effects-get and effects-put to be something with more structure to it than just a name. Perhaps it could even just be a struct containing a name, so that the struct tag determines whether it's a module import, a module-local definition, etc. This almost makes effects-get like an effectful version of follow-heart, taking a parameter that's essentially a clamor.

There's a similar sticking point about how module bodies can do imports. If in order to define anything, you have to first define what qualify returns for that thing's source-level name, then do you have an endless regress of definitions to make before you can get started? What we can do is have qualify take a non-name as input; this way it can detect whether it's supposed to qualify a "name for a qualify result." This is not a reasonable thing to import from a module, so qualify can qualify these values in a non-customizable way, while still being as customizable as we need it to be for other input values.

Together, this means we're not passing a name to qualify, and the thing we're getting from qualify and passing to effects-get and effects-put isn't a name either. The very idea of a "global namespace" may be a misnomer. Let's call it a definition space instead. This definition space will have entire cases dedicated to barring things like names and strings from being used as definition targets, but it lets structs through... and if it lets some structs through, arguably it should be customizable to let user-defined structs through as well. Instead of defining key-value pairs, we're defining pattern matching branches.
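A sketch of such a definition space, with hypothetical key types throughout: bare names and strings are barred as definition targets, while tagged structs pass through, so the key itself records whether it is a module import, a module-local definition, and so on.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModuleLocal:
    name: str

@dataclass(frozen=True)
class ModuleImport:
    module_id: str
    export_name: str

ALLOWED_KEY_TAGS = (ModuleLocal, ModuleImport)

class DefinitionSpace:
    def __init__(self):
        self.defs = {}

    def put(self, key, value):
        # Only sufficiently obscure, tagged keys are allowed as targets.
        if not isinstance(key, ALLOWED_KEY_TAGS):
            raise TypeError(f"not an allowed definition target: {key!r}")
        if key in self.defs and self.defs[key] != value:
            raise ValueError(f"conflicting definitions for {key!r}")
        self.defs[key] = value

    def get(self, key):
        return self.defs[key]

space = DefinitionSpace()
space.put(ModuleLocal("my-helper"), "a definition")
assert space.get(ModuleLocal("my-helper")) == "a definition"
```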

We could take this to a very expressive extreme and say that in fact the entire global definition space is the inside of a function. We could set a goal to be able to implement first-class Cene functions by way of similar side effects that read and write on a definition space that represents the function's argument and return value. We could anticipate other kinds of incrementally defined value stored in the global definition space, like structs and tables defined one projection at a time.

But that vision has a hard-to-reconcile dissonance with contribution sets -- not only with the "contribute an argument" and "contribute a function" open-world-assumption interaction, but also with the "read the set of contributions in full" closed-world-assumption interaction. What would these contribution sets mean if we tried to understand them as incrementally defined first-class values?

Another thing we could think about doing is what Haskell typeclasses do: Require every two defined pattern matching branches to be non-overlapping. In our case, one branch that coincides with another is also okay; we "conflate" their values. One branch being "more specific" than another is probably also okay, since the incrementally-defined-value interpretation would say that the definition starts off being a single case that results in a very vague value. The point of keeping these cases from overlapping is so that they don't multiply, where code that appears to be adding one pattern match is actually adding several branches that represent intersections between that pattern and existing patterns. Hmm, maybe that's not important. What did this have to do with contribution sets again?

When we "read the set of contributions in full," and our contributions are actually to structs rather than names, we don't necessarily get a table out of it. What we get out of it is a function where we happen to have access to some information about what keys there are. We could use this information to get tables and such, essentially having operations that do the following:

  • Restrict: Restrict the function-table by a given pattern, stripping out all the cases that don't match it.

  • Convert to table: If all the remaining cases of the function-table are specific to certain values (e.g. specific names), return a table that takes the names of those values to pairs containing the value and the value it maps to.

  • Unwrap singleton: If there's only one remaining case in the function-table, return its value.
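
A simplified sketch of such a function-table (all names invented; the "convert to table" step here returns a plain key-to-value table rather than the key-to-pair table described above):

```python
class FunctionTable:
    def __init__(self, cases):
        self.cases = cases  # list of (specific_key_or_None, value)

    def restrict(self, pattern):
        """Strip out the specific cases whose keys don't match the pattern."""
        return FunctionTable(
            [(key, value) for (key, value) in self.cases
             if key is None or pattern(key)])

    def to_table(self):
        """If every remaining case is specific to one key, return an
        ordinary table from those keys to their values."""
        if any(key is None for (key, _) in self.cases):
            raise ValueError("some cases aren't specific to one value")
        return dict(self.cases)

    def unwrap_singleton(self):
        """If only one case remains, return its value."""
        if len(self.cases) != 1:
            raise ValueError("not a singleton")
        return self.cases[0][1]

ft = FunctionTable([("x", 1), ("y", 2)])
assert ft.to_table() == {"x": 1, "y": 2}
assert ft.restrict(lambda key: key == "x").unwrap_singleton() == 1
```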

But... do we really need all this complexity? It seems like, for all our current use cases, we just need a "global namespace" where the "names" are actually unary structs which wrap the names so that the qualify/effects-get/effects-put operations know what certain names are for. Another way to approach this would be to have multiple functions taking the place of qualify/effects-get/effects-put, but keeping them together would be less unruly.

We may want a kind of function-table data structure just so that we can shadow names for qualify without leaving the shadowed ones accessible to the garbage collector, but at this point, that problem isn't a problem that actually exists.


Actionable items for stream of consciousness 1

  • A technique to have on hand: Make transitive closures using pubsub OWA extensibility. Use this when we want a self-organizing soundness ordering, such as for CWA extensibility.

  • Pie in the sky: It's fascinating to think of Cene values that are only partially defined. If we do start to work them into the language, let's consider a set of operations like merges which we could call "conflates," which unify two compatible partial values into a more complete one.

  • Actionable: Instead of names, we should pass structs to qualify functions so that they can qualify imported names differently than they qualify module-local names. Instead of names, we should pass encapsulated values to effects-get and effects-put (and as the return value of qualify functions) so that they can enforce that all definitions are to places that only the currently compiling Cene code has access to. Each compilation of Cene code can conceptually have a different root qualify function, so two executions of Cene can't have their definitions collide if they only define qualified names.

  • Actionable: Let's make an Era-caliber "Meaning-Preserving Modularity" module system. Let's allow modules to use their root qualify function to look up:

    • Various dedicated ways to qualify struct tags, macro names, function definitions, etc., which we've basically already implemented.

    • Their own namespace they can populate with case-by-case information about how to qualify each source-level name.

    • Their own namespace which is qualified according to those definitions.

    • Their own module-local namespace.

    • Their own namespaces that copy the exports of modules they specify by public key or UUID.

    • Their own namespaces that copy the source code of packages they specify by hash.

    • Their own namespace that they can populate with their exports.

    • The above namespaces may or may not coexist with the way we were using the JavaScript version of a Cene program as a build tool with quine-based compilation support:

      • Their own namespace that contains an input file tree, command line options, and command line environment variables.

      • Their own namespace that they can populate with an output file tree.

  • A solution we don't need yet: To allow us to shadow qualify functions in a way which allows the originals to be partly garbage-collected, we might want to use a data structure that's similar to a table (key-value map) but with pattern-matching clauses instead of keys.


Stream of consciousness 2: Making eval-enabled dependency on function implementations impossible

Above, we say "We should probably take special care to ensure that Cene code that has been compiled cannot use names to create struct tags that let it create structs that let it access function behavior which has not been carried over into the compiled code."

Let's state the problem again.

Cene code used as a build tool can create compilation artifacts based on the same function definitions that were defined at build time. The Cene code just specifies one nullary struct tag to compile as the "main" function. The compiler crawls that struct tag's function definition and compiles it. In the process, it finds out what structs that code can construct, so it compiles those struct tags' function definitions as well, bundling them all together in the output.
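In other words, the crawl is a reachability computation over struct tags. A sketch with made-up data: each struct tag maps to a function implementation, and each implementation records which struct tags its body can construct.

```python
def bundle(main_tag, implementations, constructible_tags):
    """Collect every function implementation reachable from main_tag by
    following "this code can construct that struct" edges."""
    bundled = {}
    worklist = [main_tag]
    while worklist:
        tag = worklist.pop()
        if tag in bundled:
            continue
        bundled[tag] = implementations[tag]
        worklist.extend(constructible_tags[tag])
    return bundled

implementations = {"main": "<main impl>", "helper": "<helper impl>",
                   "unused": "<unused impl>"}
constructible_tags = {"main": ["helper"], "helper": [], "unused": []}
assert set(bundle("main", implementations, constructible_tags)) == {"main", "helper"}
```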

A struct tag is derived from names, and we have certain ways Cene code can construct a struct if it has those names. In particular, Cene code can build expressions which would construct those structs, and we have offered an eval operation for those expressions. This eval operation is particularly useful for writing a def-macro operation, since the whole point of a macro body is that it's executed (not just compiled) during build time.

The problem is that if we're not careful, we may allow compiled code to construct a struct and call it as a function even if its function implementation hasn't been bundled into the compiled code. We need to be sure that the only build-time function implementations the compiled code can call are the ones which are accessed by struct construction expressions the compiler can discover and account for.

There are a few approaches we can take here:

  1. Find a way to avoid eval entirely. (Unlikely.)

  2. Never compile to a target platform that will be run on a Cene implementation with support for eval. (Unlikely. We do potentially want to write Cene build systems that build compiled Cene modules.)

  3. Make it impossible to create the same struct construction expressions at run time that we create at compile time.

    a. Perhaps the expression requires a "qualified" struct tag -- a value that can only be created by the root qualify function and that cannot be reified into compiled code. In this case, different qualified struct tags (from the different root qualify functions at compile time and at run time) must be for constructing different structs; they can't just be the same struct tag with different "yes, I authorize this" stickers attached to it. (This approach seems fully workable; see the sketch after this list.)

    b. Perhaps the expression must be constructed using a side effect, and in the process it gets some extra information attached to it regarding the identity of the current computation. Again, this would affect what struct is constructed by the expression. (This seems about as useful as (3a) but less convenient. It would probably lead us to use an applicative functor DSL to avoid CPS-style code in our expression-constructing code.)

  4. Since we use a side effect for evaluation, we can conceptually have the same expression evaluate into different struct values depending on the identity of the current computation (so, returning a "different" value at compile time than it does at run time). We can actually implement this difference by having function implementations compile into slightly different function implementations. At run time, the function implementations that arrived from this compilation process make build-time structs, but the ones that are defined at run time make run-time structs. (This approach strikes me as weird, but it seems workable anyway.)
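
A sketch of option (3a) in Python (every name invented): a qualified struct tag carries the identity of the root qualify function that produced it, so tags qualified by different roots construct different structs, and the qualified value itself is never written into compiled output.

```python
class QualifiedStructTag:
    def __init__(self, root_id, main_tag, proj_tags):
        self._identity = (root_id, main_tag, tuple(proj_tags))

    def __eq__(self, other):
        return (isinstance(other, QualifiedStructTag)
                and self._identity == other._identity)

    def __hash__(self):
        return hash(self._identity)

def make_root_qualify():
    """Each compile of Cene code conceptually gets its own root qualify
    function; the tags it produces are distinct from any other root's."""
    root_id = object()  # an unforgeable identity for this root
    def qualify_struct_tag(main_tag, proj_tags):
        return QualifiedStructTag(root_id, main_tag, proj_tags)
    return qualify_struct_tag

build_time = make_root_qualify()
run_time = make_root_qualify()
assert build_time("nil", []) == build_time("nil", [])
assert build_time("nil", []) != run_time("nil", [])
```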

Hmm, either (3a) or (4) seems like the way to go, but which one?

Here's something I haven't thought about: Suppose we have a Cene build tool that compiles a Cene "service" -- like a module, but something that continues to listen on pubsub OWA/CWA contributions in order to interact with the code that imports it. Now suppose we modify the build tool so that it imports (an earlier build of) that Cene service, but it doesn't actually use it for anything. The result is a program that produces the same compiled Cene module it's currently using. If it does interact with the service, its effects would not be distinctively "earlier" or "later" than the service's effects as a whole (for "read the set of contributions in full" purposes). So, perhaps we don't actually know that compiled Cene code is supposed to be run in a computation that is later than the build-time computation, perhaps not even a computation with a different identity. After all, if one computation contributes a function and the other contributes an argument, the computation that spawns from that will have an identity that's not quite either of theirs alone.

Another thing to note is that with a service architecture like this, we would make Cene run the service in such a way that it has access to a different qualify function. This means the definitions it holds that were quined from the build-time definitions will not usually conflict with the originals (which are being defined at the same time in this scenario). The only reason they could be under the same name would be if the services gave their qualify results to each other (or their entire qualify functions), and that would be a special circumstance.

(I suppose they could also conflict if the build tool declared module exports and then used those exports in the compiled code, thus making the compiled code perform its own exports to the same module.... Hmm. No, that's not a problem, because module export definitions would not be a subset of function implementation definitions, so they would not be carried over into a compilation artifact which was based on a "main" function. They might be carried over to a compiled Cene module/package that was based on a set of things for it to export... and I suppose that's exactly how it would work, with no need for module export definitions to be their own thing in the namespace.)

(Come to think of it, module imports should be detectable by the compiler and compiled into the same module imports, so they should really be encoded inside of expressions for the compiler to discover. Even if we use qualify to help make imports easy to use, registering them in the definition space isn't really essential to how they work.)

It seems option (4) makes less and less sense, and option (3a) makes more and more sense. Option (4) requires us to have some notion of a unique identity for the currently running Cene instance, but even if we did, it would not necessarily be a qualitatively different identity than the identity of the computation which runs the compiled Cene code, since we could have a partially self-hosting build process on our hands. Option (3a) leads to almost the same implementation where the constructor expressions are compiled to code that's slightly different than what the target Cene code could build for itself, but it's justified by the idea that the compiled code is simply a different module and would therefore run under a different qualify function even in a self-hosting scenario.

Hmm... But if we're in the self-hosting scenario and the (3a) case... Yeah, all right, most of the build-time definitions that are compiled and bundled into the compiled Cene service will technically use struct tags that (when the Cene service is run alongside the build) are inconsistent with the ones used by the build-time definitions themselves. Passing structs back and forth between services using pubsub could yield surprising results. The reason this is all right is because it can be mitigated by making more fine-grained modules: If a struct constructor expression specifies a struct tag that's from an imported module, then it will compile to the exact same struct tag, containing the same import specification.

Given some of our objections to (4), accepting this about (3a) almost seems like cheating. Does this disappointment in (3a) make (4) look better in comparison? ...No, the most crucial objection to (4) is that there's no way to distinguish one service's effects from another's after they've used pubsub to spawn a computation based on a function from one and an argument from the other. That combined computation ought to be able to get the eval behavior of either of its parents, and (3a) automatically comes with a way to do that, since the two parents are distinguished by their access to two different first-class qualify functions.

So, (3a) it is.


Actionable items for streams of consciousness 1 and 2

We'll briefly mention a few items from "Actionable items for stream of consciousness 1" so we don't forget them, in a slightly different order this time, and then we'll get into items that mainly come from stream of consciousness 2.

  • (Repeat) A technique to have on hand: Our OWA extensibility can let us implement transitive closure relations, including the well-foundedness ordering we need to impose on computations for our CWA extensibility.

  • (Repeat) Pie in the sky: It's fascinating to think of Cene values that are only partially defined, which we may perhaps "conflate," but we don't need this yet.

  • (Repeat) A solution we don't need yet: We might want a "function-table" data structure to allow us to shadow qualify functions in a way which makes the old ones partially garbage-collectable.

  • (Repeat) Actionable: Our qualify, effects-get, and effects-put operations should take informative structs, not unadorned name values. (Note: Unlike what we state here in "Actionable items for stream of consciousness 1," it is different Cene modules or services that should have different qualify functions. There isn't necessarily a meaningful difference between qualify functions just from running Cene more than once, since some modules may be self-hosting, involved in their own build process.)

  • (Note: We are not repeating quite the same "Meaning-Preserving Modularity" item from "Actionable items for stream of consciousness 1," since imports and exports have been rethought substantially. Instead, we're describing the new understanding from scratch.)

  • Actionable: Let's make an Era-caliber "Meaning-Preserving Modularity" module system, with module API-abiding imports specified by public key or UUID and source code imports specified by proof-of-private-key (due to code signing), hash, or full package content. The place we specify imports in Cene code will be the struct tags. Struct tags (aside from having a main tag name and projection names) should encode information about what module they're imported from; that way when the compiler encounters a struct construction function, it knows that the function implementation can be imported and therefore doesn't need to be compiled itself. We should let Cene builds create quined Cene packages by specifying the set of struct tags whose implementations should be compiled and bundled into the package.

  • Actionable: Let's arrange the way we use qualify functions so that lexical regions (e.g. files) of Cene code can use imports and exports to avoid dumping everything they're doing into a single namespace. This is mostly about internal code organization and the ability to move definitions in and out of local scopes, not about making API boundaries for the public to use. The way we'll do this is by letting some qualify results depend on other definitions the lexical unit makes. That is, the usual thing to pass to qualify is a struct that means "please get me whatever I've said this name should qualify to," but more specific requests like "please get me something private to this region" and "please get me something public from file X" will still work. The "please get me whatever I've said this name should qualify to" call will block until a definition has been written to a place obtained by a corresponding "please get me a place where I can specify what this name should qualify to."

  • Potential design detour: In addition to using Cene as a build system, and in addition to having a Cene compile target for packages that can be used for module distribution, consider having another Cene compile target for "services." These would be a lot like module packages, but they would be used more like lexical regions. Instead of having multiple exports like a package does, they would typically have a single "main" function that took a qualify function and a unique name and returned an effects value. For a Cene target platform, these could just be distributed as Cene packages, and Cene programs could import them as modules and invoke their "main" functions explicitly. But services would probably be a good semantic model for distributing programs on target platforms other than Cene as well.

Super-brief summary of all action items: Send informative requests to qualify (usually "get me whatever I've declared this should qualify as," but often "get me the place where I can declare what this should qualify as"), and get back qualified places. Have effects-put check that it's given a qualified place (not an unadorned name) so that we can be sure separate runs of Cene will not have effects-put collisions with each other. Use module imports as struct tags.


Obstacles

The one "open problem" below has us stuck at the moment, and all the other thoughts represent a snapshot of our thought process as we scope out how to implement the action items:

  • For the task of making qualify receive an informative struct instead of a name:

    • Hesitation: It's easy -- kinda too easy. Everything we do now would make sense as a "get me whatever I've declared this should qualify as" operation. The only other operation that makes sense is "get me the place where I can declare what this should qualify as." It's not clear if we'll ever need to break this up any further than that, like distinguishing struct main tag name lookups from macro implementation lookups.

    • Musing and more ideas: The idea of lexical regions that qualify their own variables would have been more elegant in a version of Cene that had s-expressions. There, we could process more than one part of the file concurrently, and the parts that relied on a name qualification could wait on the parts that defined how to qualify it. We can do something similar if we have operations which read a section of code up to balanced brackets and then create another concurrent computation that processes that section for real, but we have to realize that's essentially a code walker. Maybe we should embrace code-walking for this purpose, in which case perhaps all the nontrivial declarations in a file should in fact use code walkers of that style.

  • For the task of making qualify return a place instead of a name:

    • Open problem: Currently we use qualified names directly as local variables. Does it make sense to use places as local variables? Or do we need to have a way to convert a place to a name? Or do we even need to pass local variables to qualify at all? Whatever our approach will be here, how will it look in the long run if we decide local variables should be bundled with local macros, so that we can invoke them lisp-1-style? How will it look if we decide local variables should have places we're recording their static types?

    • Fixes itself: Currently we use a qualified name as part of the name of a struct projection. Replacing this with a place doesn't make sense yet. Fortunately, this probably won't be a problem once we get an implementation worked out for using module imports as struct tags.

  • For the task of using module imports as struct tags:

    • Musing and more ideas: With the idea in mind that structs are like simple tables which take a finite set of nullary structs to arbitrary values, we should probably represent projection names the exact same way we represent nullary struct tags.

    • Hesitation: There's work to do here, and it might not be wise to start until we've thought through the other things. This task involves deciding on a serialization format for module identity data (like UUIDs, public keys, and the variety of struct tags derived from these) and making sure our approaches to module imports and code signing will interact properly.


Stream of consciousness 3

Actually, we can keep using names for struct tags, in a sense. We just can't embed name values directly into compiled expressions, so we need to embed module imports that resolve to name values instead.

For local variable qualify lookups, we can simply have qualify return a name value instead of a place value for those. Local variable lookups can be their own request type which doesn't go through the "get me whatever I've declared this should qualify as" self-qualification system; this may make it easier to report good error messages for unbound variables.

Now that we think about it some more, it seems we really don't even need to take the self-qualification approach. We can just think of imports as local definitions whose values are obtained from similar names somewhere else.

It turns out there's only one thing most of the topics discussed here revolve around: Each definition should be associated with a particular code-signer who can be blamed if the definitions have a conflict. This means not only that effects-put and effects-get should receive a code-signer identity, but also that we should supply one whenever we define a function implementation, and hence a code-signer identity should be part of what makes up a struct tag.

Let's treat local variables just like the way we treat other identifiers: As macro names. If a binding syntax needs to make a variable be bound in one of its subexpressions, it chooses a unique qualified name and compiles the subexpression using a derived qualify function where the "local variable macro" of the appropriate surface-level name is a macro that creates a cexpr-var of that unique name.
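A sketch of that mechanism with invented names (only cexpr-var is borrowed from Cene): the binding form picks a fresh unique name, derives a qualify function in which the bound surface name's "local variable macro" expands to a reference to that unique name, and compiles the body under the derived qualify function.

```python
import itertools

_fresh = itertools.count()

def fresh_name(surface_name):
    return f"{surface_name}#{next(_fresh)}"

def cexpr_var(unique_name):
    return ("cexpr-var", unique_name)

def compile_with_binding(surface_name, compile_body, outer_qualify):
    unique = fresh_name(surface_name)

    def local_var_macro():
        return cexpr_var(unique)

    def derived_qualify(request):
        # Only "local variable" requests for this surface name are
        # intercepted; everything else falls through to the outer qualify.
        if request == ("local-variable", surface_name):
            return local_var_macro
        return outer_qualify(request)

    return unique, compile_body(derived_qualify)

def outer_qualify(request):
    raise KeyError(f"unbound: {request!r}")

# The toy body just expands one variable reference to show the shape.
unique, body = compile_with_binding(
    "x", lambda qualify: qualify(("local-variable", "x"))(), outer_qualify)
assert body == ("cexpr-var", unique)
```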


Actionable items for streams of consciousness 1, 2, and 3

We'll briefly mention a few items from "Actionable items for stream of consciousness 1 and 2" so we don't forget them, and then we'll get into items from stream of consciousness 3.

  • (Repeat) A technique to have on hand: Our OWA extensibility can let us implement transitive closure relations, including the well-foundedness ordering we need to impose on computations for our CWA extensibility.

  • (Repeat) Pie in the sky: It's fascinating to think of Cene values that are only partially defined, which we may perhaps "conflate," but we don't need this yet.

  • (Repeat) A solution we don't need yet: We might want a "function-table" data structure to allow us to shadow qualify functions in a way which makes the old ones partially garbage-collectable.

  • (Repeat) Potential design detour: Packages that have a "main" entrypoint (which we could call "services") would be a nice way to think of packages that deploy to target platforms other than Cene.

  • Actionable: Each effects-put, effects-get, and struct tag should be associated with the identity of a particular code-signing party who can be blamed if the definitions have a conflict. (In the case of the struct tag, this party is blamed when conflicting function implementations are defined.) This code-signing system should be designed as an Era-caliber "Meaning-Preserving Modularity" module system, with code-signer identities specified by public key (typically for library modules) or UUIDs (typically for modules built into language implementations) and imports of packages' source code obtained by showing proof-of-private-key (provided by signing the code), a hash, or the full source code itself. We should let Cene builds create quined Cene packages by specifying the set of struct tags whose implementations should be compiled and bundled into the package.

  • Actionable: Let's have "local variable macros." These macros can be introduced in local scopes by swapping out the qualify function when a subexpression is compiled.

@rocketnia

I'm now making these public. I considered folding the history into the cene-for-racket repo, but the other Gist they link to is two years old, so it would have been placed in the (JavaScript) cene repo if anything. Instead of choosing a repo for them, I'm just making them both public as Gists.

Perhaps per-repository "notes" directories don't make quite as much sense now that we have more than one Cene implementation codebase. Heck, these are practically Era notes since they're about implementing Meaning-Preserving Modularity. The boundaries between Era projects can be pretty blurry.
