
@sogaiu
Last active October 15, 2025 13:35
reflections on inclusion criteria for grammar orchard

at the moment, grammar orchard lists the following inclusion criteria:

Grammars growing in the orchard must:

  1. be released under the MIT license (and possibly other licenses, at the user's option),
  2. have an accompanying test suite,
  3. have a continuous integration workflow configured on the repository,
  4. use C as the programming language of their external scanner (if any), and not C++,
  5. define the grammar in a JavaScript file (and not TypeScript or any other language),
  6. follow the Conventional Commits specification,
  7. follow SemVer: breaking changes must result in a major bump, new features in a minor bump, and bug fixes in a patch bump.
  8. contain CONTRIBUTING.md and GOVERNANCE.md files pointing to our common contributing instructions and governance files, respectively,
  9. use tree-sitter-$1-orchard as a Rust crate name on crates.io, where $1 is the name of the language parsed by the grammar,
  10. not track in Git the files generated by tree-sitter generate; those files should instead be generated on the fly during the release process,
  11. use main as the default Git branch.
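
criteria 6 and 7 are meant to work together: under the Conventional Commits spec, a fix commit corresponds to a patch bump, a feat commit to a minor bump, and a "!" after the type (or a BREAKING CHANGE footer) to a major bump. a minimal sketch of that mapping follows. it is not orchard tooling: bump_level and bump are hypothetical helpers, and the sketch ignores pre-1.0 versions, pre-release tags, and most of the finer points of the commit message grammar.

```python
import re
from typing import Iterable, Optional

# Conventional Commits header: type, optional "(scope)", optional "!", then ": "
HEADER = re.compile(r"^(?P<type>[a-zA-Z]+)(\([^)]*\))?(?P<bang>!)?: ")

# Higher rank wins when several commits go into one release.
RANK = {"patch": 1, "minor": 2, "major": 3}


def bump_level(messages: Iterable[str]) -> Optional[str]:
    """Return the largest SemVer bump implied by the commit messages, or None.

    Simplification (assumption): any "BREAKING CHANGE:" text in the message
    counts, not only a properly formed footer.
    """
    level: Optional[str] = None
    for msg in messages:
        header = msg.splitlines()[0] if msg else ""
        m = HEADER.match(header)
        if m is None:
            continue  # not a Conventional Commits message; ignored here
        if m.group("bang") or "BREAKING CHANGE:" in msg:
            found = "major"
        elif m.group("type") == "feat":
            found = "minor"
        elif m.group("type") == "fix":
            found = "patch"
        else:
            continue  # docs:, chore:, etc. do not force a release
        if level is None or RANK[found] > RANK[level]:
            level = found
    return level


def bump(version: str, level: str) -> str:
    """Apply a major/minor/patch bump to a plain "X.Y.Z" version string."""
    major, minor, patch = (int(p) for p in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```

for example, a release containing "fix: handle empty input" and "feat(parser): add node" would take 1.2.3 to 1.3.0, since the feat commit outranks the fix.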

thoughts, based on having looked after a few grammars (e.g. tree-sitter-clojure and tree-sitter-janet-simple):

  1. currently using CC0 rather than MIT
  2. corpus tests exist, but i don't consider them sufficient; additional tests are run from separate repositories.
  3. i don't use ci, partly because i don't want to be locked in to a particular forge
  4. when i have an external scanner, it is written in c
  5. i have generated grammar.js files before, but currently don't, so all of the grammar files are hand-written javascript
  6. i am not familiar with the conventional commits spec and don't currently intend to learn about it
  7. i do not intend to follow semver in any meaningful way (more detail here, and an opinion from a pulsar dev here)
  8. i'm not sure what the implications of this are, but i'm wary of buying into more things whose content can change out from under me
  9. i presume i wouldn't be the one doing this part; one possible issue is that the janet grammar is named janet_simple (with an underscore)
  10. until someone presents a solution that works for enough existing consumers (e.g. editors such as emacs), i don't plan to remove the generated c files
  11. for historical reasons, the repositories all currently use master as the default branch

references

overall, the idea (which the document does not convey at all) is to standardize development practices enough that grammar authors can focus on improving their grammars, while other people maintain the bindings they care about, and to do this for all of the org's grammars in a streamlined fashion

the intention is that downstream users also benefit from that standardization: they get a set of grammars that all use the same ABI and tree-sitter version, are all available in the same package repositories, and so on

I really agree with the point you made on GitHub about most grammar authors not wanting to maintain N bindings and their packaging pipelines. People often write a grammar with a specific downstream use in mind and cater only to that one, so they don't care about the other bindings.
