This is a sketch of a proposal for a "robots.txt for github" -- a policy that defines what actions automated tooling can take against a given repository.
Bots self-identify, and use `project/repo`-style naming. So code that lives at https://github.com/jacobian/coolbot identifies as `jacobian/coolbot`. Forks should generally use the upstream identifier until/unless they become different enough to warrant new names. This is a matter of judgement.
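To make the naming rule concrete, here's a small Python sketch that derives the identifier from a repository URL. The function name `bot_id_from_repo_url` is hypothetical and not part of any spec. It only handles plain `https://github.com/owner/repo` URLs, with an optional trailing slash or `.git` suffix.

```python
from urllib.parse import urlparse


def bot_id_from_repo_url(url: str) -> str:
    """Derive a `project/repo` bot identifier from a github.com repository URL."""
    parsed = urlparse(url)
    if parsed.netloc.lower() != "github.com":
        raise ValueError(f"not a github.com URL: {url!r}")
    # The path looks like "/owner/repo", possibly with a trailing slash or ".git".
    parts = [part for part in parsed.path.split("/") if part]
    if len(parts) < 2:
        raise ValueError(f"URL does not name a repository: {url!r}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return f"{owner}/{repo}"
```

Since forks should generally keep the upstream identifier, a fork would pass the upstream URL here rather than its own.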
Policies live in `.github/robots.yml`. Well-behaved robots should consult this file before taking action.
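A minimal sketch of how a bot might fetch the policy. It assumes the bot uses GitHub's REST "get repository content" endpoint (`GET /repos/{owner}/{repo}/contents/{path}`), which returns the file base64-encoded in a JSON body. Several parts are assumptions rather than anything this proposal specifies:

- The helper names are hypothetical.
- Sending the bot's identifier as the `User-Agent` is one possible way to self-identify.
- Treating a 404 as "no restrictions" borrows from how crawlers treat a missing `robots.txt`.

Authentication and rate limits are ignored. Parsing the YAML itself is left to a YAML library such as PyYAML's `yaml.safe_load`.

```python
import base64
import json
import urllib.error
import urllib.request
from typing import Optional

POLICY_PATH = ".github/robots.yml"


def policy_url(repo: str) -> str:
    """URL of the policy file via the REST "get repository content" endpoint."""
    return f"https://api.github.com/repos/{repo}/contents/{POLICY_PATH}"


def decode_policy(response_body: bytes) -> str:
    """Extract the YAML text from a contents-API JSON response."""
    payload = json.loads(response_body)
    if payload.get("encoding") != "base64":
        raise ValueError(f"unexpected encoding: {payload.get('encoding')!r}")
    # The API wraps base64 content across lines; b64decode discards the newlines.
    return base64.b64decode(payload["content"]).decode("utf-8")


def fetch_policy(repo: str, bot_id: str) -> Optional[str]:
    """Return the repo's robots.yml text, or None if the repo has no policy file."""
    request = urllib.request.Request(
        policy_url(repo),
        # Assumption: the bot self-identifies via its User-Agent.
        headers={"Accept": "application/vnd.github+json", "User-Agent": bot_id},
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return decode_policy(response.read())
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # Assumption: no policy file means no restrictions.
        raise
```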
Somewhat inspired by `robots.txt`, but in YAML to troll security researchers. I have no spec yet, so here are examples:
Robots may not interact with this repository:

```yaml
deny: "*"
```
Go hog wild:

```yaml
allow: "*"
```
Nobody is welcome except `jacobian/coolbot`:

```yaml
allow:
- jacobian/coolbot
```
That's the same as:

```yaml
deny: "*"
allow:
- jacobian/coolbot
```
That is, an `allow` without a `deny` implies `deny: "*"`.
The converse holds for a deny list: a `deny` without an `allow` implies `allow: "*"`. This allows any bot except `jacobian/coolbot`:

```yaml
deny:
- jacobian/coolbot
```
And that's the same as:

```yaml
allow: "*"
deny:
- jacobian/coolbot
```
If there's both an `allow` and a `deny` list, an implicit `deny: "*"` should also be inferred. So given:
```yaml
allow:
- jacobian/coolbot
deny:
- jacobian/otherbot
```
`jacobian/otherbot` clearly should stay away, but so should `jacobian/bot3` and all other bots. The above should be treated as:
```yaml
allow:
- jacobian/coolbot
deny: "*"
```
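The three implicit rules above can be captured in one normalization step. This Python sketch rests on a few assumptions:

- The policy has already been parsed from YAML into a dict, so `"*"` arrives as a string and a list as a Python list.
- `normalize` is a hypothetical name.
- An empty policy allows everything.

Unlike the prose above, the sketch keeps explicit deny entries alongside the inferred `"*"` rather than dropping them. They're redundant for exact names, but harmless.

```python
from typing import Dict, List, Optional, Union

# A parsed `allow`/`deny` value: the string "*", a list of bot patterns, or absent.
Entries = Optional[Union[str, List[str]]]


def _as_list(value: Entries) -> List[str]:
    """Turn "*" into ["*"] and None into []; copy lists so the input isn't mutated."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def normalize(policy: Dict[str, Entries]) -> Dict[str, List[str]]:
    """Return the policy with both "allow" and "deny" as explicit lists."""
    allow = _as_list(policy.get("allow"))
    deny = _as_list(policy.get("deny"))
    if "*" not in allow and "*" not in deny:
        if allow:
            # An allow list, with or without a deny list, implies `deny: "*"`.
            deny.append("*")
        else:
            # A deny list on its own implies `allow: "*"` (as does an empty policy, by assumption).
            allow.append("*")
    return {"allow": allow, "deny": deny}
```

Each pair of equivalent examples above should normalize to the same dict. For instance, `{"allow": ["jacobian/coolbot"]}` and `{"deny": "*", "allow": ["jacobian/coolbot"]}` both become `{"allow": ["jacobian/coolbot"], "deny": ["*"]}`.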
Bots can also be allowed or denied by organization. This policy welcomes bots from the Python Packaging Authority:

```yaml
allow:
- pypa/*
```
This policy welcomes most bots, but none made by me:

```yaml
allow: "*"
deny:
- jacobian/*
```
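Org wildcards raise the question of precedence, which the examples only answer implicitly. In `allow: "*"` plus `deny: [jacobian/*]`, the narrower rule has to win. Here's a minimal Python sketch under several assumptions:

- The policy has already been normalized into explicit `allow`/`deny` lists.
- The most specific matching rule wins: exact name over `org/*` over `*`.
- Deny wins a tie.
- The function names are hypothetical.

```python
from typing import Dict, List, Optional


def specificity(pattern: str, bot: str) -> Optional[int]:
    """How specifically `pattern` matches `bot`, or None if it doesn't match.

    2 = exact name ("jacobian/coolbot"), 1 = whole org ("pypa/*"), 0 = everyone ("*").
    """
    if pattern == bot:
        return 2
    if pattern.endswith("/*") and bot.partition("/")[0] == pattern[:-2]:
        return 1
    if pattern == "*":
        return 0
    return None


def best_match(patterns: List[str], bot: str) -> int:
    """Highest specificity among the matching patterns, or -1 if none match."""
    scores = [s for s in (specificity(p, bot) for p in patterns) if s is not None]
    return max(scores, default=-1)


def is_allowed(policy: Dict[str, List[str]], bot: str) -> bool:
    """Decide access for `bot` under a normalized policy.

    The most specific matching rule wins; on a tie (or no match at all), the bot is denied.
    """
    return best_match(policy["allow"], bot) > best_match(policy["deny"], bot)
```

Under these rules, every example above behaves as described. An exact `allow` entry beats `deny: "*"`, an exact `deny` entry beats `allow: "*"`, and `pypa/*` beats `"*"` in either direction.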
Finally, policies may allow or deny specific actions. This policy allows `jacobian/coolbot` any action, and allows PyPA bots to open issues (but nothing else):

```yaml
allow:
- jacobian/coolbot
- pypa/*@issues
```
Valid actions are:
- `issues`
- `pull_requests`
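Here's a sketch of how the `name@action` suffix might be parsed and checked, in Python. It assumes the following:

- Each entry carries at most one action, so granting both would take two entries.
- An entry with no `@` grants every action.
- Only an allow list is checked; combining this with deny lists and precedence is left out.
- The helper names are hypothetical.

```python
from typing import FrozenSet, List, Optional, Tuple

VALID_ACTIONS = frozenset({"issues", "pull_requests"})


def parse_entry(entry: str) -> Tuple[str, Optional[FrozenSet[str]]]:
    """Split "pypa/*@issues" into ("pypa/*", {"issues"}); no "@" means any action (None)."""
    pattern, sep, action = entry.partition("@")
    if not sep:
        return pattern, None
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action {action!r} in {entry!r}")
    return pattern, frozenset({action})


def matches(pattern: str, bot: str) -> bool:
    """True if `pattern` ("*", "org/*", or an exact "org/name") covers `bot`."""
    if pattern in ("*", bot):
        return True
    return pattern.endswith("/*") and bot.partition("/")[0] == pattern[:-2]


def entry_permits(entry: str, bot: str, action: str) -> bool:
    """Does this single allow-list entry let `bot` take `action`?"""
    pattern, actions = parse_entry(entry)
    return matches(pattern, bot) and (actions is None or action in actions)


def allow_list_permits(entries: List[str], bot: str, action: str) -> bool:
    """True if any entry in the allow list grants `bot` the `action`."""
    return any(entry_permits(entry, bot, action) for entry in entries)
```

With the example policy above, `jacobian/coolbot` may open pull requests. A PyPA bot may work on issues but not pull requests.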
TBD: more granular permissions e.g. "open issue", "comment on issue", etc?
Beyond access control, I think it'd be good to have some kind of mandatory identification for any bot that takes actions in a repo. That identification should include or link to some notion of the bot's maturity and/or purpose. It might also be worth requiring read-only bots to publicly maintain a log of the repos they've scanned, though I know enforcement would be difficult.
My motivation is that I was experimented on by UChicago researchers who tested their source analysis tool by opening PRs against random repos without identification or consent. If the maintainer merged their PR, they claimed that as evidence of the tool's efficacy.