This took me entirely too long, and I feel like I should have just waved off a while ago.
# I want to set up a GitHub cache so I can write dumb scripts
go get k8s.io/test-infra/ghproxy
# `hub` respects `HTTP_PROXY` env var, but `ghproxy` doesn't act like that kind of proxy
# Instead, we'll pretend like `ghproxy` is serving up a GitHub Enterprise endpoint
# and use the `GITHUB_HOST` env var to talk to it
# `hub` expects to talk to an https api endpoint, but `ghproxy` listens on http
# we'll use `secure` to set up an https->http proxy
go get github.com/yi-jiayu/secure
# `hub` won't accept self-signed certs, install a local cert authority
brew install mkcert
mkcert -install
# `hub` appends "/api/v3" if the endpoint host isn't prefixed with api.github.
# so let's use api.github.localhost
# (`sudo echo ... >> /etc/hosts` fails because the redirect runs as the unprivileged shell)
echo "127.0.0.1 api.github.localhost" | sudo tee -a /etc/hosts
# run it all in one terminal window...
cd ~/sandbox/local-ghproxy
mkcert api.github.localhost
ghproxy --cache-dir ./cache --cache-sizeGB 2 --log-level debug --port 8888 &
secure -key api.github.localhost-key.pem -cert api.github.localhost.pem -addr localhost:8443 http://localhost:8888

NOTE: editing /etc/hosts is bad; something like https://medium.com/@kharysharpe/automatic-local-domains-setting-up-dnsmasq-for-macos-high-sierra-using-homebrew-caf767157e43 might be better, but I was fed up with adding more moving parts by this point.
Then finally, at last, I can
export GITHUB_HOST=api.github.localhost:8443
hub api /repos/kubernetes/community/pulls

Let's see if this actually does anything for a dumb script. It gets up to 100 PRs and lists the files for each PR.
# spiffxp@spiffxp-macbookpro:hack (do-things-with-sigs %)$ time ./output-pr-files-csv.py
# without GITHUB_HOST
real 0m46.481s
real 0m40.835s
real 0m49.166s
real 0m51.528s
# with GITHUB_HOST exported
real 1m11.651s
real 0m55.358s
real 0m58.174s
real 1m7.620s
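To put a number on the difference, here is a minimal sketch that averages the eight `real` timings listed above. The helper name `to_seconds` is my own; only the durations come from the runs.

```python
import re
from statistics import mean


def to_seconds(real: str) -> float:
    """Convert a `time`-style duration such as '1m11.651s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", real)
    if match is None:
        raise ValueError(f"unrecognized duration: {real!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Timings copied from the runs above.
without_proxy = ["0m46.481s", "0m40.835s", "0m49.166s", "0m51.528s"]
with_proxy = ["1m11.651s", "0m55.358s", "0m58.174s", "1m7.620s"]

mean_without = mean(to_seconds(t) for t in without_proxy)
mean_with = mean(to_seconds(t) for t in with_proxy)

print(f"without GITHUB_HOST: {mean_without:.1f}s")
print(f"with GITHUB_HOST:    {mean_with:.1f}s")
print(f"ratio:               {mean_with / mean_without:.2f}x")
```

That works out to roughly 47.0s without the proxy and 63.2s with it, so going through the proxy was about 1.34x slower on average across these four runs each. Four samples per side is not much to go on.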
The moral of the story: I thought setting up a GitHub cache locally would let me write
dumb scripts that didn't need to do any caching of their own. It turns out not to save
me any time (the runs with `GITHUB_HOST` exported were actually slower), and I had to jump
through a number of hoops to accommodate `hub`. In retrospect, I guess this is because the
cache's main purpose is to help us conserve API tokens more than it is to save us roundtrip
time upstream.
Further ideas:
- I'm not actually seeing confirmation of whether I'm avoiding upstream
- maybe I can tune `ghproxy` to avoid talking upstream more aggressively
- maybe I can set up some other more aggressive cache
- results may be thrown off by the fact that I was running a `bazel` build concurrently, but I'm doubtful.
- results may also be thrown off by using an on-disk cache instead of in-memory, but again, doubtful.
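One way to look for that confirmation might be to skip `hub`, talk plain http to `ghproxy` directly, and compare response headers across repeated requests. This is a rough sketch under stated assumptions: the `GITHUB_TOKEN` variable name and the helper names are mine, and I have not verified which cache-related headers (if any) `ghproxy` adds. GitHub's own `X-RateLimit-Remaining` header is the one I'd watch: if it stays flat between attempts, the request probably didn't cost a token.

```python
import os
import time
import urllib.request

# From the setup above: ghproxy listens on plain http on port 8888.
GHPROXY_URL = "http://localhost:8888"


def build_request(path, base_url=GHPROXY_URL, token=None):
    """Build a GET request for a GitHub API path, routed through ghproxy."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    request = urllib.request.Request(url)
    request.add_header("Accept", "application/vnd.github.v3+json")
    if token:
        request.add_header("Authorization", f"token {token}")
    return request


def cache_hints(headers):
    """Keep only headers that hint at cache or rate-limit behavior."""
    prefixes = ("x-", "etag", "age", "cache-control")
    return {k: v for k, v in headers.items() if k.lower().startswith(prefixes)}


def probe(path, attempts=2):
    """Fetch the same path repeatedly, printing timing and cache-related headers."""
    token = os.environ.get("GITHUB_TOKEN")  # hypothetical variable name
    for attempt in range(1, attempts + 1):
        started = time.monotonic()
        with urllib.request.urlopen(build_request(path, token=token)) as response:
            response.read()
            elapsed = time.monotonic() - started
            print(f"attempt {attempt}: HTTP {response.status} in {elapsed:.2f}s")
            for name, value in sorted(cache_hints(response.headers).items()):
                print(f"  {name}: {value}")
```

With `ghproxy` running, calling `probe("/repos/kubernetes/community/pulls")` should print two attempts side by side. Running `ghproxy` with `--log-level debug`, as in the setup above, is the other place I'd look.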
Lessons learned:
- had I not stuck to using `hub` I would have been able to talk http to ghproxy directly and come to this conclusion sooner
- maybe I should just continue sprinkling the "check for files first, otherwise hit the api" pattern that I've been writing piecemeal in my silly scripts
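For reference, the "check for files first, otherwise hit the api" pattern could be pulled into one small helper instead of being rewritten piecemeal. This is only a sketch, not what the actual scripts do: `cached_json` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real API call.

```python
import json
import tempfile
from pathlib import Path


def cached_json(cache_dir, key, fetch):
    """Return cached JSON for `key` if a file exists; otherwise fetch and store it."""
    cache_file = Path(cache_dir) / f"{key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    data = fetch()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(data))
    return data


# Demonstration with a stand-in for the real API call.
calls = []


def fake_fetch():
    """Pretend to hit the API, recording that a call happened."""
    calls.append(1)
    return [{"number": 1, "files": ["README.md"]}]


with tempfile.TemporaryDirectory() as tmp:
    first = cached_json(tmp, "pr-files", fake_fetch)
    second = cached_json(tmp, "pr-files", fake_fetch)

print(f"fetch was called {len(calls)} time(s)")
```

The second lookup reads the file instead of calling `fetch`, so the stand-in API is only hit once. There is no expiry here; deleting the cache directory is the invalidation strategy, which seems about right for dumb scripts.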