I'll describe a problem and the solution we arrived at, and then speak a little more broadly...
The problem was a workflow. A Digital Humanities Librarian and her graduate students would intermittently work on a faculty project which, at its core, is a website showing translated inscriptions. Their XML data was stored in a private Subversion repository. I had set up a script that periodically checked whether inscriptions in the Subversion repository had been updated and, if so, processed them for display. Many months could go by without any data changes, followed by brief bursts of updates. So the problem was: I wanted my script to check for changes relatively infrequently, but during work-bursts, the data-updaters wanted to see their changes flow into the website as quickly as possible.
We tried a couple of things, but here's the eventual solution: we moved the XML data files to a public GitHub repository and now use GitHub's webhooks feature. This let us move from polling for changes to having changes automatically trigger immediate updates to the website.
And setting up a webhook is easy. An admin for the repository specifies a URL that GitHub will POST to on each push. You can use the HTTP basic-auth URL syntax (https://username:[email protected]/etc/) so your listening web app can perform a basic-auth check to help ensure the POST really came from GitHub. The payload GitHub posts contains nice JSON data listing the files added, modified, and removed. Then you can build out your listening web app to process the changes.
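As a sketch of what a listener might do with that payload: the push-event payload includes a `commits` array whose entries list the files `added`, `modified`, and `removed`. The example below (file paths are made up) gathers those into three sets.

```python
import json

def changed_files(payload):
    """Gather the files touched across all commits in a GitHub
    push-event payload, as three sets: (added, modified, removed)."""
    added, modified, removed = set(), set(), set()
    for commit in payload.get("commits", []):
        added.update(commit.get("added", []))
        modified.update(commit.get("modified", []))
        removed.update(commit.get("removed", []))
    return added, modified, removed

# a trimmed-down example payload (file paths are hypothetical)
payload = json.loads("""
{
  "commits": [
    {"added": ["xml/insc_0001.xml"], "modified": [], "removed": []},
    {"added": [], "modified": ["xml/insc_0001.xml"], "removed": ["xml/insc_0002.xml"]}
  ]
}
""")
added, modified, removed = changed_files(payload)
print(sorted(added), sorted(modified), sorted(removed))
```

A real push payload has many more fields; the listener only needs the few shown here to decide what to re-process.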
This has been really successful, and I've added this trigger-on-commit architecture to other projects. One example of our listen-and-process code is at: https://github.com/Brown-University-Library/iip_processing_project/ (or just search on 'processing' in Brown's github area).
We've added a couple of other nice features to the listen-and-process application, like queueing, passing the GitHub POST on to a dev server for development testing, and a process-status viewer...
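To illustrate the queueing idea, here's a generic sketch using Python's standard library (this is not the project's actual implementation, and the `commit_id` field is hypothetical): incoming webhook payloads are queued and a single worker processes them one at a time, in arrival order.

```python
import queue
import threading

def worker(job_q, processed):
    """Pull webhook payloads off the queue and process them one at a
    time, in arrival order; a None sentinel shuts the worker down."""
    while True:
        payload = job_q.get()
        if payload is None:
            break
        # stand-in for the real work: re-processing changed inscriptions
        processed.append(payload["commit_id"])  # hypothetical field

job_q = queue.Queue()
processed = []
t = threading.Thread(target=worker, args=(job_q, processed))
t.start()
for commit_id in ["abc123", "def456"]:
    job_q.put({"commit_id": commit_id})
job_q.put(None)  # stop the worker
t.join()
print(processed)
```

Serializing the work this way means two quick pushes in a row can't trigger overlapping processing runs.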
But the real point I wanted to convey is the benefit of a shift in thinking: from GitHub's features being useful just to developers, to GitHub's features also being useful to the workflows and needs of the folks creating and updating data.
More broadly, along these lines...
- The Digital Humanities Librarian and her grad students use GitHub's 'issues' feature extensively to communicate with one another
- They find an accessible, color-coded view of their XML files useful, with URLs that point to specific lines, sections, and versions
- And there are other backend features I want to explore that will help their workflow, such as using GitHub's ability to integrate with services like Travis to validate the data files when commits are pushed
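As a taste of what such a validation step could look like, here's a minimal well-formedness check using Python's standard library (a sketch only; the sample documents are hypothetical, and real validation would check the files against the project's actual schema, which this doesn't do):

```python
import xml.etree.ElementTree as ET

def well_formed(xml_text):
    """Return True if the string parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# hypothetical sample documents: the second has a mismatched closing tag
print(well_formed("<inscription><transcription>abc</transcription></inscription>"))
print(well_formed("<inscription><transcription>abc</inscription>"))
```

A CI job would run a check like this over every XML file in the commit and fail the build on the first malformed file, so broken data never reaches the processing pipeline.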
Thanks.
Links...
- data-files
- example link showing url to section
- processing-code
- slides