- GitHub etiquette
- GitHub in a small team
- GitHub: what works for us
- A loose GitHub playbook
- GitHub is a team sport
- This is a talk about what we've found useful as a small (<=6) data science team working with GitHub (might be transferable to GitLab and other services). This is what works for us so far, but isn't gospel. We're still learning. Maybe you'll find it useful.
- This is not a technical talk about how to use Git for version control. It's a talk about planning, workflows, standards and communication. With GitHub.
- Some of these things are flexible and individual choice, but the spirit holds true (e.g. commit messages can 'start with a verb' or be 'conventional', but they should do the job of describing the changes in the commit, regardless).
- Should you do these things if working as a solo developer? I'd say yes. Much of this advice is generally good practice. It's also good to form habits.
As an internal contributor to a repo:
- Make an issue with enough detail. Link to other issues if appropriate. Label the issue. Add it to a milestone. Assign someone to the issue if relevant.
- Create a branch. Name it to include the issue number and short description, e.g. '123-add-filter'. Commit to the branch early and often with atomic commits.
- Raise a PR. Link to the issue that you're closing in the form '#123' and explain briefly what the PR does. Assign yourself to the PR and select a reviewer.
- Await review, respond to comments and suggestions. Merge the PR when approved and delete the branch.
- Repos must contain a README to explain the purpose and to explain the requirements and steps to re-run it.
- Repos should be named succinctly but descriptively. Use a prefix for repos if they're part of the same project (e.g. 'nhp_*').
- Every repo should have a named owner and 'deputy' (to reduce 'bus factor').
- The owner is in charge of 'GitHub gardening' (keeping issues in order, labelling, milestones, etc).
- The owner can be auto-selected as the reviewer. We're experimenting with this for repos with external contributors, especially.
- No data is stored in GitHub. Data files are on Azure or Posit Connect.
- Use a .gitignore to exclude likely data files (as well as other unnecessary files). We're thinking about common templates/cookiecutters.
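
A .gitignore only helps if its patterns cover the files people actually create, so a quick check before committing can be a useful backstop. Below is a minimal sketch in Python, not something we run today: the function name `likely_data_files` and the list of patterns are assumptions for illustration and would need adjusting per project.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical patterns for likely data files; adjust to the project.
DATA_PATTERNS = ["*.csv", "*.xlsx", "*.parquet", "*.rds", "*.RData"]


def likely_data_files(paths):
    """Return the paths whose file name matches a data-file pattern."""
    flagged = []
    for path in paths:
        name = PurePosixPath(path).name.lower()
        # Compare in lower case so '.RData' and '.rdata' are both caught.
        if any(fnmatch(name, pattern.lower()) for pattern in DATA_PATTERNS):
            flagged.append(path)
    return flagged


staged = ["R/model.R", "data/raw/activity.csv", "README.md", "outputs/results.RData"]
print(likely_data_files(staged))
```

Anything the function flags is a prompt to check the .gitignore, not proof that the file is data.
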
- Issues aren't always 'problems'. They can be reminders (e.g. of tech debt) or questions for further discussion.
- Every issue should have at least two labels: one for type ('enhancement', 'bug', 'documentation', etc) and one for MoSCoW priority. This will help to sort and prioritise them.
- All issues should be assigned to a milestone.
- Issues in milestones should be sorted in priority order/order of expected completion (MoSCoW labels will help with this).
- Use issue templates to ensure particular information is supplied. This is probably most useful for external contributors. We're experimenting with this.
- Refer to other related commits by number (keeps).
- Reopen an issue if the fix did not work as expected. The earlier context is useful if new changes need to be made.
- Feel free to break down an issue into separate issues if needed. You can use the original issue to track the smaller issues if you want.
- Make use of checklists in issues by using markdown notation for checkboxes. This can help keep track of progress. These show up in the issue preview so you can see what percentage of that issue is completed.
- Feel free to 'hide' comments if the information is out of date, etc. This can help remove ambiguity and keep conversations current.
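
To make the checklist idea concrete, here is a small sketch of how progress can be worked out from the markdown checkboxes in an issue body. It is only an illustration of the notation ('- [ ]' for open, '- [x]' for done); the function name `checklist_progress` and the example issue text are made up, and this is not how GitHub itself calculates the figure.

```python
import re

# Matches markdown task-list items such as "- [ ] todo" or "- [x] done".
TASK_PATTERN = re.compile(r"^[ \t]*[-*+][ \t]+\[([ xX])\][ \t]+", re.MULTILINE)


def checklist_progress(issue_body):
    """Return (completed, total) for the task-list items in an issue body."""
    marks = TASK_PATTERN.findall(issue_body)
    completed = sum(1 for mark in marks if mark.lower() == "x")
    return completed, len(marks)


# Hypothetical issue body for illustration.
body = """Steps to finish this issue:
- [x] Write the filter function
- [ ] Add tests
- [x] Update the README
"""
completed, total = checklist_progress(body)
print(f"{completed} of {total} tasks complete")
```
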
- Branch names should be numbered to match their issue, e.g. '123-add-filter'. This makes it obvious what issue is being fixed by that branch and should help identify if more than one person has a branch open for the same issue.
- Only one person works on a branch at a time. This person is the one assigned to the relevant issue.
- If commits from someone else are required, then all parties must communicate about the current state of the branch to ensure they pull changes and avoid merge conflicts.
- Branches are ephemeral and die when the PR is merged. They should be deleted (this can be done automatically). The only branches to exist at all times should be main and a deployment branch, if necessary. All others should be active branches so it's clear what's being worked on.
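
The '123-add-filter' naming convention is simple enough to generate and check automatically. The sketch below shows one way to do that in Python; the helper names `branch_name` and `issue_from_branch` are hypothetical, and the lowercase-with-hyphens rule is our reading of the convention rather than anything Git or GitHub enforces.

```python
import re


def branch_name(issue_number, description):
    """Build a branch name like '123-add-filter' from an issue number and title."""
    # Lower-case the description and collapse anything non-alphanumeric to hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"{issue_number}-{slug}"


def issue_from_branch(name):
    """Return the issue number a branch refers to, or None if it doesn't follow the convention."""
    match = re.match(r"^(\d+)-[a-z0-9]+(?:-[a-z0-9]+)*$", name)
    return int(match.group(1)) if match else None


print(branch_name(123, "Add filter"))
print(issue_from_branch("123-add-filter"))
print(issue_from_branch("main"))
```

Reading the issue number back out of the branch name is what makes it easy to spot two open branches for the same issue.
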
- Do not commit directly to main. Your work must be independently checked first to limit the chance of mistakes.
- Make your commits small in terms of code and files touched, if possible. This makes the Git history easier to read and makes reviews easier too.
- Commit and push early and often into your branch. This can help others see progress and helps reduce the bus factor.
- Don't dump your work into a commit because it's the end of the day.
- Make your commit messages meaningful. What does the commit do? Start with a verb in present tense ('adds', not 'added'). Or use 'conventional' commits.
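
Either commit message style can be checked mechanically, which may help when forming the habit. This is a rough sketch rather than a full linter: the list of conventional commit types and the list of accepted leading verbs are assumptions chosen for illustration, and `message_style` is a hypothetical helper.

```python
import re

# Conventional commit subject: type, optional (scope), optional '!', then ': description'.
# The set of types here is an assumed, commonly used list; teams can choose their own.
CONVENTIONAL = re.compile(
    r"^(feat|fix|docs|style|refactor|test|chore|ci|build|perf)(\([\w-]+\))?!?: \S.*"
)

# Hypothetical list of accepted present-tense leading verbs; extend to suit the team.
LEADING_VERBS = {"adds", "fixes", "removes", "updates", "refactors", "documents"}


def message_style(subject):
    """Return 'conventional', 'verb-first', or None for a commit subject line."""
    if CONVENTIONAL.match(subject):
        return "conventional"
    words = subject.split()
    if words and words[0].lower() in LEADING_VERBS:
        return "verb-first"
    return None


for subject in [
    "feat(filter): add date range option",
    "Adds date filter to dashboard",
    "Added stuff",
]:
    print(subject, "->", message_style(subject))
```

A check like this only catches the form of the message. Whether it actually describes the change is still down to the author.
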
- PRs should solve the issue they're related to. Occasionally one fix may solve another.
- They should be named to explain what they do. The issue might be 'the red button doesn't work'; the PR might be 'fix the red button'.
- They should be small in terms of lines of code and files touched. This will make it easier and faster to understand and assess the changes.
- The submitter should mark themselves as the 'assignee' and choose a reviewer. You may want to chat with the reviewer first to let them know and to check they have time.
- For context, link to the issue(s) being closed with the magic words ('closes', 'fixes', etc), which will also close those issues as completed.
- Give a short explanation or bullet points of what the PR does. Provide any extra information to make the reviewer's life easier (areas of focus, maybe) or to ask a question about some aspect of what you've written.
- The PR submitter is the one who clicks the merge button. This is in case the submitter realises there's something they need to add or change before the merge.
- Merging or rebasing is not something we worry about too much. Nor is squashing.
- Use GitHub Actions for continuous integration. Run R CMD check at the very least for R projects. Start with the r-lib examples as a basis.
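
The 'magic words' are a fixed set of keywords (variants of 'close', 'fix' and 'resolve') followed by an issue reference like '#123'. As an illustration of what counts, here is a simplified sketch that finds which issues a PR description would close. It only handles the basic 'keyword #number' form in the same repo, and `closed_issues` is a hypothetical helper rather than part of any GitHub tooling.

```python
import re

# close/closes/closed, fix/fixes/fixed, resolve/resolves/resolved, then '#<number>'.
CLOSING = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)", re.IGNORECASE
)


def closed_issues(pr_body):
    """Return the sorted issue numbers that a PR description would close."""
    return sorted({int(number) for number in CLOSING.findall(pr_body)})


# Hypothetical PR description for illustration.
body = """Closes #123 and fixes #45.

Related to #7, which stays open.
"""
print(closed_issues(body))
```

Note that a plain mention such as 'Related to #7' links the issue for context but does not close it.
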
- The reviewer should typically check that the changes result in the issue being fixed. This may require pulling the branch and then testing it, but may not be necessary for small changes.
- The reviewer should seek clarification and add comments where something isn't clear.
- Use 'suggestions' as a reviewer rather than committing to someone else's branch.
- When working at pace (when aren't we?), we should err towards approval if the issue is completed rather than an endless cycle of asking for small changes. The submitter and reviewer should decide whether smaller things like code style or change in approach should be added as a new issue with a 'techdebt' label.
- Tag the history and release on GitHub concurrently to keep them in sync (this is done automatically if the release is done from the GitHub interface).
- Use semantic versioning (x.y.z, where x is for breaking changes, y is for new features and z is for bug-fix patches).
- We typically just autofill the release description with the constituent PR titles, which means it's important to give them meaningful names.
- We align releases with sprints, though patches may occur more frequently.
- We link releases to deployment in many cases. Don't release to prod on a Friday, lol.
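
The x.y.z rule is easy to state in code. The sketch below shows how a version number would change for each kind of release; `bump_version` and the change labels 'breaking', 'feature' and 'patch' are names made up for this example.

```python
def bump_version(version, change):
    """Return the next semantic version for a 'breaking', 'feature' or 'patch' change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "breaking":
        # Breaking changes increment x and reset y and z.
        return f"{major + 1}.0.0"
    if change == "feature":
        # New features increment y and reset z.
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        # Bug fixes increment z only.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown change type: {change!r}")


print(bump_version("1.4.2", "breaking"))
print(bump_version("1.4.2", "feature"))
print(bump_version("1.4.2", "patch"))
```
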
- We use Trello to plan things and have to link to GitHub repos and issues in Trello cards. Can we use GitHub as our planner across multiple repos instead? Seems possible and we had a demo from a former colleague recently.
- We have much of this recorded on our website, but we could add more to it.
- What do you do? What ideas do you have for us?