Skip to content

Instantly share code, notes, and snippets.

@jaimemarijke
Last active January 5, 2020 20:43
Show Gist options
  • Save jaimemarijke/c07ee8af164bb20f18ae1ec24a65f88b to your computer and use it in GitHub Desktop.
Save jaimemarijke/c07ee8af164bb20f18ae1ec24a65f88b to your computer and use it in GitHub Desktop.

The goal of this document is to set down a shared set of guidelines for writing production code that we all agree to and attempt to follow.

It is a living document 🐢 that can and should change over time as we develop a sense of what it means to write code at Osmo! (Note: this is the final state of the document at Osmo, recorded for posterity as the company was pivoting.)

Code Style

Why do we need Code Style?

Any fool can write code that computers can understand. Good programmers write code that humans can understand. - Martin Fowler

What is code style?

Code style is an agreement on how to format code so that it looks the same no matter who writes it. Some examples of code style rules:

  • Always indent with 4 spaces
  • Always spell out variable names: number not num
  • Standardize on snake_case vs. camelCase in a given code base

Why?

From Quora

  1. Readability: when every dev writes code in the same way, it is easier for everyone to read and edit
  2. Cleanliness: a common code style makes the code feel clean. Working on clean code is like working in a clean kitchen.
  3. Version control: with everybody formatting the code identically, all changes in the files are guaranteed to be meaningful

How?

  • Writing code : every programmer aims to write code that adheres to our style guidelines
  • Code review: code style discrepancies are called out in code review
  • Linting: use linting (see below) to automatically catch most errors

WTF is Linting?

Linting is the process of "applying rules against a codebase and flagging code structures that violate these rules" - How linting and ESLint improve code quality

Specifically, there are linting tools that we can use that automatically check that our code adheres to code style.

Osmo Code Style

General

  • Prefer descriptive variable names and clean code to extensive commenting
    • Comments explain why you're doing something not what you're doing
    • Lean heavily towards spelling out full words in variable names, e.g. number not num
    • Single-letter variable names should be used only for:
      • The iterators i, j when they have no other meaning than as an index when looping through an array
      • The underscore (_) when importing lodash
      • Variables in equations ported to code (though consider whether the equation would be easier to understand if the variable name were more verbose)
  • PascalCase for class names (all languages)
  • ALL_CAPS for module-level constants (all languages)
    • Prefix constant name with an underscore if it is intended to be private to the module
  • Place helper functions before their callers
  • Use a language-specific linter (as specified below)
    • Linting Ignores : Occasionally there will be a special circumstance where it will make sense to break a lint rule. In this case, linters usually provide a way to ignore a rule. Use it and make the ignore as specific as possible. This is cheap to do, makes your intent clearer and makes the ignore easier to maintain:
      • Use this: // eslint-ignore-line no-underscore-dangle
      • Not this: // eslint-ignore-line
      • If you have an extra-extenuating circumstance and can't reasonably use the feature, leave a comment as to why.

Node

  • Version: 8.10
    • (Hitching our cart to the AWS lambda Node version horse)
  • Npm version: 6.x.x

Javascript

Python

  • Version: 3.6
  • Default casing: snake_case
  • Code style: Pep 8, Zen of Python
  • Documentation: use Google Style
  • Linter: flake8 + our own overrides + black
    • Double quotes preferred for strings (because that's what black enforces), but you can use single quotes and let black correct it for you.
    • Max line length:
      • 88 for code (enforced by black)
      • 120 for comments/docstrings (enforced by flake8)
    • Use # fmt on … # fmt off blocks to disable black in extenuating circumstances (usually numpy array declarations)
    • Use # noqa: <error code> human readable explanation to disable specific flake8 errors on single lines (list of error codes)

Jupyter Notebooks (Python)

Coding Best-practices

This is a first-pass attempt to call out some best-practice preferences for how to write good code. It is distinct from code style in that it is broad strategies to be applied, not just specific rules for formatting code.

We aim to keep this up-to-date as we develop shared Osmo preferences and opinions on software development.

  • Be DRY. "Don't repeat yourself".
  • YAGNI. "You ain't gonna need it". Aggressively delete unused code.
  • Occam's Razor /KISS. "Keep it simple stupid". Avoid over-engineering / unnecessary future-proofing. Complexity is the enemy of maintainability. The simplest solution is usually the right one
  • Don't use global variables. If you can possibly avoid it.
    • Jupyter notebooks are a notable exception, but still be careful
  • Avoid mutation. Create new objects rather than updating existing objects in place.
    • In jupyter notebooks, try to confine creation & mutation of a variable to a single cell. Eg. if you have to make a DataFrame and convert a column to dates, do that within the same cell, so that the nature of the DataFrame does not change between cells.
  • Use async await. Because callback hell is hell.
  • [Proposal] Prefer functional over object-oriented

Code Review

Goals

  1. Build maintainable code & prevent bugs
  2. Provide visibility across the team to new code → reduce bus factor
  3. Provide feedback to code author → so they can become a better developer
  4. Provide an opportunity for code reviewer to reflect and think critically about code → so they can become a better developer
  5. Fun (if that's your jam)

Code author

What needs code review?

All new production code should be posted as a Pull Request and go through "code review" before being landed.

"Production" code includes reusable libraries not directly used in production: e.g. osmo-jupyter.

Crafting good Pull Requests

Here's a great Written Unwritten Guide to Pull Requests by Atlassian.

To summarize, using its own headings:

  • (Problem): Reviewing pull requests is hard
  • (Solution):
    • Make smaller pull requests
    • Write useful descriptions and titles
    • Have on-point commit messages
    • Add comments on your pull request to help guide the reviewer

Other things we do in pull requests

  • Testing section: In your description, include information about any manual testing steps you did.
    • Corollary: When reviewing, don't assume that any particular manual testing has been done unless it explicitly says so in the description.
  • Squash + merge: Consider using "squash and merge" to keep our master branch clean - at the risk of losing detailed commit info for your branch

"Guidelines" to merge

  • All devs have gotten a (reasonable) chance to look at it, especially for big PRs
  • 2 approving reviews (rule of thumb)
    • Open to best-judgement, e.g. wordsmithing change doesn't need 2 reviews
    • If any major changes since those two approvals → at least 1 more approval
    • Paired programming: Only 1 approval from an engineer who was not involved with the work.
  • All conversations marked "resolved"
    • For positive feedback / other random notes: Code author can resolve with no response
    • For minor feedback / weak preferences: Code author has executive privilege to resolve conversations by saying something like "I intend to keep it as-is because " and not waiting for a reply. Or author can ask reviewer explicitly to resolve.
    • For major feedback / strong preferences: Code author should implement changes and/or make sure a discussion happens and the reviewer feels heard.

Reviewer

How to review

Check out the Pull Request Dashboard

Priority / cadence of code review

  • Regular cadence
    • 1-2x / day
    • Each dev chooses convenient times that don't interrupt heads down flow time
  • Ok to notify on slack (or not) when you open a new PR
  • If it's urgent, say so on slack
  • Code author re-requests review after done implementing suggested changes
  • When a comment discussion is blocking code author, call out on slack

Indicating severity of feedback

  • Use Major prefixto indicate feedback that you definitely want addressed.
  • Use GitHub's built-in "Changes Requested" feature if you have any comment that is prefixed with Major
  • Use other prefixes as you see fit

Branches

Always use branches and Pull Requests for code changes (unless they are super minor - use judgement)

Branch Naming

  • Prefix with:
    • "f" for features
    • "b" for bugs
    • "hotfix" for hotfixes
  • Use kebab-case

Examples:

  • f/my-descriptive-feature-name
  • b/my-bug-fix
  • hotfix/fixing-things-that-are-on-fire

Documentation

The general aim of documentation is that someone else - or yourself in 3 months - can come see the code and understand it well enough to change it confidently.

Function documentation

Functions should be documented. Use judgement on when to exhaustively follow the standard vs. something simpler.

Auto-generate documentation

[Prototype] We're trying out using sphinx to auto-generate documentation for the osmo-jupyter library. See notes here.

Comments

  • Comments only when it's necessary to explain why you're doing something not what you're doing
  • Prefer descriptive variable names and clean code to extensive commenting (See Code Style above)
  • TODO:
    • Indicates that something should be done around this line. These are mostly useful as notes-to-self when a piece of work is in progress. Many IDEs/editors will turn these into a nice list for you.
    • Don't check in unticketed TODOs to master.
    • Generally avoid commenting with ticketed TODOs as well. Ideally keep discussion and planning in Asana. But if it really makes sense, this is the format to use:
      • TODO ([asana link here]): what needs to be done, eg. implement this function

Automated Testing

In which Jaime scrounged the internet to find good articles already written on this topic.

Why write automated tests?

Paraphrased from JetBrains' Why Write Automated Tests?:

Automated tests are:

  • More than just checking that your code does what you thought it did
  • A way to explore the limitations of your code
  • A way to discover how it behaves under a range of inputs
  • "Document" the expected behaviour (both under normal use and exceptional circumstances)

"Automated tests give you a safety net , giving you confidence to refactor your code, to add features, and to fix defects. When your test suite is all green, you probably didn't break anything"

What's the difference between unit and integration tests?

Excerpts from a great stackoverflow answer:

  • "A unit test is a test written by the programmer to verify that a relatively small piece of code is doing what it is intended to do. They are narrow in scope, they should be easy to write and execute … Unit tests shouldn't have dependencies on outside systems. They test internal consistency as opposed to proving that they play nicely with some outside system."
  • "An integration test is done to demonstrate that different pieces of the system work together. Integration tests cover whole applications, and they require much more effort to put together. … The integration tests do a more convincing job of demonstrating the system works (especially to non-programmers) than a set of unit tests can, at least to the extent the integration test environment resembles production."

Best-practices for unit tests

Follow these: F.I.R.S.T Principles of Unit Testing

Unit test example

Here's an example that demonstrates the FIRST principles (see above), and isn't too simplistic or too complicated.

It is (apparently) testing two functions that convert between integers and roman numerals. The code is in python, but the concepts apply to any language.

Why this is a good example

Adheres to FIRST principles
  • Fast - Code is concise and doesn't have any slow dependencies. It should run quickly
  • Isolated - Code is structured as Arrange → Act → Assert
    • For example, in the test_to_roman_known_values function:
      • Arrange: defines the test data: self.known_values = ...
      • Act: invokes the function under test: result = roman.to_roman(integer)
      • Assert: asserts result is as expected: self.assertEqual(numeral, result)
  • Repeatable - This code is completely standalone - it doesn't require any particular environment
  • Self-validating - Uses python's assert functionality, which will raise errors if the results are wrong
  • Thorough - The tests cover both the "happy path", and many "sad path" edge cases
Well-organized
  • The test file helps me understand the code, without me even knowing what the code is
  • The tests are grouped logically:
    • The functions are correct in "happy path" situations
    • The functions fail correctly in "sad path" situations (e.g. "bad input")
    • The functions are reversible

When to write tests

Writing new code (steady-state)

  • Above all else: use common sense and consider 80/20 rule
  • Unit tests: for functions that have logic
  • Integration tests: sparingly at high level
  • Regression tests: when fixing bugs, write failing test first
  • Exceptions for data processing code:
    • aim for 100% functional coverage
  • Exceptions for things that have UI:
    • Write fewer automated tests and rely more on manual testing
    • Before deploy: manually poke at site to confirm pages load
    • JSX - don't need to write unit tests
      • Write clean components and use PropTypes to prevent most errors
      • Manually verify with your eyeballs that component renders as expected

What to do for existing untested code? (transient)

  • "Testing tax": write tests for existing code before changing it
    • Use common sense
    • Call out in CR

Test frameworks we use and/or like

Javascript

  • Jasmine - for having nice describe/expect syntax
  • Rewire - for accessing non-exported functions
  • Sinon - for stubbing/mocking functions in unit tests

Python

  • pytest
  • For Pandas DataFrames and Series, be aware of testing.assert_series_equal() and testing.assert_frame_equal()

Design spikes

See What is a Design Spike? document.

Principles of repository architecture

This section is relevant only if you might be creating new repositories.

  • Each repo should have a single, specific purpose. It should be obvious where any new piece of code should live
  • Strive towards 12factor app principles in general
    • Specifically config principle: config should live in environment variables
  • Heuristics on when to split:
    • Split if:
      • Builds should be separate (code gets deployed to different places, different languages, etc)
      • A major section of code has consumers outside the repo, but the rest of the repo does not have those consumers. Split these to clearly delineate useful components.
      • ?
    • Keep together if:
      • Code will (almost) always be worked on together
        • Counterpoint: even if this is true, consider splitting if it would vastly clarify and guide us to write better-abstracted code
      • Thing to be split out is too small
      • ?

Versioning

CHANGELOG.md

In packages whose versions we are explicitly managing (using SemVer), create and maintain a CHANGELOG.md, using the Keep a Changelog format.

Javascript: NPM Package Versions and SemVer

Pinning versions

  • Pin to a minimum minor version using "^", e.g. ^5.3.2
  • Do this until we get burned badly enough to want to pin harder

Contributors

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment