The goal of this document is to set down a shared set of guidelines for writing production code that we all agree to and attempt to follow.

It is a living document 🐢 that can and should change over time as we develop a sense of what it means to write code at Osmo! (Note: this is the final state of the document at Osmo, recorded for posterity as the company was pivoting.)

Code Style

Why do we need Code Style?

Any fool can write code that computers can understand. Good programmers write code that humans can understand. - Martin Fowler

What is code style?

Code style is an agreement on how to format code so that it looks the same no matter who writes it. Some examples of code style rules:

Always indent with 4 spaces
Always spell out variable names: number not num
Standardize on snake_case vs. camelCase in a given code base

Why?

From Quora

Readability: when every dev writes code in the same way, it is easier for everyone to read and edit
Cleanliness: a common code style makes the code feel clean. Working on clean code is like working in a clean kitchen.
Version control: with everybody formatting the code identically, all changes in the files are guaranteed to be meaningful

How?

Writing code : every programmer aims to write code that adheres to our style guidelines
Code review: code style discrepancies are called out in code review
Linting: use linting (see below) to automatically catch most errors

WTF is Linting?

Linting is the process of "applying rules against a codebase and flagging code structures that violate these rules" - How linting and ESLint improve code quality

Specifically, there are linting tools that we can use that automatically check that our code adheres to code style.

Osmo Code Style

General

Prefer descriptive variable names and clean code to extensive commenting
- Comments explain why you're doing something not what you're doing
- Lean heavily towards spelling out full words in variable names, e.g. number not num
- Single-letter variable names should be used only for:
  - The iterators i, j when they have no other meaning than as an index when looping through an array
  - The underscore (_) when importing lodash
  - Variables in equations ported to code (though consider whether the equation would be easier to understand if the variable name were more verbose)
PascalCase for class names (all languages)
ALL_CAPS for module-level constants (all languages)
- Prefix constant name with an underscore if it is intended to be private to the module
Place helper functions before their callers
Use a language-specific linter (as specified below)
- Linting Ignores : Occasionally there will be a special circumstance where it will make sense to break a lint rule. In this case, linters usually provide a way to ignore a rule. Use it and make the ignore as specific as possible. This is cheap to do, makes your intent clearer and makes the ignore easier to maintain:
  - Use this: // eslint-ignore-line no-underscore-dangle
  - Not this: // eslint-ignore-line
  - If you have an extra-extenuating circumstance and can't reasonably use the feature, leave a comment as to why.

Node

Version: 8.10
- (Hitching our cart to the AWS lambda Node version horse)
Npm version: 6.x.x

Javascript

Version: ES2017
- Exception: lambda is run on Node v8.10, which essentially has ES2015
Default casing: camelCase
Code style: AirBnb
Documentation:??
Linter: ES Lint + eslint-config-osmo (gently extends eslint-config-airbnb)

Python

Version: 3.6
Default casing: snake_case
Code style: Pep 8, Zen of Python
Documentation: use Google Style
Linter: flake8 + our own overrides + black
- Double quotes preferred for strings (because that's what black enforces), but you can use single quotes and let black correct it for you.
- Max line length:
  - 88 for code (enforced by black)
  - 120 for comments/docstrings (enforced by flake8)
- Use # fmt on … # fmt off blocks to disable black in extenuating circumstances (usually numpy array declarations)
- Use # noqa: <error code> human readable explanation to disable specific flake8 errors on single lines (list of error codes)

Jupyter Notebooks (Python)

See Experiment + Jupyter Notebook Standard Practices

Coding Best-practices

This is a first-pass attempt to call out some best-practice preferences for how to write good code. It is distinct from code style in that it is broad strategies to be applied, not just specific rules for formatting code.

We aim to keep this up-to-date as we develop shared Osmo preferences and opinions on software development.

Be DRY. "Don't repeat yourself".
YAGNI. "You ain't gonna need it". Aggressively delete unused code.
Occam's Razor /KISS. "Keep it simple stupid". Avoid over-engineering / unnecessary future-proofing. Complexity is the enemy of maintainability. The simplest solution is usually the right one
Don't use global variables. If you can possibly avoid it.
- Jupyter notebooks are a notable exception, but still be careful
Avoid mutation. Create new objects rather than updating existing objects in place.
- In jupyter notebooks, try to confine creation & mutation of a variable to a single cell. Eg. if you have to make a DataFrame and convert a column to dates, do that within the same cell, so that the nature of the DataFrame does not change between cells.
Use async await. Because callback hell is hell.
[Proposal] Prefer functional over object-oriented

Code Review

Goals

Build maintainable code & prevent bugs
Provide visibility across the team to new code → reduce bus factor
Provide feedback to code author → so they can become a better developer
Provide an opportunity for code reviewer to reflect and think critically about code → so they can become a better developer
Fun (if that's your jam)

Code author

What needs code review?

All new production code should be posted as a Pull Request and go through "code review" before being landed.

"Production" code includes reusable libraries not directly used in production: e.g. osmo-jupyter.

Crafting good Pull Requests

Here's a great Written Unwritten Guide to Pull Requests by Atlassian.

To summarize, using its own headings:

(Problem): Reviewing pull requests is hard
(Solution):
- Make smaller pull requests
- Write useful descriptions and titles
- Have on-point commit messages
- Add comments on your pull request to help guide the reviewer

Other things we do in pull requests

Testing section: In your description, include information about any manual testing steps you did.
- Corollary: When reviewing, don't assume that any particular manual testing has been done unless it explicitly says so in the description.
Squash + merge: Consider using "squash and merge" to keep our master branch clean - at the risk of losing detailed commit info for your branch

"Guidelines" to merge

All devs have gotten a (reasonable) chance to look at it, especially for big PRs
2 approving reviews (rule of thumb)
- Open to best-judgement, e.g. wordsmithing change doesn't need 2 reviews
- If any major changes since those two approvals → at least 1 more approval
- Paired programming: Only 1 approval from an engineer who was not involved with the work.
All conversations marked "resolved"
- For positive feedback / other random notes: Code author can resolve with no response
- For minor feedback / weak preferences: Code author has executive privilege to resolve conversations by saying something like "I intend to keep it as-is because " and not waiting for a reply. Or author can ask reviewer explicitly to resolve.
- For major feedback / strong preferences: Code author should implement changes and/or make sure a discussion happens and the reviewer feels heard.

Reviewer

How to review

Check out the Pull Request Dashboard

Priority / cadence of code review

Regular cadence
- 1-2x / day
- Each dev chooses convenient times that don't interrupt heads down flow time
Ok to notify on slack (or not) when you open a new PR
If it's urgent, say so on slack
Code author re-requests review after done implementing suggested changes
When a comment discussion is blocking code author, call out on slack

Indicating severity of feedback

Use Major prefixto indicate feedback that you definitely want addressed.
Use GitHub's built-in "Changes Requested" feature if you have any comment that is prefixed with Major
Use other prefixes as you see fit

Branches

Always use branches and Pull Requests for code changes (unless they are super minor - use judgement)

Branch Naming

Prefix with:
- "f" for features
- "b" for bugs
- "hotfix" for hotfixes
Use kebab-case

Examples:

f/my-descriptive-feature-name
b/my-bug-fix
hotfix/fixing-things-that-are-on-fire

Documentation

The general aim of documentation is that someone else - or yourself in 3 months - can come see the code and understand it well enough to change it confidently.

Function documentation

Functions should be documented. Use judgement on when to exhaustively follow the standard vs. something simpler.

Auto-generate documentation

[Prototype] We're trying out using sphinx to auto-generate documentation for the osmo-jupyter library. See notes here.

Comments

Comments only when it's necessary to explain why you're doing something not what you're doing
Prefer descriptive variable names and clean code to extensive commenting (See Code Style above)
TODO:
- Indicates that something should be done around this line. These are mostly useful as notes-to-self when a piece of work is in progress. Many IDEs/editors will turn these into a nice list for you.
- Don't check in unticketed TODOs to master.
- Generally avoid commenting with ticketed TODOs as well. Ideally keep discussion and planning in Asana. But if it really makes sense, this is the format to use:
  - TODO ([asana link here]): what needs to be done, eg. implement this function

Automated Testing

In which Jaime scrounged the internet to find good articles already written on this topic.

Why write automated tests?

Paraphrased from JetBrains' Why Write Automated Tests?:

Automated tests are:

More than just checking that your code does what you thought it did
A way to explore the limitations of your code
A way to discover how it behaves under a range of inputs
"Document" the expected behaviour (both under normal use and exceptional circumstances)

"Automated tests give you a safety net , giving you confidence to refactor your code, to add features, and to fix defects. When your test suite is all green, you probably didn't break anything"

What's the difference between unit and integration tests?

Excerpts from a great stackoverflow answer:

"A unit test is a test written by the programmer to verify that a relatively small piece of code is doing what it is intended to do. They are narrow in scope, they should be easy to write and execute … Unit tests shouldn't have dependencies on outside systems. They test internal consistency as opposed to proving that they play nicely with some outside system."
"An integration test is done to demonstrate that different pieces of the system work together. Integration tests cover whole applications, and they require much more effort to put together. … The integration tests do a more convincing job of demonstrating the system works (especially to non-programmers) than a set of unit tests can, at least to the extent the integration test environment resembles production."

Best-practices for unit tests

Follow these: F.I.R.S.T Principles of Unit Testing

Unit test example

Here's an example that demonstrates the FIRST principles (see above), and isn't too simplistic or too complicated.

It is (apparently) testing two functions that convert between integers and roman numerals. The code is in python, but the concepts apply to any language.

Why this is a good example

Adheres to FIRST principles

Fast - Code is concise and doesn't have any slow dependencies. It should run quickly
Isolated - Code is structured as Arrange → Act → Assert
- For example, in the test_to_roman_known_values function:
  - Arrange: defines the test data: self.known_values = ...
  - Act: invokes the function under test: result = roman.to_roman(integer)
  - Assert: asserts result is as expected: self.assertEqual(numeral, result)
Repeatable - This code is completely standalone - it doesn't require any particular environment
Self-validating - Uses python's assert functionality, which will raise errors if the results are wrong
Thorough - The tests cover both the "happy path", and many "sad path" edge cases

Well-organized

The test file helps me understand the code, without me even knowing what the code is
The tests are grouped logically:
- The functions are correct in "happy path" situations
- The functions fail correctly in "sad path" situations (e.g. "bad input")
- The functions are reversible

When to write tests

Writing new code (steady-state)

Above all else: use common sense and consider 80/20 rule
Unit tests: for functions that have logic
Integration tests: sparingly at high level
Regression tests: when fixing bugs, write failing test first
Exceptions for data processing code:
- aim for 100% functional coverage
Exceptions for things that have UI:
- Write fewer automated tests and rely more on manual testing
- Before deploy: manually poke at site to confirm pages load
- JSX - don't need to write unit tests
  - Write clean components and use PropTypes to prevent most errors
  - Manually verify with your eyeballs that component renders as expected

What to do for existing untested code? (transient)

"Testing tax": write tests for existing code before changing it
- Use common sense
- Call out in CR

Test frameworks we use and/or like

Javascript

Jasmine - for having nice describe/expect syntax
Rewire - for accessing non-exported functions
Sinon - for stubbing/mocking functions in unit tests

Python

pytest
For Pandas DataFrames and Series, be aware of testing.assert_series_equal() and testing.assert_frame_equal()

Design spikes

See What is a Design Spike? document.

Principles of repository architecture

This section is relevant only if you might be creating new repositories.

Each repo should have a single, specific purpose. It should be obvious where any new piece of code should live
Strive towards 12factor app principles in general
- Specifically config principle: config should live in environment variables
Heuristics on when to split:
- Split if:
  - Builds should be separate (code gets deployed to different places, different languages, etc)
  - A major section of code has consumers outside the repo, but the rest of the repo does not have those consumers. Split these to clearly delineate useful components.
  - ?
- Keep together if:
  - Code will (almost) always be worked on together
    - Counterpoint: even if this is true, consider splitting if it would vastly clarify and guide us to write better-abstracted code
  - Thing to be split out is too small
  - ?

Versioning

CHANGELOG.md

In packages whose versions we are explicitly managing (using SemVer), create and maintain a CHANGELOG.md, using the Keep a Changelog format.

Javascript: NPM Package Versions and SemVer

Pinning versions

Pin to a minimum minor version using "^", e.g. ^5.3.2
Do this until we get burned badly enough to want to pin harder

jaimemarijke/Code Style Manifesto.md

Code Style

Why do we need Code Style?

What is code style?

Why?

How?

WTF is Linting?

Osmo Code Style

General

Node

Javascript

Python

Jupyter Notebooks (Python)

Coding Best-practices

Code Review

Goals

Code author

What needs code review?

Crafting good Pull Requests

Other things we do in pull requests

"Guidelines" to merge

Reviewer

How to review

Priority / cadence of code review

Indicating severity of feedback

Branches

Branch Naming

Documentation

Function documentation

Auto-generate documentation

Comments

Automated Testing

Why write automated tests?

What's the difference between unit and integration tests?

Best-practices for unit tests

Unit test example

Why this is a good example

Adheres to FIRST principles

Well-organized

When to write tests

Writing new code (steady-state)

What to do for existing untested code? (transient)

Test frameworks we use and/or like

Javascript

Python

Design spikes

Principles of repository architecture

Versioning

CHANGELOG.md

Javascript: NPM Package Versions and SemVer

Contributors