| layout | title | description | tags | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
default |
SQL Style Guide |
A guide to writing clean, clear, and consistent SQL. |
|
This document is written as a manual for anyone working on the Data team, but also as a guide for anyone at the company who would like to write clean and clear SQL code that is meant to be shareable and maintainable.
NOTE: This style guide is written for use with AWS Redshift and Postgres 8.0.2, but much of it can be applied to any SQL database.
Attribution This style guide is forked from Fred Benenson: Kickstarter Style Guide. Many thanks to him and his team for their initial efforts.
- We take a disciplined and practical approach to writing code. (And SQL is code.)
- We regularly check-in code to Github
- We believe consistency in style is important for readability and maintainability.
- We demonstrate intent with code via clear structure and naming conventions, using comments only where needed.
-
No tabs. 2 spaces per indent.
-
No trailing whitespace.
-
No backticks (e.g., `this_value_has_backticks_but_shouldnt`) as this breaks certain SQL dialects.
-
Always capitalize SQL keywords (e.g.,
SELECTorAS) -
Variable names should be underscore separated (a.k.a. snake_case):
GOOD:
SELECT COUNT(*) AS assignment_count FROM ...
BAD:
SELECT COUNT(*) AS assignmentCount FROM ...
-
Comments should go near the top of your query, or at least near the closest
SELECT -
Try to only comment on things that aren't obvious about the query (e.g., why a particular ID is hardcoded, etc.)
-
Don't use single letter variable names be as descriptive as possible given the context:
GOOD:
SELECT marketcore.users AS workers FROM ...
BAD:
SELECT marketcore.users AS w FROM ...
-
Use Common Table Expressions (CTEs) early and often, and name them well.
-
Unfortunately, CTEs are not supported in MySQL, so you'll just have to suffer.
-
HAVINGisn't supported in Redshift, so use CTEs instead.
Align all columns to the first column on their own line:
SELECT
users.name,
users.email,
address.country,
COUNT(login_info.id) AS login_count
FROM ...SELECT goes on its own line:
SELECT
name,
...Always rename aggregates and function-wrapped columns:
SELECT
name,
SUM(amount) AS sum_amount
FROM ...Always rename all columns when selecting with table aliases:
SELECT
work.name AS assignment_title,
COUNT(workers.id) AS work_invites_count
FROM marketcore.work_resource AS work_invites
INNER JOIN marketcore.users AS workers ON ...Always use AS to rename columns:
GOOD:
SELECT
projects.name AS project_name,
COUNT(workers.id) AS workers_count
...BAD:
SELECT
projects.name project_name,
COUNT(workers.id) workers_count
...Long Window functions should be split across multiple lines: one for the PARTITION, ORDER and frame clauses, aligned to the PARTITION keyword. Partition keys should be one-per-line, aligned to the first, with aligned commas. Order (ASC, DESC) should always be explicit. All window functions should be aliased.
SUM(1) OVER (PARTITION BY industry_id,
year
ORDER BY created_on DESC
ROWS UNBOUNDED PRECEDING) AS industry_yearOnly one table should be in the FROM. Never use FROM-joins:
GOOD:
SELECT
jobs.title AS job_title,
COUNT(applications.id) AS applications_count
FROM warp.jobs AS jobs
INNER JOIN warp.applications AS applications ON applications.job_id = jobs.id
...BAD:
jobs.title AS job_title,
COUNT(applications.id) AS applications_count
FROM warp.jobs AS jobs
INNER JOIN warp.applications AS applications
WHERE
applications.job_id = jobs.id
...Explicitly use INNER JOIN not just JOIN, making multiple lines of INNER JOINs easier to scan:
GOOD:
SELECT
...
FROM warp.jobs AS jobs
INNER JOIN warp.recruiter AS recruiters ON ...
INNER JOIN ...
LEFT JOIN warp.applications AS applications ON ...
LEFT JOIN ...BAD:
SELECT
...
FROM warp.jobs AS jobs
JOIN warp.recruiter AS recruiters ON ...
INNER JOIN ...
LEFT JOIN warp.applications AS applications ON ...
LEFT JOIN ...Additional filters in the INNER JOIN go on new indented lines:
SELECT
...
FROM warp.jobs AS jobs
INNER JOIN warp.applications AS applications ON applications.job_id = jobs.id
AND applications.rejected IS NOT NULL
...The ON keyword and condition goes on the INNER JOIN line:
SELECT
projects.name AS project_name,
COUNT(backings.id) AS backings_count
FROM ksr.projects AS projects
INNER JOIN ksr.backings AS backings ON projects.id = backings.project_id
...Begin with INNER JOINs and then list LEFT JOINs, order them semantically, and do not intermingle LEFT JOINs with INNER JOINs unless necessary:
GOOD:
INNER JOIN ksr.backings AS backings ON ...
INNER JOIN ksr.users AS users ON ...
INNER JOIN ksr.locations AS locations ON ...
LEFT JOIN ksr.backer_rewards AS backer_rewards ON ...
LEFT JOIN ...BAD:
LEFT JOIN ksr.backer_rewards AS backer_rewards ON backings
INNER JOIN ksr.users AS users ON ...
LEFT JOIN ...
INNER JOIN ksr.locations AS locations ON ...Multiple WHERE clauses should go on different lines and begin with the SQL operator:
SELECT
name,
goal
FROM ksr.projects AS projects
WHERE
country = 'US'
AND deadline >= '2015-01-01'
...You may also declare WHERE TRUE to normalize the filters' layout for easy adjustment:
SELECT
name,
goal
FROM ksr.projects AS projects
WHERE TRUE
-- AND company_id <> 1
AND country = 'US'
AND deadline >= '2015-01-01'
...CASE statements aren't always easy to format but try to align WHEN, THEN, and ELSE together inside CASE and END:
CASE WHEN category = 'Art'
THEN backer_id
ELSE NULL
END
WITHclause subqueries are an efficient way of defining tables that can be used throughout the execution of a single query. In all cases, the same results can be achieved by using subqueries in the main body of theSELECTstatement, butWITHclause subqueries may be simpler to write and read.
The body of a CTE must be one indent further than the WITH keyword. Open them at the end of a line and close them on a new line:
WITH backings_per_category AS (
SELECT
category_id,
deadline,
...
)Multiple CTEs should be formatted accordingly:
WITH backings_per_category AS (
SELECT
...
), backers AS (
SELECT
...
), backers_and_creators AS (
...
)
SELECT * FROM backers;If possible, JOIN CTEs inside subsequent CTEs, not in the main clause:
GOOD:
WITH backings_per_category AS (
SELECT
...
), backers AS (
SELECT
backer_id,
COUNT(backings_per_category.id) AS projects_backed_per_category
INNER JOIN ksr.users AS users ON users.id = backings_per_category.backer_id
), backers_and_creators AS (
...
)
SELECT * FROM backers_and_creators;BAD:
WITH backings_per_category AS (
SELECT
...
), backers AS (
SELECT
backer_id,
COUNT(backings_per_category.id) AS projects_backed_per_category
), backers_and_creators AS (
...
)
SELECT * FROM backers_and_creators
INNER JOIN backers ON backers_and_creators ON backers.backer_id = backers_and_creators.backer_idAlways use CTEs over inlined subqueries.
- Sublime is your friend. Configure it to use soft tabs (e.g. 2 spaces), and trim trailing whitespace
- Helpful Sublime packages include Githubinator, SendText, and Package Control.
- Check code into github early and often.
- Always provide a Githubinator permalink in Trello cards where any code is used.