| author | tym-xqo |
|---|---|
| title | BenchPrep SQL Style Guide |
| revision | 2026-02-05 |
| version | 1.0.0 |
| description | A guide to writing clean, clear, and consistent SQL. |
select *
from benchprep.sql_style_guide as guide
where guide.attributes in ('clean', 'clear', 'consistent')
and guide.look = 'beautiful'These guidelines are designed to make SQL statements easy to write, easy to read, easy to maintain, and beautiful to see. This document is to be used as a guide for anyone who will be authoring SQL at BenchPrep, and serves to codify the team's preferred SQL style.
This guide is opinionated in some areas and relaxed in others. Although grounded in experience, it is unlikely that all the conventions suggested here will be familiar or agreeable to all practitioners. There is no doubt a lot of code committed under the BenchPrep organization that predates the publication of these guidelines and is not consistent with them. Going back to clean all that up will not be priority, but we should strive for consistency when working on new projects or making changes to older ones.
There is a script available that applies these standards automatically, so one option is to compose your queries in whatever manner you prefer locally, and apply the auto-formatter prior to committing.
NOTE: This style guide is written for use with PostgreSQL and Snowflake but much of it can be applied to any SQL database with some simple modifications.
ANOTHER NOTE: SQL examples used in this guide make an effort to refer to real BenchPrep system database table and column names, but they are contrived examples and not necessarily meaningful nor intended to be run against any live database.
- We take a disciplined and practical approach to writing code.
- We treat SQL like any other source code, which should be checked into source control, peer reviewed, and properly maintained.
- We believe consistency in style is important, and we value craftsmanship, but not to the exclusion of other practical concerns.
- We demonstrate intent explicitly in code, via clear structure and comments where needed.
- We adhere to a consistent style for handwritten SQL so that our code can thrive in an environment with many authors, editors, and readers.
Before getting into all the specifics, here is a quick look at some examples showing well formatted, beautiful SQL that matches the recommendations in this style guide:
-- basic select example
select q.name as question_name
, q.question_number
, qc.name as category_name
, q.question_type
, q.elo_rank
from questions as q
join question_categories as qc
on q.question_category_id = qc.id
where q.question_type in ('SMC', 'MMC')
and q.elo_rank < 1500
and qc.name like '%Math%'
order by q.name-- basic insert example
insert into login_trackings (user_id, ip_address, created_at)
values (12345, '192.168.1.1', now())
, (12346, '192.168.1.2', now())-- basic update example
update cp
set cp.price = cp.price * 1.05
, cp.updated_at = now()
from content_packages as cp
where cp.state = 'published'
and cp.published_on is not null-- basic delete example
delete a
from answers as a
where a.deleted_at is not null
and a.updated_at < now() - interval '1 year'-
Favor using a "river" for vertical alignment so that a query can be quickly and easily be scanned by a new reader.
-
Comments should appear at the top of your query or script, and should explain the intent of the query, not the mechanics.
-
Try to comment things that aren't obvious about the query (e.g., why a particular filter is necessary, why an optimization trick was needed, etc.)
-
Inline comments, used sparingly, can help clarify particularly tricky logic that might otherwise be unavoidable (consider refactoring to a clearer case if possible first)
-
Favor being descriptive over terseness:
GOOD:
select u.email as user_email_addressBAD:
select u.email as usreml -
Follow any existing style in the script before applying this style guide. The SQL script should have one clear style, and these rules should not be applied to existing scripts unless the whole script is being changed to adhere to the same style.
-
Favor storing timestamps in timezone-aware data type so that times are clear and convertible. Use ISO-8601 compliant time and date literals (
YYYY-MM-DD HH:MM:SS.SSSSS) when referring to date/time data.
Do not use UPPERCASE or "Sentence case" SQL keywords (e.g., prefer select, not
SELECT or Select). UPPERCASED SQL is an anachronism, and is not appropriate
for modern SQL development. Using lowercase keywords is preferred because:
- UPPERCASE words are harder to type and harder to read.
- SQL syntax is not case-sensitive, and thus lowercase keywords work correctly in all variants of SQL
- No other modern languages use ALLCAPS keywords.
- Modern editors color code SQL keywords, so there is not a need to distinguish keywords by casing.
- If you are in an environment where your keywords are not colored (i.e. as a string in another language), using a river for formatting provides a similar benefit of highlighting important keywords without resorting to CAPS.
- UPPERCASE IS ASSOCIATED WITH SHOUTING WHEN SEEN IN TEXT, IS HARD TO READ, AND MAKES SQL FEEL MORE LIKE COBOL THAN A MODERN LANGUAGE.
If the SQL script you are editing already uses UPPERCASE keywords, match that style or change all keywords to lowercase. Favor bending the rules for the sake of consistency rather than mixing styles.
Both PostgreSQL and Snowflake treat unquoted object names as case-insensitive, but they achieve this differently: PostgreSQL folds unquoted names to lowercase in its system catalog, while Snowflake folds them to UPPERCASE. Unquoted names in queries are folded the same way before comparison, so the difference is invisible in everyday use.
Names introduced in "DoubleQuotes" are stored with their exact casing and require
case-sensitive matching in all future queries. This means a double-quoted name in a
query will only match a default (unquoted) object if it matches the system's folded
case — all-lowercase for PostgreSQL, all-uppercase for Snowflake.
For consistency across both systems, use all-lowercase, unquoted names. This avoids case-sensitivity issues entirely and lets the same queries work against either database (syntax dialect differences aside).
-
Names should be
underscore_separatedand lowercase.GOOD:
select count(*) as the_tally, sum(*) as the_total ...BAD:
select count(*) as The_Tally, sum(*) as theTotal ...
-
Do not use reserved words for table names if possible.
-
Prefer the shortest commonly understood words to name a table.
-
There is some debate about singular versus plural table names, about which people of goodwill might reasonably disagree. The choice being somewhat arbitrary, the most important principle is consistency throughout each schema.
-
ActiveRecord uses plural by default, so we follow that convention in Rails application databases, and their corresponding Data Lake replicas.
-
For new data mart tables or any databases dedicated to Rails applications, prefer singular table names
-
Do not use object prefixes or Hungarian notation (e.g.
sp_,prc_,vw_,tbl_,t_,fn_, etc). -
Tables with semantic prefixes are okay if they aid understanding the nature of a table (e.g. in a Data Warehouse where it is common to use prefixes like
DimandFact). -
Another approved prefix is (
x_) for many-to-many joining tables (cross references) before concatenating table names:GOOD:
x_driver_carBAD:
driver_car -
Avoid giving a table the same name as one of its columns.
-
Tables should always have a primary key. A single column, auto-number (identity) surrogate key is preferable.
-
Composite keys make for verbose and slow foreign key joins.
int/bigintprimary keys are optimal as foreign keys when a table gets large. -
Tables should always have
created_atandupdated_atmetadata fields in them to make data movement between systems easier (ETL). Also, consider storing deleted records in archival tables, or having adeleted_atfield for soft deletes. -
Don't forget the needs of data analysts and ETL developers when designing your model.
- Do not use reserved words for column names if possible.
- Prefer not simply using
idas the name of the primary identifier for the table if possible. - Do not add a column with the same name as its table and vice versa.
- Avoid common words like
name,description, etc. Prefer a descriptive prefix for those words so that they don't require aliases when joined to other tables with similarly named columns. - Do not use
descas an abbreviation fordescription. Spell it out, or use some other non-keyword.
- Aliases should relate in some way to the object or expression they are aliasing.
- As a rule of thumb the alias can be the first letter of each word in the object's name or a good abbreviation.
- If there is already an alias with the same name then append a number.
- When using a subquery, prefix aliases with an
_to differentiate them from aliases in the outer query. - Always include the
askeyword. It makes the query easier to read and is explicit. - For computed data (i.e.
sum()oravg()) use the name you would give it were it a column defined in the schema.
- No tabs. Use spaces for indents.
- Configure your editor to 4 spaces per indent, but prefer your SQL to indent to the "river", and not to a set indent increment.
- No trailing whitespace.
- No more than two blank lines between statements.
- No empty lines in the middle of a single statement.
- One final newline at the end of a file
- Use an .editorConfig file to enforce reasonable whitespace rules if your SQL editor supports it:
# .editorConfig is awesome: https://EditorConfig.org
# SQL files
[*.{sql,tsql,ddl}]
charset = utf-8
indent_style = space
indent_size = 4
end_of_line = crlf
trim_trailing_whitespace = true
insert_final_newline = trueSpaces may be used to line up the code so that the root keywords all end on the
same character boundary. This forms a "river" down the middle making it easy for
the reader's eye to scan over the code and separate the keywords from the
implementation detail. Rivers are bad in typography, but helpful here.
Celko's book describes using a river to vertically align your query.
Right align keywords to the river if you chose to use one. The on clause in
the from may have its own river to help align information vertically.
Subqueries should create their own river as well.
-- a river in the 7th column helps vertical readability
select q.name as question_name
, q.elo_rank
, q.question_type
, qc.name as category_name
, s.name as section_name
from questions as q
left join question_categories as qc
on q.question_category_id = qc.id
left join sections as s
on qc.section_id = s.id
where q.elo_rank <= 1500
and q.id not in (
select _a.question_id
from answers _a
where _a.created_at < now() - interval '1 year'
)
and q.question_type in ('SMC', 'MMC', 'TF')
order by q.elo_rank desc, q.name-- alternately, a river in the a different column is fine if that is preferred
-- due to longer keywords, but know that indenting can feel "off" if the
-- `select` is not in the first column for the query
select q.name as question_name
, q.elo_rank
, q.question_type
, qc.name as category_name
, s.name as section_name
from questions as q
left join question_categories as qc
on q.question_category_id = qc.id
left join sections as s
on qc.section_id = s.id
where q.elo_rank <= 1500
and q.id not in (
select _a.question_id
from answers _a
where _a.created_at < now() - interval '1 year'
)
and q.question_type in ('SMC', 'MMC', 'TF')
order by q.elo_rank desc, q.nameUsing a river can be tedious, so if this alignment is not preferred by your team, then a standard 4 space indent can be used in place of a river.
Major keywords starting a clause should occupying their own line. Major keywords are:
- Select statement
selectintofromwheregroup byhavingorder by
- Insert statement additions
insert intovalues
- Update statement additions
updateset
- Delete statement additions
delete
All other keywords are minor and should appear after the indent and not occupy a line to themselves. Other than this section, this guide will stick to showing "river" formatting examples.
-- Editors tend to handle indenting style better than river alignment. River
-- formatting has advantages over indent formatting, but this style is
-- acceptable.
select
q.name as question_name
,q.elo_rank
,q.question_type
,qc.name as category_name
,s.name as section_name
from
questions as q
left join question_categories as qc
on q.question_category_id = qc.id
left join sections as s
on qc.section_id = s.id
where
q.elo_rank <= 1500
and q.question_type in ('SMC', 'MMC', 'TF')
order by
q.elo_rank desc, q.nameSelect the first column on the same line, and align all subsequent columns after the first get their own line.
select q.question_type
, qc.name as category_name
, count(*) as question_count
from ...If three or fewer columns are selected, have short names, and don't need aliased, you may chose to have them occupy the same line for brevity.
-- shortcut for small columns
select q.question_type, qc.name, q.elo_rank
from ...If using a select modifier like distinct or top, put the first column
on its own line.
-- treat the first column differently when using distinct and top
select distinct
u.email
, e.state as enrollment_state
from ...Use commas as a prefix as opposed to a suffix. This is preferred because:
- It makes it easy to add new columns to the end of the column list, which is more common than at the beginning
- It prevents unintentional aliasing bugs (missing comma)
- It makes commenting out columns at the end easier
- When statements take multiple lines like windowing functions and
casestatements, the prefix comma makes it clear when a new column starts - It does not adversely affect readability
The comma should border the "river" on the keyword side.
GOOD:
select Name
, ListPrice
, Color
, CategoryName
...BAD:
-- whoops! forgot a trailing comma because it's hard to see, making an
-- accidental alias of `ListPrice Color`
select Name,
ListPrice
Color,
CategoryName
...Always use as to rename columns. as statements can be used for additional
vertical alignment but don't have to be:
GOOD:
select q.question_type as question_type
, qc.name as category_name
, count(*) as question_count
from ...
...BAD:
select q.question_type question_type
, qc.name category_name
, count(*) question_count
from ...
...Always rename aggregates, derived columns (e.g. case statements), and
function-wrapped columns:
select q.name
, avg(a.time_taken) as avg_time_taken
, now() as current_timestamp
from ...Always use table alias prefixes for all columns when querying from more than one table. Single character aliases are fine for a few tables, but are less likely to be clear as a query grows:
select q.question_type
, qc.name as category_name
, count(*) as question_count
from questions as q
left join question_categories as qc
on ...Do not bracket-escape table or column names unless the names contain keyword collisions or would cause a syntax error without properly qualifying them.
GOOD:
-- user is a keyword in some contexts
select name
, email
, state
from usersBAD:
-- extra quotes are messy and unnecessary
select "name"
, "email"
, "state"
from "users"Long Window functions should be split across multiple lines: one for each
clause, aligned with a river. Partition keys can share the same line, or be
split. Ascending order is an intuitive default and thus using an explicit asc
is not necessary whereas desc is. All window functions should be aliased.
select q.id
, q.name as question_name
, q.question_number
, q.question_type
, row_number() over (partition by q.question_type
, q.question_category_id
order by q.elo_rank desc) as rank_within_category
, q.elo_rank
from questions q
order by q.question_type
, q.question_category_id
, rank_within_categorycase statements aren't always easy to format but try to align when, then,
and else together inside case and end.
then can stay on the when line if needed, but aligning with else is
preferable.
select cp.title as course_title
, case when cp.state in ('published', 'active')
then 'available'
else cp.state
end as availability_status
from content_packages as cp
order by availability_status, course_titleOnly one table should be in the from part. Never use comma separated
from-joins:
GOOD:
select u.email
, cp.title as course_title
from users as u
join enrollments as e
on u.id = e.user_id
join content_packages as cp
on e.content_package_id = cp.id
...BAD:
select u.email
, cp.title as course_title
from users as u, enrollments as e, content_packages as cp
where u.id = e.user_id
and e.content_package_id = cp.id
...Favor not using the extraneous words inner or outer when joining tables.
Alignment is easier without them, they don't add to the understanding of the
query, and the full table list is easier to scan without excessive staggering:
GOOD:
-- this is easier to format and read
select *
from users as u
join enrollments as e
on u.id = e.user_id
left join content_packages as cp
on e.content_package_id = cp.id
left join exam_results as er
on e.id = er.content_package_idBAD:
-- verbosity for the sake of verbosity is not helpful
-- `join` by itself always means `inner join`
-- `outer` is an unnecessary optional keyword
select *
from users as u
inner join enrollments as e
on u.id = e.user_id
left outer join content_packages as cp
on e.content_package_id = cp.id
left outer join exam_results as er
on e.id = er.content_package_idThe on keyword and condition can go on its own line, but is easier to scan if
it lines up on the join line. This is an acceptable style alternative:
-- this is an easier format to scan visually, but comes at the cost of longer
-- lines of code.
select *
from users as u
join enrollments as e on u.id = e.user_id
left join content_packages as cp on e.content_package_id = cp.id
left join exam_results as er on e.id = er.content_package_id
...Additional filters in the join go on new indented lines. Line up using the
on keyword:
GOOD:
select u.name
from users as u
left join enrollments as e
on u.id = e.user_id
left join content_packages as cp
on e.content_package_id = cp.id
and cp.state = 'published' -- multi-conditions start a new line
where e.id is nullBAD:
select u.name
from users as u
left join enrollments as e
on u.id = e.user_id
left join content_packages as cp
on e.content_package_id = cp.id and cp.state = 'published' -- needs a new line
where e.id is nullBegin with inner joins and then list left joins, order them semantically,
and do not intermingle left joins with inner joins unless necessary. Order
the on clause with joining aliases referencing tables top-to-bottom:
GOOD:
select *
from questions as q
join question_categories as qc
on q.question_category_id = qc.id
left join sections as s
on qc.section_id = s.id
left join answers as a
on q.id = a.question_idBAD:
select *
from questions as q
left join sections as s
on s.id = qc.section_id -- backwards
join question_categories as qc -- intermingled
on qc.id = q.question_category_id -- backwards
left join answers as a
on q.id = a.question_idAvoid right joins as they are usually better written with a left join
GOOD:
select *
from questions as q
left join question_categories as qc
on ...BAD:
select *
from question_categories as qc
right join questions as q
on ...Multiple where clauses should go on different lines and align to the river:
select *
from questions q
where q.elo_rank > 1200
and q.elo_rank < 1800
and question_type in ('SMC', 'MMC', 'TF')
and created_at >= '2020-01-01'
...When mixing and and or statements, do not rely on order of operations and
instead always use parenthesis to make the intent clear:
select *
from questions q
where (q.elo_rank > 1500
and question_type in ('SMC', 'MMC'))
or question_type is nullAlways put a semicolon on its own line when using them. This prevents common
errors like adding conditions to a where clause and neglecting to move the
trailing semicolon:
GOOD:
-- The prefix semicolon is clear and easy to spot when adding to a `where`
delete a
from answers a
where a.deleted_at is not null
and a.correct is null
and a.user_answer is null
;
...BAD:
-- A trailing semicolon is sinister.
-- We added some where conditions and missed it.
-- This is a destructive bug.
delete a
from answers a
where a.deleted_at is not null; -- dangerous
and a.correct is null -- syntax error here, but the bad delete is valid
and a.user_answer is null
...Maintain the same column order as the select clause in the group by:
GOOD:
select e.user_id
, e.content_package_id
, count(*) as exam_count
, avg(er.percentage_correct) as avg_score
from enrollments as e
join exam_results as er
on e.user_id = er.user_id
group by e.user_id
, e.content_package_idBAD:
-- messing with the 'group by' order makes it hard to scan for accuracy
select e.user_id
, e.content_package_id
, count(*) as exam_count
, avg(er.percentage_correct) as avg_score
from enrollments as e
join exam_results as er
on e.user_id = er.user_id
group by e.content_package_id -- out of order
, e.user_idA having clause is just a where clause for aggregate functions. The same
rules for where clauses apply to having.
Example:
select e.user_id
, e.content_package_id
, count(*) as exam_count
, avg(er.percentage_correct) as avg_score
from enrollments as e
join exam_results as er
on e.user_id = er.user_id
group by e.user_id
, e.content_package_id
having count(*) > 1
and avg(er.percentage_correct) > 70.0Do not use the superfluous asc in order by statements:
GOOD:
-- asc is implied and obvious
select u.name
, u.email
from users u
order by u.name
, u.emailBAD:
-- asc is clutter - it's never ambiguous when you wanted to sort ascending
select u.name
, u.email
from users u
order by u.name asc -- useless asc
, u.email ascOrdering by column number is okay, but not preferred:
-- This is okay, but not great.
select u.name as user_name
, u.email as user_email
from users u
order by 2The by keyword can sit on the other side of a 7th column river, but align
the order by columns:
select u.name
, u.email
from users u
order by u.name
, u.emailIf three or fewer columns are in the order by and have short names you may
chose to have them occupy the same line for brevity.
-- shortcut for small columns
select u.name, u.email
from users u
order by u.name, u.email