author	tym-xqo
title	BenchPrep SQL Style Guide
revision	2026-02-05
version	1.0.0
description	A guide to writing clean, clear, and consistent SQL.

BenchPrep SQL Style Guide

select *
  from benchprep.sql_style_guide as guide
 where guide.attributes in ('clean', 'clear', 'consistent')
   and guide.look = 'beautiful'

Purpose

These guidelines are designed to make SQL statements easy to write, easy to read, easy to maintain, and beautiful to see. This document is to be used as a guide for anyone who will be authoring SQL at BenchPrep, and serves to codify the team's preferred SQL style.

This guide is opinionated in some areas and relaxed in others. Although grounded in experience, it is unlikely that all the conventions suggested here will be familiar or agreeable to all practitioners. There is no doubt a lot of code committed under the BenchPrep organization that predates the publication of these guidelines and is not consistent with them. Going back to clean all that up will not be priority, but we should strive for consistency when working on new projects or making changes to older ones.

There is a script available that applies these standards automatically, so one option is to compose your queries in whatever manner you prefer locally, and apply the auto-formatter prior to committing.

NOTE: This style guide is written for use with PostgreSQL and Snowflake but much of it can be applied to any SQL database with some simple modifications.

ANOTHER NOTE: SQL examples used in this guide make an effort to refer to real BenchPrep system database table and column names, but they are contrived examples and not necessarily meaningful nor intended to be run against any live database.

Principles

We take a disciplined and practical approach to writing code.
We treat SQL like any other source code, which should be checked into source control, peer reviewed, and properly maintained.
We believe consistency in style is important, and we value craftsmanship, but not to the exclusion of other practical concerns.
We demonstrate intent explicitly in code, via clear structure and comments where needed.
We adhere to a consistent style for handwritten SQL so that our code can thrive in an environment with many authors, editors, and readers.

Quick look

Before getting into all the specifics, here is a quick look at some examples showing well formatted, beautiful SQL that matches the recommendations in this style guide:

-- basic select example
select q.name as question_name
     , q.question_number
     , qc.name as category_name
     , q.question_type
     , q.elo_rank
  from questions as q
  join question_categories as qc
    on q.question_category_id = qc.id
 where q.question_type in ('SMC', 'MMC')
   and q.elo_rank < 1500
   and qc.name like '%Math%'
 order by q.name

-- basic insert example
insert into login_trackings (user_id, ip_address, created_at)
values (12345, '192.168.1.1', now())
     , (12346, '192.168.1.2', now())

-- basic update example
update cp
   set cp.price = cp.price * 1.05
     , cp.updated_at = now()
  from content_packages as cp
 where cp.state = 'published'
   and cp.published_on is not null

-- basic delete example
delete a
  from answers as a
 where a.deleted_at is not null
   and a.updated_at < now() - interval '1 year'

Rules

General guidance

Favor using a "river" for vertical alignment so that a query can be quickly and easily be scanned by a new reader.
Comments should appear at the top of your query or script, and should explain the intent of the query, not the mechanics.
Try to comment things that aren't obvious about the query (e.g., why a particular filter is necessary, why an optimization trick was needed, etc.)
Inline comments, used sparingly, can help clarify particularly tricky logic that might otherwise be unavoidable (consider refactoring to a clearer case if possible first)
Favor being descriptive over terseness:

GOOD: select u.email as user_email_address

BAD: select u.email as usreml
Follow any existing style in the script before applying this style guide. The SQL script should have one clear style, and these rules should not be applied to existing scripts unless the whole script is being changed to adhere to the same style.
Favor storing timestamps in timezone-aware data type so that times are clear and convertible. Use ISO-8601 compliant time and date literals (YYYY-MM-DD HH:MM:SS.SSSSS) when referring to date/time data.

SQL Keyword Casing

Do not use UPPERCASE or "Sentence case" SQL keywords (e.g., prefer select, not SELECT or Select). UPPERCASED SQL is an anachronism, and is not appropriate for modern SQL development. Using lowercase keywords is preferred because:

UPPERCASE words are harder to type and harder to read.
SQL syntax is not case-sensitive, and thus lowercase keywords work correctly in all variants of SQL
No other modern languages use ALLCAPS keywords.
Modern editors color code SQL keywords, so there is not a need to distinguish keywords by casing.
If you are in an environment where your keywords are not colored (i.e. as a string in another language), using a river for formatting provides a similar benefit of highlighting important keywords without resorting to CAPS.
UPPERCASE IS ASSOCIATED WITH SHOUTING WHEN SEEN IN TEXT, IS HARD TO READ, AND MAKES SQL FEEL MORE LIKE COBOL THAN A MODERN LANGUAGE.

If the SQL script you are editing already uses UPPERCASE keywords, match that style or change all keywords to lowercase. Favor bending the rules for the sake of consistency rather than mixing styles.

SQL Object Name Casing

Both PostgreSQL and Snowflake treat unquoted object names as case-insensitive, but they achieve this differently: PostgreSQL folds unquoted names to lowercase in its system catalog, while Snowflake folds them to UPPERCASE. Unquoted names in queries are folded the same way before comparison, so the difference is invisible in everyday use.

Names introduced in "DoubleQuotes" are stored with their exact casing and require case-sensitive matching in all future queries. This means a double-quoted name in a query will only match a default (unquoted) object if it matches the system's folded case — all-lowercase for PostgreSQL, all-uppercase for Snowflake.

For consistency across both systems, use all-lowercase, unquoted names. This avoids case-sensitivity issues entirely and lets the same queries work against either database (syntax dialect differences aside).

Naming guidance

Names should be underscore_separated and lowercase.

GOOD: select count(*) as the_tally, sum(*) as the_total ...

BAD: select count(*) as The_Tally, sum(*) as theTotal ...

Tables

Do not use reserved words for table names if possible.
Prefer the shortest commonly understood words to name a table.
There is some debate about singular versus plural table names, about which people of goodwill might reasonably disagree. The choice being somewhat arbitrary, the most important principle is consistency throughout each schema.
ActiveRecord uses plural by default, so we follow that convention in Rails application databases, and their corresponding Data Lake replicas.
For new data mart tables or any databases dedicated to Rails applications, prefer singular table names
Do not use object prefixes or Hungarian notation (e.g. sp_, prc_, vw_, tbl_, t_, fn_, etc).
Tables with semantic prefixes are okay if they aid understanding the nature of a table (e.g. in a Data Warehouse where it is common to use prefixes like Dim and Fact).
Another approved prefix is (x_) for many-to-many joining tables (cross references) before concatenating table names:

GOOD: x_driver_car

BAD: driver_car
Avoid giving a table the same name as one of its columns.
Tables should always have a primary key. A single column, auto-number (identity) surrogate key is preferable.
Composite keys make for verbose and slow foreign key joins. int/bigint primary keys are optimal as foreign keys when a table gets large.
Tables should always have created_at and updated_at metadata fields in them to make data movement between systems easier (ETL). Also, consider storing deleted records in archival tables, or having a deleted_at field for soft deletes.
Don't forget the needs of data analysts and ETL developers when designing your model.

Columns

Do not use reserved words for column names if possible.
Prefer not simply using id as the name of the primary identifier for the table if possible.
Do not add a column with the same name as its table and vice versa.
Avoid common words like name, description, etc. Prefer a descriptive prefix for those words so that they don't require aliases when joined to other tables with similarly named columns.
Do not use desc as an abbreviation for description. Spell it out, or use some other non-keyword.

Aliases

Aliases should relate in some way to the object or expression they are aliasing.
As a rule of thumb the alias can be the first letter of each word in the object's name or a good abbreviation.
If there is already an alias with the same name then append a number.
When using a subquery, prefix aliases with an _ to differentiate them from aliases in the outer query.
Always include the as keyword. It makes the query easier to read and is explicit.
For computed data (i.e. sum() or avg()) use the name you would give it were it a column defined in the schema.

Whitespace

No tabs. Use spaces for indents.
Configure your editor to 4 spaces per indent, but prefer your SQL to indent to the "river", and not to a set indent increment.
No trailing whitespace.
No more than two blank lines between statements.
No empty lines in the middle of a single statement.
One final newline at the end of a file
Use an .editorConfig file to enforce reasonable whitespace rules if your SQL editor supports it:

# .editorConfig is awesome: https://EditorConfig.org

# SQL files
[*.{sql,tsql,ddl}]
charset = utf-8
indent_style = space
indent_size = 4
end_of_line = crlf
trim_trailing_whitespace = true
insert_final_newline = true

River formatting

Spaces may be used to line up the code so that the root keywords all end on the same character boundary. This forms a "river" down the middle making it easy for the reader's eye to scan over the code and separate the keywords from the implementation detail. Rivers are bad in typography, but helpful here. Celko's book describes using a river to vertically align your query. Right align keywords to the river if you chose to use one. The on clause in the from may have its own river to help align information vertically. Subqueries should create their own river as well.

-- a river in the 7th column helps vertical readability
select q.name as question_name
     , q.elo_rank
     , q.question_type
     , qc.name as category_name
     , s.name as section_name
  from questions as q
  left join question_categories as qc
    on q.question_category_id = qc.id
  left join sections as s
    on qc.section_id = s.id
 where q.elo_rank <= 1500
   and q.id not in (
           select _a.question_id
             from answers _a
            where _a.created_at < now() - interval '1 year'
       )
   and q.question_type in ('SMC', 'MMC', 'TF')
 order by q.elo_rank desc, q.name

-- alternately, a river in the a different column is fine if that is preferred
-- due to longer keywords, but know that indenting can feel "off" if the
-- `select` is not in the first column for the query
   select q.name as question_name
        , q.elo_rank
        , q.question_type
        , qc.name as category_name
        , s.name as section_name
     from questions as q
left join question_categories as qc
       on q.question_category_id = qc.id
left join sections as s
       on qc.section_id = s.id
    where q.elo_rank <= 1500
      and q.id not in (
              select _a.question_id
                from answers _a
               where _a.created_at < now() - interval '1 year'
          )
      and q.question_type in ('SMC', 'MMC', 'TF')
 order by q.elo_rank desc, q.name

Indent formatting

Using a river can be tedious, so if this alignment is not preferred by your team, then a standard 4 space indent can be used in place of a river.

Major keywords starting a clause should occupying their own line. Major keywords are:

Select statement
- select
- into
- from
- where
- group by
- having
- order by
Insert statement additions
- insert into
- values
Update statement additions
- update
- set
Delete statement additions
- delete

All other keywords are minor and should appear after the indent and not occupy a line to themselves. Other than this section, this guide will stick to showing "river" formatting examples.

-- Editors tend to handle indenting style better than river alignment. River
-- formatting has advantages over indent formatting, but this style is
-- acceptable.
select
    q.name as question_name
    ,q.elo_rank
    ,q.question_type
    ,qc.name as category_name
    ,s.name as section_name
from
    questions as q
    left join question_categories as qc
        on q.question_category_id = qc.id
    left join sections as s
        on qc.section_id = s.id
where
    q.elo_rank <= 1500
    and q.question_type in ('SMC', 'MMC', 'TF')
order by
    q.elo_rank desc, q.name

`select` clause

Select the first column on the same line, and align all subsequent columns after the first get their own line.

select q.question_type
     , qc.name as category_name
     , count(*) as question_count
  from ...

If three or fewer columns are selected, have short names, and don't need aliased, you may chose to have them occupy the same line for brevity.

-- shortcut for small columns
select q.question_type, qc.name, q.elo_rank
  from ...

If using a select modifier like distinct or top, put the first column on its own line.

-- treat the first column differently when using distinct and top
select distinct
       u.email
     , e.state as enrollment_state
  from ...

Use commas as a prefix as opposed to a suffix. This is preferred because:

It makes it easy to add new columns to the end of the column list, which is more common than at the beginning
It prevents unintentional aliasing bugs (missing comma)
It makes commenting out columns at the end easier
When statements take multiple lines like windowing functions and case statements, the prefix comma makes it clear when a new column starts
It does not adversely affect readability

The comma should border the "river" on the keyword side.

GOOD:

select Name
     , ListPrice
     , Color
     , CategoryName
   ...

BAD:

-- whoops! forgot a trailing comma because it's hard to see, making an
-- accidental alias of `ListPrice Color`
select Name,
       ListPrice
       Color,
       CategoryName
   ...

Always use as to rename columns. as statements can be used for additional vertical alignment but don't have to be:

GOOD:

select q.question_type as question_type
     , qc.name         as category_name
     , count(*)        as question_count
  from ...
...

BAD:

select q.question_type question_type
     , qc.name category_name
     , count(*) question_count
  from ...
...

Always rename aggregates, derived columns (e.g. case statements), and function-wrapped columns:

select q.name
     , avg(a.time_taken) as avg_time_taken
     , now() as current_timestamp
  from ...

Always use table alias prefixes for all columns when querying from more than one table. Single character aliases are fine for a few tables, but are less likely to be clear as a query grows:

select q.question_type
     , qc.name as category_name
     , count(*) as question_count
  from questions as q
  left join question_categories as qc
    on ...

Do not bracket-escape table or column names unless the names contain keyword collisions or would cause a syntax error without properly qualifying them.

GOOD:

-- user is a keyword in some contexts
select name
     , email
     , state
from users

BAD:

-- extra quotes are messy and unnecessary
select "name"
     , "email"
     , "state"
from "users"

Windowing functions

Long Window functions should be split across multiple lines: one for each clause, aligned with a river. Partition keys can share the same line, or be split. Ascending order is an intuitive default and thus using an explicit asc is not necessary whereas desc is. All window functions should be aliased.

select q.id
     , q.name as question_name
     , q.question_number
     , q.question_type
     , row_number() over (partition by q.question_type
                                     , q.question_category_id
                              order by q.elo_rank desc) as rank_within_category
     , q.elo_rank
  from questions q
 order by q.question_type
     , q.question_category_id
     , rank_within_category

`case` statements

case statements aren't always easy to format but try to align when, then, and else together inside case and end.

then can stay on the when line if needed, but aligning with else is preferable.

select cp.title as course_title
     , case when cp.state in ('published', 'active')
            then 'available'
            else cp.state
       end as availability_status
  from content_packages as cp
 order by availability_status, course_title

`from` clause

Only one table should be in the from part. Never use comma separated from-joins:

GOOD:

select u.email
     , cp.title as course_title
  from users as u
  join enrollments as e
    on u.id = e.user_id
  join content_packages as cp
    on e.content_package_id = cp.id
...

BAD:

select u.email
     , cp.title as course_title
  from users as u, enrollments as e, content_packages as cp
 where u.id = e.user_id
   and e.content_package_id = cp.id
...

Favor not using the extraneous words inner or outer when joining tables. Alignment is easier without them, they don't add to the understanding of the query, and the full table list is easier to scan without excessive staggering:

GOOD:

-- this is easier to format and read
   select *
     from users as u
     join enrollments as e
       on u.id = e.user_id
left join content_packages as cp
       on e.content_package_id = cp.id
left join exam_results as er
       on e.id = er.content_package_id

BAD:

-- verbosity for the sake of verbosity is not helpful
-- `join` by itself always means `inner join`
-- `outer` is an unnecessary optional keyword
         select *
           from users as u
     inner join enrollments as e
             on u.id = e.user_id
left outer join content_packages as cp
             on e.content_package_id = cp.id
left outer join exam_results as er
             on e.id = er.content_package_id

The on keyword and condition can go on its own line, but is easier to scan if it lines up on the join line. This is an acceptable style alternative:

-- this is an easier format to scan visually, but comes at the cost of longer
-- lines of code.
   select *
     from users as u
     join enrollments as e        on u.id = e.user_id
left join content_packages as cp on e.content_package_id = cp.id
left join exam_results as er     on e.id = er.content_package_id
...

Additional filters in the join go on new indented lines. Line up using the on keyword:

GOOD:

   select u.name
     from users as u
left join enrollments as e
       on u.id = e.user_id
left join content_packages as cp
       on e.content_package_id = cp.id
      and cp.state = 'published'  -- multi-conditions start a new line
    where e.id is null

BAD:

   select u.name
     from users as u
left join enrollments as e
       on u.id = e.user_id
left join content_packages as cp
       on e.content_package_id = cp.id and cp.state = 'published'  -- needs a new line
    where e.id is null

Begin with inner joins and then list left joins, order them semantically, and do not intermingle left joins with inner joins unless necessary. Order the on clause with joining aliases referencing tables top-to-bottom:

GOOD:

select *
  from questions as q
  join question_categories as qc
    on q.question_category_id = qc.id
  left join sections as s
    on qc.section_id = s.id
  left join answers as a
    on q.id = a.question_id

BAD:

select *
  from questions as q
  left join sections as s
    on s.id = qc.section_id  -- backwards
  join question_categories as qc                       -- intermingled
    on qc.id = q.question_category_id              -- backwards
  left join answers as a
    on q.id = a.question_id

Avoid right joins as they are usually better written with a left join

GOOD:

select *
  from questions as q
  left join question_categories as qc
    on ...

BAD:

select *
  from question_categories as qc
 right join questions as q
    on ...

`where` clause

Multiple where clauses should go on different lines and align to the river:

select *
  from questions q
 where q.elo_rank > 1200
   and q.elo_rank < 1800
   and question_type in ('SMC', 'MMC', 'TF')
   and created_at >= '2020-01-01'
...

When mixing and and or statements, do not rely on order of operations and instead always use parenthesis to make the intent clear:

select *
  from questions q
 where (q.elo_rank > 1500
   and question_type in ('SMC', 'MMC'))
    or question_type is null

Always put a semicolon on its own line when using them. This prevents common errors like adding conditions to a where clause and neglecting to move the trailing semicolon:

GOOD:

-- The prefix semicolon is clear and easy to spot when adding to a `where`
delete a
  from answers a
 where a.deleted_at is not null
   and a.correct is null
   and a.user_answer is null
;
...

BAD:

-- A trailing semicolon is sinister.
-- We added some where conditions and missed it.
-- This is a destructive bug.
delete a
  from answers a
 where a.deleted_at is not null;  -- dangerous
   and a.correct is null      -- syntax error here, but the bad delete is valid
   and a.user_answer is null
...

`group by` clause

Maintain the same column order as the select clause in the group by:

GOOD:

  select e.user_id
       , e.content_package_id
       , count(*) as exam_count
       , avg(er.percentage_correct) as avg_score
    from enrollments as e
    join exam_results as er
      on e.user_id = er.user_id
group by e.user_id
       , e.content_package_id

BAD:

-- messing with the 'group by' order makes it hard to scan for accuracy
  select e.user_id
       , e.content_package_id
       , count(*) as exam_count
       , avg(er.percentage_correct) as avg_score
    from enrollments as e
    join exam_results as er
      on e.user_id = er.user_id
group by e.content_package_id  -- out of order
       , e.user_id

`having` clause

A having clause is just a where clause for aggregate functions. The same rules for where clauses apply to having.

Example:

  select e.user_id
       , e.content_package_id
       , count(*) as exam_count
       , avg(er.percentage_correct) as avg_score
    from enrollments as e
    join exam_results as er
      on e.user_id = er.user_id
group by e.user_id
       , e.content_package_id
  having count(*) > 1
     and avg(er.percentage_correct) > 70.0

`order by` clause

Do not use the superfluous asc in order by statements:

GOOD:

-- asc is implied and obvious
  select u.name
       , u.email
    from users u
order by u.name
       , u.email

BAD:

-- asc is clutter - it's never ambiguous when you wanted to sort ascending
  select u.name
       , u.email
    from users u
order by u.name asc  -- useless asc
       , u.email asc

Ordering by column number is okay, but not preferred:

-- This is okay, but not great.
  select u.name as user_name
       , u.email as user_email
    from users u
order by 2

The by keyword can sit on the other side of a 7th column river, but align the order by columns:

select u.name
     , u.email
  from users u
 order by u.name
        , u.email

If three or fewer columns are in the order by and have short names you may chose to have them occupy the same line for brevity.

-- shortcut for small columns
select u.name, u.email
  from users u
 order by u.name, u.email

tym-xqo/benchprep-sql-style-guide.md

Select an option

No results found

Select an option

No results found

BenchPrep SQL Style Guide

Purpose

Principles

Quick look

Rules

General guidance

SQL Keyword Casing

SQL Object Name Casing

Naming guidance

Tables

Columns

Aliases

Whitespace

River formatting

Indent formatting

`select` clause

Windowing functions

`case` statements

`from` clause

`where` clause

`group by` clause

`having` clause

`order by` clause

tym-xqo/benchprep-sql-style-guide.md

BenchPrep SQL Style Guide

Purpose

Principles

Quick look

Rules

General guidance

SQL Keyword Casing

SQL Object Name Casing

Naming guidance

Tables

Columns

Aliases

Whitespace

River formatting

Indent formatting

select clause

Windowing functions

case statements

from clause

where clause

group by clause

having clause

order by clause

`select` clause

`case` statements

`from` clause

`where` clause

`group by` clause

`having` clause

`order by` clause