layout | author | title | revision | version | description |
---|---|---|---|---|---|
default |
mattmc3 |
Modern SQL Style Guide |
2019-01-17 |
1.0.1 |
A guide to writing clean, clear, and consistent SQL. |
select *
from modern.sql_style_guide as guide
where guide.attributes in ('clean', 'clear', 'consistent')
and guide.look = 'beautiful'
These guidelines are designed to make SQL statements easy to write, easy to read, easy to maintain, and beautiful to see. This document is to be used as a guide for anyone who would like to codify a team's preferred SQL style.
This guide is opinionated in some areas and relaxed in others. You can use this set of guidelines, fork them, or make your own - the key here is that you pick a style and stick to it. The odds of making everyone happy are low, so compromise is a guiding principle to achieve cohesion.
It is easy to include this guide in Markdown format as a part of a project's code base or reference it here for anyone on the project to freely read.
This guide is based on various existing attempts at SQL standards including: http://www.sqlstyle.guide and Kickstarter guide. Due to its origins, it is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
The example SQL statements used are based on tables in the AdventureWorks database. Note that due to the use of the existing AdventureWorks schema, some of the guidelines in this document are not always followed, especially with regards to naming conventions. Those discrepancies will be called out as they appear.
NOTE: This style guide is written for use with Microsoft SQL Server, but much of it can be applied to any SQL database with some simple modifications.
- We take a disciplined and practical approach to writing code.
- We treat SQL like any other source code, which should be checked into source control, peer reviewed, and properly maintained.
- We believe consistency in style is important, and we value craftsmanship, but not to the exclusion of other practical concerns.
- We demonstrate intent explicitly in code, via clear structure and comments where needed.
- We adhere to a consistent style for handwritten SQL so that our code can thrive in an environment with many authors, editors, and readers.
Before getting into all the specifics, here is a quick look at some examples showing well formatted, beautiful SQL that matches the recommendations in this style guide:
-- basic select example
select p.Name as ProductName
, p.ProductNumber
, pm.Name as ProductModelName
, p.Color
, p.ListPrice
from Production.Product as p
join Production.ProductModel as pm
on p.ProductModelID = pm.ProductModelID
where p.Color in ('Blue', 'Red')
and p.ListPrice < 800.00
and pm.Name like '%frame%'
order by p.Name
-- basic insert example
insert into Sales.Currency (CurrencyCode, Name, ModifiedDate)
values ('XBT', 'Bitcoin', getutcdate())
, ('ETH', 'Ethereum', getutcdate())
-- basic update example
update p
set p.ListPrice = p.ListPrice * 1.05
, p.ModifiedDate = getutcdate()
from Production.Product as p
where p.SellEndDate is null
and p.SellStartDate is not null
-- basic delete example
delete cc
from Sales.CreditCard as cc
where cc.ExpYear < '2003'
and cc.ModifiedDate < dateadd(year, -1, getutcdate())
-
Favor using a "river" for vertical alignment so that a query can be quickly and easily be scanned by a new reader.
-
Comments should appear at the top of your query or script, and should explain the intent of the query, not the mechanics.
-
Try to comment things that aren't obvious about the query (e.g., why a particular filter is necessary, why an optimization trick was needed, etc.)
-
Favor being descriptive over terseness:
GOOD:
select emp.LoginID as EmployeeUserName
BAD:
select emp.LoginID as EmpUsrNm
-
Follow any existing style in the script before applying this style guide. The SQL script should have one clear style, and these rules should not be applied to existing scripts unless the whole script is being changed to adhere to the same style.
-
Favor storing
datetime
anddatetime2
in UTC unless embedding timezone information (datetimeoffset
) so that times are clear and convertible. Use ISO-8601 compliant time and date information (YYYY-MM-DD HH:MM:SS.SSSSS
) when referring to date/time data.
Do not SHOUTCASE or "Sentence case" SQL keywords (e.g., prefer select
, not
SELECT
or Select
). SHOUTCASED SQL is an anachronism, and is not appropriate
for modern SQL development. Using lowercase keywords is preferred because:
- UPPERCASE words are harder to type and harder to read.
- SQL syntax is not case-sensitive, and thus lowercase keywords work correctly in all variants of SQL
- No other modern languages use ALLCAPS keywords.
- Modern editors color code SQL keywords, so there is not a need to distinguish keywords by casing.
- If you are in an environment where your keywords are not colored (i.e. as a string in another language), using a river for formatting provides a similar benefit of highlighting important keywords without resorting to CAPS.
- UPPERCASE IS ASSOCIATED WITH SHOUTING WHEN SEEN IN TEXT, IS HARD TO READ, AND MAKES SQL FEEL MORE LIKE COBOL THAN A MODERN LANGUAGE.
If the SQL script you are editing already uses SHOUTCASE keywords, match that style or change all keywords to lowercase. Favor bending the rules for the sake of consistency rather than mixing styles.
-
Names should be
underscore_separated
orPascalCase
but do not mix styles.GOOD:
select count(*) as the_tally, sum(*) as the_total ...
BAD:
select count(*) as The_Tally, sum(*) as theTotal ...
-
Do not use reserved words for table names if possible.
-
Prefer the shortest commonly understood words to name a table.
-
Naming a table as a plural makes the table easier to speak about. (e.g. favor
employees
overemployee
) -
Do not use object prefixes or Hungarian notation (e.g.
sp_
,prc_
,vw_
,tbl_
,t_
,fn_
, etc). -
Tables with semantic prefixes are okay if they aid understanding the nature of a table (e.g. in a Data Warehouse where it is common to use prefixes like
Dim
andFact
). -
Avoid giving a table the same name as one of its columns.
-
Use a joining word for many-to-many joining tables (cross references) rather than concatenating table names (e.g.
Xref
):GOOD:
drivers_xref_cars
BAD:
drivers_cars
-
Tables should always have a primary key. A single column, auto-number (identity) surrogate key is preferable.
-
Natural keys or composite keys can be enforced with unique constraints in lieu of making them a primary key.
-
Composite keys make for verbose and slow foreign key joins.
int
/bigint
primary keys are optimal as foreign keys when a table gets large. -
Tables should always have
created_at
andupdated_at
metadata fields in them to make data movement between systems easier (ETL). Also, consider storing deleted records in archival tables, or having adeleted_at
field for soft deletes. -
Don't forget the needs of data analysts and ETL developers when designing your model.
- Do not use reserved words for column names if possible.
- Prefer not simply using
id
as the name of the primary identifier for the table if possible. - Do not add a column with the same name as its table and vice versa.
- Avoid common words like
Name
,Description
, etc. Prefer a descriptive prefix for those words so that they don't require aliases when joined to other tables with similarly named columns. (NOTE: This guide uses the AdventureWorks database, which commonly has columns namedName
against this guide's advice. Remember that an existing convention may be in place that is beyond your control. ) - Do not use
Desc
as an abbreviation forDescription
. Spell it out, or use some other non-keyword.
- Aliases should relate in some way to the object or expression they are aliasing.
- As a rule of thumb the alias can be the first letter of each word in the object's name or a good abbreviation.
- If there is already an alias with the same name then append a number.
- When using a subquery, prefix aliases with an
_
to differentiate them from aliases in the outer query. - Always include the
as
keyword. It makes the query easier to read and is explicit. - For computed data (i.e.
sum()
oravg()
) use the name you would give it were it a column defined in the schema.
- No tabs. Use spaces for indents.
- Configure your editor to 4 spaces per indent, but prefer your SQL to indent to the "river", and not to a set indent increment.
- No trailing whitespace.
- No more than two blank lines between statements.
- No empty lines in the middle of a single statement.
- One final newline at the end of a file
- Use an .editorConfig file to enforce reasonable whitespace rules if your SQL editor supports it:
# .editorConfig is awesome: https://EditorConfig.org
# SQL files
[*.{sql,tsql,ddl}]
charset = utf-8
indent_style = space
indent_size = 4
end_of_line = crlf
trim_trailing_whitespace = true
insert_final_newline = true
Spaces may be used to line up the code so that the root keywords all end on the
same character boundary. This forms a "river" down the middle making it easy for
the reader's eye to scan over the code and separate the keywords from the
implementation detail. Rivers are bad in typography, but helpful here.
Celko's book describes using a river to vertically align your query.
Right align keywords to the river if you chose to use one. The on
clause in
the from
may have its own river to help align information vertically.
Subqueries should create their own river as well.
-- a river in the 7th column helps vertical readability
select prdct.Name as ProductName
, prdct.ListPrice
, prdct.Color
, cat.Name as CategoryName
, subcat.Name as SubcategoryName
from Production.Product as prdct
left join Production.ProductSubcategory as subcat
on prdct.ProductSubcategoryID = subcat.ProductSubcategoryID
left join Production.ProductCategory as cat
on subcat.ProductCategoryID = cat.ProductCategoryID
where prdct.ListPrice <= 1000.00
and prdct.ProductID not in (
select _pd.ProductID
from Production.ProductDocument _pd
where _pd.ModifiedDate < dateadd(year, -1, getutcdate())
)
and prdct.Color in ('Black', 'Red', 'Silver')
order by prdct.ListPrice desc, prdct.Name
-- alternately, a river in the a different column is fine if that is preferred
-- due to longer keywords, but know that indenting can feel "off" if the
-- `select` is not in the first column for the query
select prdct.Name as ProductName
, prdct.ListPrice
, prdct.Color
, cat.Name as CategoryName
, subcat.Name as SubcategoryName
from Production.Product as prdct
left join Production.ProductSubcategory as subcat
on prdct.ProductSubcategoryID = subcat.ProductSubcategoryID
left join Production.ProductCategory as cat
on subcat.ProductCategoryID = cat.ProductCategoryID
where prdct.ListPrice <= 1000.00
and prdct.ProductID not in (
select _pd.ProductID
from Production.ProductDocument _pd
where _pd.ModifiedDate < dateadd(year, -1, getutcdate())
)
and prdct.Color in ('Black', 'Red', 'Silver')
order by prdct.ListPrice desc, prdct.Name
Using a river can be tedious, so if this alignment is not preferred by your team, then a standard 4 space indent can be used in place of a river.
Major keywords starting a clause should occupying their own line. Major keywords are:
- Select statement
select
into
from
where
group by
having
order by
- Insert statement additions
insert into
values
- Update statement additions
update
set
- Delete statement additions
delete
All other keywords are minor and should appear after the indent and not occupy a line to themselves. Other than this section, this guide will stick to showing "river" formatting examples.
-- Editors tend to handle indenting style better than river alignment. River
-- formatting has advantages over indent formatting, but this style is
-- acceptable.
select
prdct.Name as ProductName
,prdct.ListPrice
,prdct.Color
,cat.Name as CategoryName
,subcat.Name as SubcategoryName
from
Production.Product as prdct
left join Production.ProductSubcategory as subcat
on prdct.ProductSubcategoryID = subcat.ProductSubcategoryID
left join Production.ProductCategory as cat
on subcat.ProductCategoryID = cat.ProductCategoryID
where
prdct.ListPrice <= 1000.00
and prdct.Color in ('Black', 'Red', 'Silver')
order by
prdct.ListPrice desc, prdct.Name
Select the first column on the same line, and align all subsequent columns after the first get their own line.
select prdct.Color
, cat.Name as CategoryName
, count(*) as ProductCount
from ...
If three or fewer columns are selected, have short names, and don't need aliased, you may chose to have them occupy the same line for brevity.
-- shortcut for small columns
select p.Color, c.Name, p.ListPrice
from ...
If using a select
modifier like distinct
or top
, put the first column
on its own line.
-- treat the first column differently when using distinct and top
select distinct
p.Color
, c.Name as CategoryName
from ...
Use commas as a prefix as opposed to a suffix. This is preferred because:
- It makes it easy to add new columns to the end of the column list, which is more common than at the beginning
- It prevents unintentional aliasing bugs (missing comma)
- It makes commenting out columns at the end easier
- When statements take multiple lines like windowing functions and
case
statements, the prefix comma makes it clear when a new column starts - It does not adversely affect readability
The comma should border the "river" on the keyword side.
GOOD:
select Name
, ListPrice
, Color
, CategoryName
...
BAD:
-- whoops! forgot a trailing comma because it's hard to see, making an
-- accidental alias of `ListPrice Color`
select Name,
ListPrice
Color,
CategoryName
...
Always use as
to rename columns. as
statements can be used for additional
vertical alignment but don't have to be:
GOOD:
select prdct.Color as ProductColor
, cat.Name as CategoryName
, count(*) as ProductCount
from ...
...
BAD:
select prdct.Color ProductColor
, cat.Name CategoryName
, count(*) ProductCount
from ...
...
Always rename aggregates, derived columns (e.g. case
statements), and
function-wrapped columns:
select ProductName
, sum(UnitPrice * OrderQty) as TotalCost
, getutcdate() as NowUTC
from ...
Always use table alias prefixes for all columns when querying from more than one table. Single character aliases are fine for a few tables, but are less likely to be clear as a query grows:
select prdct.Color
, subcat.Name as SubcategoryName
, count(*) as ItemCount
from Production.Product as prdct
left join Production.ProductSubcategory as subcat
on ...
Do not bracket-escape table or column names unless the names contain keyword collisions or would cause a syntax error without properly qualifying them.
GOOD:
-- owner and status are keywords
select Title
, [Owner]
, [Status]
from Production.Document
BAD:
-- extra brackets are messy and unnecessary
select [Title]
, [Owner]
, [Status]
from [Production].[Document]
Long Window functions should be split across multiple lines: one for each
clause, aligned with a river. Partition keys can share the same line, or be
split. Ascending order is an intuitive default and thus using an explicit asc
is not necessary whereas desc
is. All window functions should be aliased.
select p.ProductID
, p.Name as ProductName
, p.ProductNumber
, p.ProductLine
, row_number() over (partition by p.ProductLine
, left(p.ProductNumber, 2)
order by right(p.ProductNumber, 4) desc) as SequenceNum
, p.Color
from Production.Product p
order by p.ProductLine
, left(p.ProductNumber, 2)
, SequenceNum
case
statements aren't always easy to format but try to align when
, then
,
and else
together inside case
and end
.
then
can stay on the when
line if needed, but aligning with else
is
preferable.
select dep.Name as DepartmentName
, case when dep.Name in ('Engineering', 'Tool Design', 'Information Services')
then 'Information Technology'
else dep.GroupName
end as NewGroupName
from HumanResources.Department as dep
order by NewGroupName, DepartmentName
Only one table should be in the from
part. Never use comma separated
from
-joins:
GOOD:
select cust.AccountNumber
, sto.Name as StoreName
from Sales.Customer as cust
join Sales.Store as sto
on cust.StoreID = sto.BusinessEntityID
...
BAD:
select cust.AccountNumber
, sto.Name as StoreName
from Sales.Customer as cust, Sales.Store as sto
where cust.StoreID = sto.BusinessEntityID
...
Favor not using the extraneous words inner
or outer
when joining tables.
Alignment is easier without them, they don't add to the understanding of the
query, and the full table list is easier to scan without excessive staggering:
GOOD:
-- this is easier to format and read
select *
from HumanResources.Employee as emp
join Person.Person as per
on emp.BusinessEntityID = per.BusinessEntityID
left join HumanResources.EmployeeDepartmentHistory as edh
on emp.BusinessEntityID = edh.BusinessEntityID
left join HumanResources.Department as dep
on edh.DepartmentID = dep.DepartmentID
BAD:
-- verbosity for the sake of verbosity is not helpful
-- `join` by itself always means `inner join`
-- `outer` is an unnecessary optional keyword
select *
from HumanResources.Employee as emp
inner join Person.Person as per
on emp.BusinessEntityID = per.BusinessEntityID
left outer join HumanResources.EmployeeDepartmentHistory as edh
on emp.BusinessEntityID = edh.BusinessEntityID
left outer join HumanResources.Department as dep
on edh.DepartmentID = dep.DepartmentID
The on
keyword and condition can go on its own line, but is easier to scan if
it lines up on the join
line. This is an acceptable style alternative:
-- this is an easier format to scan visually, but comes at the cost of longer
-- lines of code.
select *
from HumanResources.Employee as emp
join Person.Person as per on emp.BusinessEntityID = per.BusinessEntityID
left join HumanResources.EmployeeDepartmentHistory as edh on emp.BusinessEntityID = edh.BusinessEntityID
left join HumanResources.Department as dep on edh.DepartmentID = dep.DepartmentID
...
Additional filters in the join
go on new indented lines. Line up using the
on
keyword:
GOOD:
select emp.JobTitle
from HumanResources.Employee as emp
left join HumanResources.EmployeeDepartmentHistory as edh
on emp.BusinessEntityID = edh.BusinessEntityID
left join HumanResources.Department as dep
on edh.DepartmentID = dep.DepartmentID
and dep.Name <> dep.GroupName -- multi-conditions start a new line
where dep.DepartmentID is null
BAD:
select emp.JobTitle
from HumanResources.Employee as emp
left join HumanResources.EmployeeDepartmentHistory as edh
on emp.BusinessEntityID = edh.BusinessEntityID
left join HumanResources.Department as dep
on edh.DepartmentID = dep.DepartmentID and dep.Name <> dep.GroupName -- needs a new line
where dep.DepartmentID is null
Begin with inner join
s and then list left join
s, order them semantically,
and do not intermingle left join
s with inner join
s unless necessary. Order
the on
clause with joining aliases referencing tables top-to-bottom:
GOOD:
select *
from Production.Product as prd
join Production.ProductModel as prm
on prd.ProductModelID = prm.ProductModelID
left join Production.ProductSubcategory as psc
on prd.ProductSubcategoryID = psc.ProductSubcategoryID
left join Production.ProductDocument as doc
on prd.ProductID = doc.ProductID
BAD:
select *
from Production.Product as prd
left join Production.ProductSubcategory as psc
on psc.ProductSubcategoryID = prd.ProductSubcategoryID -- backwards
join Production.ProductModel as prm -- intermingled
on prm.ProductModelID = prd.ProductModelID -- backwards
left join Production.ProductDocument as doc
on prd.ProductID = doc.ProductID
Avoid right joins
as they are usually better written with a left join
GOOD:
select *
from Production.Product as prd
left join Production.ProductSubcategory as psc
on ...
BAD:
select *
from Production.ProductSubcategory as psc
right join Production.Product as prd
on ...
Multiple where
clauses should go on different lines and align to the river:
select *
from Production.Product prd
where prd.Weight > 2.5
and prd.ListPrice < 1500.00
and Color in ('Blue', 'Black', 'Red')
and SellStartDate >= '2006-01-01'
...
When mixing and
and or
statements, do not rely on order of operations and
instead always use parenthesis to make the intent clear:
select *
from Production.Product prd
where (prd.Weight > 10.0
and Color in ('Red', 'Silver'))
or Color is null
Always put a semicolon on its own line when using them. This prevents common
errors like adding conditions to a where
clause and neglecting to move the
trailing semicolon:
GOOD:
-- The prefix semicolon is clear and easy to spot when adding to a `where`
delete prd
from Production.Product prd
where prd.ListPrice = 0
and weight is null
and size is null
;
...
BAD:
-- A trailing semicolon is sinister.
-- We added some where conditions and missed it.
-- This is a destructive bug.
delete prd
from Production.Product prd
where prd.ListPrice = 0; -- dangerous
and weight is null -- syntax error here, but the bad delete is valid
and size is null
...
Maintain the same column order as the select
clause in the group by
:
GOOD:
select poh.EmployeeID
, poh.VendorID
, count(*) as OrderCount
, avg(poh.SubTotal) as AvgSubTotal
from Purchasing.PurchaseOrderHeader as poh
group by poh.EmployeeID
, poh.VendorID
BAD:
-- messing with the 'group by' order makes it hard to scan for accuracy
select poh.EmployeeID
, poh.VendorID
, count(*) as OrderCount
, avg(poh.SubTotal) as AvgSubTotal
from Purchasing.PurchaseOrderHeader as poh
group by poh.VendorID -- out of order
, poh.EmployeeID
A having
clause is just a where
clause for aggregate functions. The same
rules for where
clauses apply to having
.
Example:
select poh.EmployeeID
, poh.VendorID
, count(*) as OrderCount
, avg(poh.SubTotal) as AvgSubTotal
from Purchasing.PurchaseOrderHeader as poh
group by poh.EmployeeID
, poh.VendorID
having count(*) > 1
and avg(poh.SubTotal) > 3000.00
Do not use the superfluous asc
in order by
statements:
GOOD:
-- asc is implied and obvious
select per.LastName
, per.FirstName
from Person.Person per
order by per.LastName
, per.FirstName
BAD:
-- asc is clutter - it's never ambiguous when you wanted to sort ascending
select per.LastName
, per.FirstName
from Person.Person per
order by per.LastName asc -- useless asc
, per.FirstName asc
Ordering by column number is okay, but not preferred:
-- This is okay, but not great.
select per.FirstName + ' ' + per.LastName as FullName
, per.LastName + ', ' + per.FirstName as LastFirst
from Person.Person per
order by 2
The by
keyword can sit on the other side of a 7th column river, but align
the order by columns:
select per.FirstName
, per.LastName
from Person.Person per
order by per.LastName
, per.FirstName
If three or fewer columns are in the order by
and have short names you may
chose to have them occupy the same line for brevity.
-- shortcut for small columns
select per.FirstName, per.LastName
from Person.Person per
order by per.LastName, per.FirstName