SQL Antipattern

1. Jaywalk

Purpose

store attribute that has multiple values

CREATE TABLE Products (
-- ...
account_id VARCHAR(100),
-- ...
)

Anti-Pattern

We should not store account_id separated by comma
- hard to query, update, maintain

Solution

Use intersection table to inverse the dependencies

2. Naive Tree

Purpose

Store tree structure

Anti-Pattern

Depends on Parent
- child holds parent id
- ex.) get all childs for the parent: need to issue JOIN for every traverse
- hard to maintain (specifically deletion the node)

Solution

Path Enumeration
- store path (ex. 1/2/3/ 1/4/6/7/) information
Nested Set

Closure Table

Create additional table to store tree structure

CREATE TABLE TreePaths (
    ancestor BIGINT,
    descendant BIGINT,
    PRIMARY KEY (ancestor, descendant),
    FOREIGN KEY (ancestor) REFERENCES Comments(comment_id),
    FOREIGN KEY (descendant) REFERENCES Comments(comment_id),
);

Use GraphQL might be much more faster?

3. ID required

Purpose

How to define primary key

Anti-Pattern

Add id column to all tables
- it causes creation of redundunt key
- it might allows undesired duplicate rows
- hard to understand the meaning of key

Solution

Use the column name that's easy to understand
- id => comment_id, user_id, product_id
compound key if needed

4. Keyless entry

Purpose

Simplify database architecture

Anti-Pattern

Not using foreign key
- to Update/Delete the row that refers other table column, need to delete the data in the dependent table beforehand

Solution

Use foreign key

Able to do Cascade Update

CREATE TABLE Bugs (
  reported_by BIGINT,
  status VARCHAR(20) NOT NULL,
  FOREIGN KEY(reported_by) REFERENCES Accounts(account_id) 
      ON UPDATE CASCADE
      ON DELETE RESTRICT,
  FOREIGN KEY (status) REFERENCES BugStatus(status) 
      ON UPDATE CASCADE 
      ON DELETE SET DEFAULT
);

5. EAV

Purpose

Support mutable attribute

Anti-Pattern

Create flexible attribute tabled

what's wrong with above architecture:
- unable to set required attribute
- all the attribute data type become same
  - for instance, reported_date, version should be defined as date, float etc, but unable to define like that cause all attributes should have the same datatype (most likely VARCHAR etc)
  - we need to handle the datatype is valid on application side

Solution

If the mutable data structure is needed, let's use NoSQL etc.
Single Table inheritance
- create Table that has all the attributes as a column
  - cons: some rows might have NULL values for some attributes.
Concrete Table inheritance
- follow STi, but create table for each subclasses
  - ex.) BugsAttributes, FeatureRequestAttributes etc

6. Polymorphic relationship

Purpose

Refer multiple parent tables

Anti-Pattern

don't use foreign key

CREATE TABLE Comments (
    comment_id SERIAL PRIMARY KEY,
    issue_type VARCHAR(20),
    -- holds Bugs or FeatureRequests
    issue_id BIGINT
);

if we define the Comment table like above, we cannot use foreign key cause it refers two tables (Bugs or FeatureRequests).
and also, if we query the comments from issue_id, need to JOIN two parent tables
zombie comments (comments that refers non-exist rows) might exist cause it cannot use foreign key.

Solution

Define intersection table

Alternatively, we can use Common Parent table like this:

7. Multi Column Attributes

Purpose

define attributes that has multiple values
- ex.) issue labels
  - it may has release version, severity, assignee, ETA etc

Anti-Pattern

define multiple columns

CREATE TABLE Bugs (
    bug_id SERIAL PRIMARY KEY,
    details VARCHAR(200),
    -- holds Bugs or FeatureRequests
    labels1 VARCHAR(20),
    labels2 VARCHAR(20),
    labels3 VARCHAR(20),
);

in above case, we can holds up to 3 labels
- if the issue has only 1 or 2 labels, means some values become NULL
what's the problem?
- to search the row that has "performance" label
  - need to query like WHERE label1 = 'performance' OR ...
- same as above, update and delete will be bothersome too
- some labels could have the same value
- it'd be bothersome too when we increase the number of available label (ex. label4, label5)

Solution

Create Dependent Table that has label values

CREATE TABLE Labels (
    label_id SERIAL PRIMARY KEY,
    label VARCHAR(20),
    -- holds Bugs or FeatureRequests
    PRIMARY KEY (label_id, label)
    FOREIGN KEY (label_id) REFERENCES Labels(label_id)
);

8. Metadata Tribble

Purpose

Increase Scalability in case of the Table becomes huge

Anti-Pattern

Split Table
- Tables like: Bugs_2008, Bugs_2009 etc
what's the problems?
- need to have some tricky logics for the boundary values
  - ex.) register row to Bugs_2010 cause the date of data is Jan 3.
  - later realized the date of the data was 2009 Dec
  - in that case, we need to delete dat in Bugs_2010 and copy to Bugs_2009
- hard to query among multiple tables

Solution

Partitioning table
- Horizonal Partitioning:
```
CREATE TABLE Bugs (bug_id SERIAL PRIMARY KEY, -- ...
) PARTITION BY HASH (YEAR (date_reported)) PARTITIONS 4;
```
- Virtical Partitioning:
  - split table by column
  - it's beneficial when we use huge datatype like BLOB, TEXT etc
Introduce Dependent Table

9. Rounding Error

Purpose

Use float values

Anti-Pattern

Using FLOAT data type
what's wrong?
- float is not an actual value, is a value approximated by the power of 2

Solution

Using NUMERIC or DECIMAL

10. 31 flavours

Purpose

limit column values to specific values
- ex.) let status column can holds only OPEN, RESOLVED, CLOSED etc.

Anti-Pattern

Define limited value by column definition

CREATE TABLE Bugs (
    bug_id SERIAL PRIMARY KEY,
    status VARCHAR(20) CHECK (status IN ('NEW', 'IN PROGRESS', "RESOLVED"))
);

what's the problems?
- Unable to know the available values on the application side
- to delete the allowed value, need to do:
  - remove constraint
  - update target value
  - apply new constraint

Solution

Creata Mater table that holds the alloed value

11. Phantom file

Purpose

Store media files (ex. pics, videos)

Anti-Pattern

Solution

We can use two ways based on the application situations:

Using BLOB
store filepath of the actual files
- hard to maintain transaction, rollback etc.
alternatively, some database supports the logic that can handle the external files transparently
- Oracle: BFILE
- SQL SERVER 2008: FILESTREAM

12. INDEX Shotgun

Purpose

Increase the performance

Anti-Pattern

Applying the index without analysis
- if we apply the index too much, DB needs to maintain the index on each insertion
- if we apply a few index, query costs would be huge

Solution

Analysis
- check slow query by analysis tools (ex. pt-query-digest)
- check the index is used or not by EXPLAIN the actual queries

13. Fear of UNKNOWN

Purpose

Allow the row that has lacked columns
- ex.) for storing the information of employee, some has middle name but others not

Anti-Pattern

Use NULL as a general value
what's the problem
- WHERE NOT cannot fetch NULL values
- Unable to use prepared statement for NULL value
Avoid Using NULL and use alternative values that means NULL
- -1, "" etc
- if we use alternative values, means the application need to have above domain knowledges

Solution

not use NULL value as much as possible
if impossible, use NULL in a scalar formulas
- TRUE and NULL = NULL
- FALSE and NULL = FALSE
- TRUE or NULL = TRUE
- FALSE or NULL = NULL
Use COALESCE to return non-NULL if the value is NULL

14. ambiguous group

Purpose

get the row that has the maximum value in the group

Anti-Pattern

refer non-grouped value

SELECT product_id, MAX(date_reported) AS latest, bug_id
FROM Bugs GROUP BY product_id;

what's the problem:
- product_id , date_reported are correct, for these columns, the value would be the unique
- buf for bug_id, it's out of scope of GROUP, means it may return ambiguous values
  - the behaviour differs for each database, for MySQL and SQLite, it'll returns the ambiguous values, others return error in general.
- when using GROUP BY, we need to follow Single Value Rule.

Solution

Use subquery
Use derived table

15. Random selection

Purpose

fetch random row

Anti-Pattern

ORDER BY RAND() LIMIT n
- performance down, cause DB cannot utilize index fo this randomized sort

Solution

Specify random value between the smallest and largest
- CEIL(RAND() * (SELECT MAX(id) From Bugs)) AS rand_id
Get all ids firstly, then specify the random id from the result

16. Poorman's searchengine

Purpose

Full text search

Anti-Pattern

Using pattern match statement
- LIKE %crash%
what's the problem
- cannot use index for that search

Solution

Using thirdparty searchengine like Sphinx, Lucene, Elasticsearch
use MyISAM FULLTEXT INDEX
- available for CHAR, VARCHAR, TEXT only
SQL Server also supports full text index
PostgreSQL has TSVECTOR type that supports full text search
SQLite supports FTS

17. Spagetti query

Purpose

Decrease the number of query

Anti-Pattern

Forcibly resolve the problem with one query
- SELECT, JOIN, JOIN, JOIN...
- it may leads to undesired Cartesian Products

Solution

use UNION statement to combine the each qeury result

SELECT p.product_id, 'FIXED' as status, COUNT(p.id) AS bug_count FROM Bugs
UNION ALL
SELECT p.product_id, 'OPEN' as status,COUNT(p.id) AS bug_count FROM Bugs

If possible, better to divide one huge query to multiple queries and do divide and conquer for more maintainability

18. Implicit column

Purpose

Reduce the size of query

Anti-Pattern

Using asterisk
what's wrong
- if you use JOIN and both tables has the same name column, it may leads to undesired error
  - JOIN Book table and Author table, get $row["title"] => if both Book and Author table has "title" column, you'll get untrusted results
- to reduce the consumption of NW bandwidth, better not to fetch the non-necessary data

Solution

specify the required column name explicitly, instead of using asterisk

19. Readable password

Purpose

Storing password into the database

Anti-Pattern

storing password as a plain text
- HUGE SECURITY RISK

Solution

Storing Hash with Solt

20. SQL Injection

Purpose

Anti-Pattern

Solution

Prepared statement
- passing the variables when preparing the query
filtering the query on the application side

21. Pseudokey neat freak

don't fill the lacked value,
- ex.) the lacked row of auto-increment column
- reusing the pseudo key may leads to undesired error
- we can use UUID alternatively

22. See no evil

// omit

23. Diplomatic immunity

let's create documentation and required information to make DB much more durable, and for preparing unexpected situations
- ER diagrams
- DDL
- infrastructure (conf, logging)
- NW
- seed data
- PIT backup
- unittest for DB(table, column exist or not, check constraint, check ORM, check seed data)

24. Magic Beans

don't make DAO =~ Model in MVC context

let models hold domain knowledge and decouping with DAO

kubosuke/SQL-Antipattern.md

SQL Antipattern

1. Jaywalk

Purpose

Anti-Pattern

Solution

2. Naive Tree

Purpose

Anti-Pattern

Solution

3. ID required

Purpose

Anti-Pattern

Solution

4. Keyless entry

Purpose

Anti-Pattern

Solution

5. EAV

Purpose

Anti-Pattern

Solution

6. Polymorphic relationship

Purpose

Anti-Pattern

Solution

7. Multi Column Attributes

Purpose

Anti-Pattern

Solution

8. Metadata Tribble

Purpose

Anti-Pattern

Solution

9. Rounding Error

Purpose

Anti-Pattern

Solution

10. 31 flavours

Purpose

Anti-Pattern

Solution

11. Phantom file

Purpose

Anti-Pattern

Solution

12. INDEX Shotgun

Purpose

Anti-Pattern

Solution

13. Fear of UNKNOWN

Purpose

Anti-Pattern

Solution

14. ambiguous group

Purpose

Anti-Pattern

Solution

15. Random selection

Purpose

Anti-Pattern

Solution

16. Poorman's searchengine

Purpose

Anti-Pattern

Solution

17. Spagetti query

Purpose

Anti-Pattern

Solution

18. Implicit column

Purpose

Anti-Pattern

Solution

19. Readable password

Purpose

Anti-Pattern

Solution

20. SQL Injection

Purpose