Skip to content

Instantly share code, notes, and snippets.

@kubosuke
Last active January 6, 2024 13:12
Show Gist options
  • Save kubosuke/d295001af338ca41fd67f43adacf776f to your computer and use it in GitHub Desktop.
Save kubosuke/d295001af338ca41fd67f43adacf776f to your computer and use it in GitHub Desktop.
SQL Antipattern

SQL Antipattern

1. Jaywalk

Purpose

store attribute that has multiple values

CREATE TABLE Products (
-- ...
account_id VARCHAR(100),
-- ...
)

Anti-Pattern

  • We should not store account_id separated by comma
    • hard to query, update, maintain

Solution

  • Use intersection table to inverse the dependencies

image

2. Naive Tree

Purpose

  • Store tree structure

Anti-Pattern

  • Depends on Parent
    • child holds parent id
    • ex.) get all childs for the parent: need to issue JOIN for every traverse
    • hard to maintain (specifically deletion the node)

Solution

  • Path Enumeration
    • store path (ex. 1/2/3/ 1/4/6/7/) information
  • Nested Set
  • Closure Table
    • Create additional table to store tree structure
      CREATE TABLE TreePaths (
          ancestor BIGINT,
          descendant BIGINT,
          PRIMARY KEY (ancestor, descendant),
          FOREIGN KEY (ancestor) REFERENCES Comments(comment_id),
          FOREIGN KEY (descendant) REFERENCES Comments(comment_id),
      );

Use GraphQL might be much more faster?

3. ID required

Purpose

  • How to define primary key

Anti-Pattern

  • Add id column to all tables
    • it causes creation of redundunt key
    • it might allows undesired duplicate rows
    • hard to understand the meaning of key

Solution

  • Use the column name that's easy to understand
    • id => comment_id, user_id, product_id
  • compound key if needed

4. Keyless entry

Purpose

  • Simplify database architecture

Anti-Pattern

  • Not using foreign key
    • to Update/Delete the row that refers other table column, need to delete the data in the dependent table beforehand

Solution

  • Use foreign key
    • Able to do Cascade Update
      CREATE TABLE Bugs (
        reported_by BIGINT,
        status VARCHAR(20) NOT NULL,
        FOREIGN KEY(reported_by) REFERENCES Accounts(account_id) 
            ON UPDATE CASCADE
            ON DELETE RESTRICT,
        FOREIGN KEY (status) REFERENCES BugStatus(status) 
            ON UPDATE CASCADE 
            ON DELETE SET DEFAULT
      );

5. EAV

Purpose

Support mutable attribute

Anti-Pattern

  • Create flexible attribute tabled

image

  • what's wrong with above architecture:
    • unable to set required attribute
    • all the attribute data type become same
      • for instance, reported_date, version should be defined as date, float etc, but unable to define like that cause all attributes should have the same datatype (most likely VARCHAR etc)
      • we need to handle the datatype is valid on application side

Solution

  • If the mutable data structure is needed, let's use NoSQL etc.
  • Single Table inheritance
    • create Table that has all the attributes as a column
      • cons: some rows might have NULL values for some attributes.
  • Concrete Table inheritance
    • follow STi, but create table for each subclasses
      • ex.) BugsAttributes, FeatureRequestAttributes etc

6. Polymorphic relationship

Purpose

  • Refer multiple parent tables

image

Anti-Pattern

  • don't use foreign key
CREATE TABLE Comments (
    comment_id SERIAL PRIMARY KEY,
    issue_type VARCHAR(20),
    -- holds Bugs or FeatureRequests
    issue_id BIGINT
);
  • if we define the Comment table like above, we cannot use foreign key cause it refers two tables (Bugs or FeatureRequests).
  • and also, if we query the comments from issue_id, need to JOIN two parent tables
  • zombie comments (comments that refers non-exist rows) might exist cause it cannot use foreign key.

Solution

  • Define intersection table

image

  • Alternatively, we can use Common Parent table like this:

image

7. Multi Column Attributes

Purpose

  • define attributes that has multiple values
    • ex.) issue labels
      • it may has release version, severity, assignee, ETA etc

Anti-Pattern

  • define multiple columns
CREATE TABLE Bugs (
    bug_id SERIAL PRIMARY KEY,
    details VARCHAR(200),
    -- holds Bugs or FeatureRequests
    labels1 VARCHAR(20),
    labels2 VARCHAR(20),
    labels3 VARCHAR(20),
);
  • in above case, we can holds up to 3 labels
    • if the issue has only 1 or 2 labels, means some values become NULL
  • what's the problem?
    • to search the row that has "performance" label
      • need to query like WHERE label1 = 'performance' OR ...
    • same as above, update and delete will be bothersome too
    • some labels could have the same value
    • it'd be bothersome too when we increase the number of available label (ex. label4, label5)

Solution

  • Create Dependent Table that has label values
CREATE TABLE Labels (
    label_id SERIAL PRIMARY KEY,
    label VARCHAR(20),
    -- holds Bugs or FeatureRequests
    PRIMARY KEY (label_id, label)
    FOREIGN KEY (label_id) REFERENCES Labels(label_id)
);

8. Metadata Tribble

Purpose

  • Increase Scalability in case of the Table becomes huge

Anti-Pattern

  • Split Table
    • Tables like: Bugs_2008, Bugs_2009 etc
  • what's the problems?
    • need to have some tricky logics for the boundary values
      • ex.) register row to Bugs_2010 cause the date of data is Jan 3.
      • later realized the date of the data was 2009 Dec
      • in that case, we need to delete dat in Bugs_2010 and copy to Bugs_2009
    • hard to query among multiple tables

Solution

  • Partitioning table
    • Horizonal Partitioning:
      CREATE TABLE Bugs (bug_id SERIAL PRIMARY KEY, -- ...
      ) PARTITION BY HASH (YEAR (date_reported)) PARTITIONS 4;
    • Virtical Partitioning:
      • split table by column
      • it's beneficial when we use huge datatype like BLOB, TEXT etc
  • Introduce Dependent Table

9. Rounding Error

Purpose

Use float values

Anti-Pattern

  • Using FLOAT data type
  • what's wrong?
    • float is not an actual value, is a value approximated by the power of 2

Solution

  • Using NUMERIC or DECIMAL

10. 31 flavours

Purpose

  • limit column values to specific values
    • ex.) let status column can holds only OPEN, RESOLVED, CLOSED etc.

Anti-Pattern

  • Define limited value by column definition
CREATE TABLE Bugs (
    bug_id SERIAL PRIMARY KEY,
    status VARCHAR(20) CHECK (status IN ('NEW', 'IN PROGRESS', "RESOLVED"))
);
  • what's the problems?
    • Unable to know the available values on the application side
    • to delete the allowed value, need to do:
      • remove constraint
      • update target value
      • apply new constraint

Solution

  • Creata Mater table that holds the alloed value

11. Phantom file

Purpose

  • Store media files (ex. pics, videos)

Anti-Pattern

Solution

We can use two ways based on the application situations:

  • Using BLOB

  • store filepath of the actual files

    • hard to maintain transaction, rollback etc.
  • alternatively, some database supports the logic that can handle the external files transparently

    • Oracle: BFILE
    • SQL SERVER 2008: FILESTREAM

12. INDEX Shotgun

Purpose

  • Increase the performance

Anti-Pattern

  • Applying the index without analysis
    • if we apply the index too much, DB needs to maintain the index on each insertion
    • if we apply a few index, query costs would be huge

Solution

  • Analysis
    • check slow query by analysis tools (ex. pt-query-digest)
    • check the index is used or not by EXPLAIN the actual queries

13. Fear of UNKNOWN

Purpose

  • Allow the row that has lacked columns
    • ex.) for storing the information of employee, some has middle name but others not

Anti-Pattern

  • Use NULL as a general value
  • what's the problem
    • WHERE NOT cannot fetch NULL values
    • Unable to use prepared statement for NULL value
  • Avoid Using NULL and use alternative values that means NULL
    • -1, "" etc
    • if we use alternative values, means the application need to have above domain knowledges

Solution

  • not use NULL value as much as possible
  • if impossible, use NULL in a scalar formulas
    • TRUE and NULL = NULL
    • FALSE and NULL = FALSE
    • TRUE or NULL = TRUE
    • FALSE or NULL = NULL
  • Use COALESCE to return non-NULL if the value is NULL

14. ambiguous group

Purpose

  • get the row that has the maximum value in the group

Anti-Pattern

  • refer non-grouped value
SELECT product_id, MAX(date_reported) AS latest, bug_id
FROM Bugs GROUP BY product_id;
  • what's the problem:
    • product_id , date_reported are correct, for these columns, the value would be the unique
    • buf for bug_id, it's out of scope of GROUP, means it may return ambiguous values
      • the behaviour differs for each database, for MySQL and SQLite, it'll returns the ambiguous values, others return error in general.
    • when using GROUP BY, we need to follow Single Value Rule.

Solution

  • Use subquery
  • Use derived table

15. Random selection

Purpose

  • fetch random row

Anti-Pattern

  • ORDER BY RAND() LIMIT n
    • performance down, cause DB cannot utilize index fo this randomized sort

Solution

  • Specify random value between the smallest and largest
    • CEIL(RAND() * (SELECT MAX(id) From Bugs)) AS rand_id
  • Get all ids firstly, then specify the random id from the result

16. Poorman's searchengine

Purpose

  • Full text search

Anti-Pattern

  • Using pattern match statement
    • LIKE %crash%
  • what's the problem
    • cannot use index for that search

Solution

  • Using thirdparty searchengine like Sphinx, Lucene, Elasticsearch
  • use MyISAM FULLTEXT INDEX
    • available for CHAR, VARCHAR, TEXT only
  • SQL Server also supports full text index
  • PostgreSQL has TSVECTOR type that supports full text search
  • SQLite supports FTS

17. Spagetti query

Purpose

  • Decrease the number of query

Anti-Pattern

  • Forcibly resolve the problem with one query
    • SELECT, JOIN, JOIN, JOIN...
    • it may leads to undesired Cartesian Products

Solution

  • use UNION statement to combine the each qeury result
SELECT p.product_id, 'FIXED' as status, COUNT(p.id) AS bug_count FROM Bugs
UNION ALL
SELECT p.product_id, 'OPEN' as status,COUNT(p.id) AS bug_count FROM Bugs
  • If possible, better to divide one huge query to multiple queries and do divide and conquer for more maintainability

18. Implicit column

Purpose

  • Reduce the size of query

Anti-Pattern

  • Using asterisk
  • what's wrong
    • if you use JOIN and both tables has the same name column, it may leads to undesired error
      • JOIN Book table and Author table, get $row["title"] => if both Book and Author table has "title" column, you'll get untrusted results
    • to reduce the consumption of NW bandwidth, better not to fetch the non-necessary data

Solution

  • specify the required column name explicitly, instead of using asterisk

19. Readable password

Purpose

  • Storing password into the database

Anti-Pattern

  • storing password as a plain text
    • HUGE SECURITY RISK

Solution

  • Storing Hash with Solt

20. SQL Injection

Purpose

Anti-Pattern

Solution

  • Prepared statement
    • passing the variables when preparing the query
  • filtering the query on the application side

21. Pseudokey neat freak

  • don't fill the lacked value,
    • ex.) the lacked row of auto-increment column
    • reusing the pseudo key may leads to undesired error
    • we can use UUID alternatively

22. See no evil

// omit

23. Diplomatic immunity

  • let's create documentation and required information to make DB much more durable, and for preparing unexpected situations
    • ER diagrams
    • DDL
    • infrastructure (conf, logging)
    • NW
    • seed data
    • PIT backup
    • unittest for DB(table, column exist or not, check constraint, check ORM, check seed data)

24. Magic Beans

  • don't make DAO =~ Model in MVC context

image

  • let models hold domain knowledge and decouping with DAO

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment