Skip to content

Instantly share code, notes, and snippets.

@paulghaddad
Last active August 29, 2015 14:15
Show Gist options
  • Save paulghaddad/fc074fb656c2da3a824f to your computer and use it in GitHub Desktop.
Save paulghaddad/fc074fb656c2da3a824f to your computer and use it in GitHub Desktop.
Level Up 7: Knows how indices work and the tradeoffs they offer
1. Explain the tradeoff that indices allow us to make.
There is a fundamental tradeoff when using indices. On the positive side, indices allow us to locate and read rows really quickly. On the negative side, indices impose a cost. While indices are a benefit when reading data, they are a detriment when updating or writing data to a table, because they must be maintained. Every single write to a table with an index requires another write to the index table. Moreover, every index will require additional write operations. If there are five indices on a table, an update or insert will require six writes: one to the table itself, and five to update the index tables. This is very inefficient. Thus, the fundamental tradeoff with indices are faster read operations and slower write operations. Indices are a classic example of space-time tradeoffs. We store more data in exchange for fast processing.
2. Show how indices are stored on the filesystem, and explain how that affects the database's ability to use multi-column indices for a given query.
Indices in PostgreSQL are stored in a separate data structure from the table itself. The index contains a copy of each item in the table in a particular order. The index items in the data structure can be searched with an algorithm very quickly. PostgreSQL uses several different data structures to stores indexes: B-Tree; Hash; Generalized Inverted Index; and Generalized Search Tree.
For multi-column indices, only the B-tree, GiST and GIN index types are supported. When using multi-column indices, it is imperative to place constraints on the left-most columns of the index. For example, if there is a multi-column index on (a, b, c), a WHERE constraint should, at the very least, be placed on "a". Otherwise, the entire index would need to be scanned, making a full-table scan faster than using the index, in most cases.
In addition, multi-column indices can only optimize the queries that reference the columns in the index in the same order. If this isn't the case, then multiple single column indicess will be a better option because they will benefit a larger number of queries. Thus, multi-column indices are only worth considering if the order of its columns matches those of your queries.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment