@jordanlewis
Last active December 12, 2015 12:48
Cassandra data model: 1 or 2 column families?
Entity model:
- There exist pastures and cows
- Pastures have many cows
- Every cow has exactly one pasture
- Each pasture has 2 unique ids: one that rancher A identifies it by (its type-A id), and one that rancher B identifies it by (its type-B id)
Queries required:
- given a cow id, look up its type-A pasture id.
- given a type-A pasture id, look up its type-B pasture id.
Schema possibility 1: 2 column families.
1. row key = cow id, one static column "a", column value = type-A pasture id
2. row key = type-A pasture id, one static column "b", column value = type-B pasture id
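To make possibility 1 concrete, here is a minimal in-memory sketch in Python. It uses plain dicts as stand-ins for the two column families and is not Cassandra client code; the function names and the sample ids ("cow-1", "A-17", "B-902") are made up for illustration.

```python
# Plain dicts standing in for the two column families (assumption: no real
# Cassandra client is involved; this only models the row/column layout).
cows_cf = {}      # CF 1: row key = cow id            -> {"a": type-A pasture id}
pastures_cf = {}  # CF 2: row key = type-A pasture id -> {"b": type-B pasture id}


def add_cow(cow_id, pasture_a, pasture_b):
    """Write a cow. Both column families must be written to stay in sync."""
    cows_cf[cow_id] = {"a": pasture_a}
    pastures_cf[pasture_a] = {"b": pasture_b}


def pasture_a_for_cow(cow_id):
    """Query 1: given a cow id, look up its type-A pasture id."""
    return cows_cf[cow_id]["a"]


def pasture_b_for_pasture_a(pasture_a):
    """Query 2: given a type-A pasture id, look up its type-B pasture id."""
    return pastures_cf[pasture_a]["b"]


# Hypothetical sample data: two cows in the same pasture.
add_cow("cow-1", "A-17", "B-902")
add_cow("cow-2", "A-17", "B-902")
```

Both queries are single-row lookups by row key, and the A-to-B mapping is stored once per pasture no matter how many cows it holds. The cost is that every write path has to touch both column families.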
Schema possibility 2: 1 column family with a secondary index.
row key = cow id, two static columns "a" and "b", whose values are the cow's type-A and type-B pasture ids
Make a secondary index on the "a" column, so that you can ask for rows whose type-A pasture id is "x" and read the type-B pasture id directly from any matching row.
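Possibility 2 can be sketched the same way. This is again a plain-Python model and not Cassandra client code: the secondary index is simulated with a dict from "a" values to cow ids, and all names and sample ids are made up for illustration.

```python
from collections import defaultdict

# One column family: row key = cow id -> {"a": type-A id, "b": type-B id}.
cows_cf = {}
# Simulated secondary index on column "a": "a" value -> set of cow ids.
# (Assumption: in Cassandra the database maintains this; here we do it by hand.)
index_on_a = defaultdict(set)


def insert_cow(cow_id, pasture_a, pasture_b):
    """Write one cow row and update the simulated index entry for "a"."""
    old = cows_cf.get(cow_id)
    if old is not None:
        index_on_a[old["a"]].discard(cow_id)  # drop the stale index entry
    cows_cf[cow_id] = {"a": pasture_a, "b": pasture_b}
    index_on_a[pasture_a].add(cow_id)


def pasture_a_for_cow(cow_id):
    """Query 1: given a cow id, look up its type-A pasture id."""
    return cows_cf[cow_id]["a"]


def pasture_b_via_index(pasture_a):
    """Query 2: find any row whose "a" column matches, then read its "b"."""
    for cow_id in index_on_a.get(pasture_a, ()):
        return cows_cf[cow_id]["b"]  # every matching row carries the same "b"
    return None  # no cow row exists for this pasture


# Hypothetical sample data: two cows in the same pasture.
insert_cow("cow-1", "A-17", "B-902")
insert_cow("cow-2", "A-17", "B-902")
```

In this layout the index lookup matches one row per cow in the pasture, each repeating the same "b" value, and a pasture with no cows has no row, so its type-B id cannot be looked up at all.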
Which is better? Possibility 2 is heavily denormalized: every cow row repeats both the type-A and type-B ids of its pasture. On the other hand, Cassandra maintains the secondary index itself, which is easier than keeping a second column family in sync by hand.