Last active
December 12, 2015 12:48
Cassandra data model: 1 or 2 column families?
Entity model:
- There are pastures and cows.
- A pasture has many cows.
- Every cow belongs to exactly one pasture.
- Each pasture has two unique ids: one that rancher A identifies it by (type-A) and one that rancher B identifies it by (type-B).
Queries required:
- Given a cow id, look up its type-A pasture id.
- Given a type-A pasture id, look up its type-B pasture id.
Schema possibility 1: 2 column families.
1. row key = cow id, one static column "a", column value = type-A pasture id
2. row key = type-A pasture id, one static column "b", column value = type-B pasture id
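To make possibility 1 concrete, here is a rough in-memory sketch in Python. The two dicts stand in for the two column families; the function names and the sample ids ("bessie", "A-1", "B-9") are made up for illustration, and this is not Cassandra client code.

```python
# In-memory stand-ins for the two column families (illustration only).
cow_to_pasture_a = {}  # row key = cow id            -> {"a": type-A pasture id}
pasture_a_to_b = {}    # row key = type-A pasture id -> {"b": type-B pasture id}


def add_pasture(a_id, b_id):
    # One row per pasture in the second column family.
    pasture_a_to_b[a_id] = {"b": b_id}


def add_cow(cow_id, a_id):
    # One row per cow in the first column family.
    cow_to_pasture_a[cow_id] = {"a": a_id}


def pasture_a_for_cow(cow_id):
    # Query 1: cow id -> type-A pasture id.
    return cow_to_pasture_a[cow_id]["a"]


def pasture_b_for_pasture_a(a_id):
    # Query 2: type-A pasture id -> type-B pasture id.
    return pasture_a_to_b[a_id]["b"]


# Hypothetical sample data: two cows in the same pasture.
add_pasture("A-1", "B-9")
add_cow("bessie", "A-1")
add_cow("daisy", "A-1")
```

Each query is a single key lookup, but the writer has to remember to populate both structures.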
Schema possibility 2: 1 column family with a secondary index.
row key = cow id, two static columns "a" and "b", whose values are the pasture's type-A and type-B ids
Make a secondary index on the "a" column, so that you can ask for rows whose type-A pasture id is "x" and get the type-B pasture id directly from the row.
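A comparable sketch of possibility 2, again in plain Python rather than real Cassandra calls. The `index_on_a` dict is only a stand-in for the secondary index (in Cassandra the database would maintain it, not the application), and all names and sample ids here are hypothetical.

```python
from collections import defaultdict

cows = {}                      # row key = cow id -> {"a": ..., "b": ...}
index_on_a = defaultdict(set)  # stand-in for the index: "a" value -> cow ids


def put_cow(cow_id, a_id, b_id):
    # Drop the stale index entry if this cow is being moved to another pasture.
    old = cows.get(cow_id)
    if old is not None:
        index_on_a[old["a"]].discard(cow_id)
    cows[cow_id] = {"a": a_id, "b": b_id}
    index_on_a[a_id].add(cow_id)


def pasture_a_for_cow(cow_id):
    # Query 1: cow id -> type-A pasture id (plain row lookup).
    return cows[cow_id]["a"]


def pasture_b_for_pasture_a(a_id):
    # Query 2: find any row whose "a" column matches, then read its "b" column.
    cow_ids = index_on_a.get(a_id)
    if not cow_ids:
        return None  # no cow row carries this pasture id
    any_cow = next(iter(cow_ids))
    return cows[any_cow]["b"]


# Hypothetical sample data: two cows in the same pasture.
put_cow("bessie", "A-1", "B-9")
put_cow("daisy", "A-1", "B-9")
```

Note that the second query can match many rows (one per cow in the pasture); in this sketch any one of them is enough, since they all carry the same "b" value.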
Which is better? Possibility 2 is quite denormalized: every cow row for a given pasture repeats both of that pasture's ids. However, a secondary index is easier to maintain than a second column family that must be kept in sync.
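To put a rough number on the denormalization, the sketch below counts how many copies of a type-B pasture id each possibility stores. The function name and the herd sizes are made up for illustration, and it assumes every pasture holds the same number of cows.

```python
def copies_of_b_id(cows_per_pasture, pastures):
    """Return (possibility_1, possibility_2) counts of stored type-B ids."""
    possibility_1 = pastures                     # one row per pasture
    possibility_2 = pastures * cows_per_pasture  # one copy in every cow row
    return possibility_1, possibility_2


# Hypothetical ranch: 10 pastures with 50 cows each.
print(copies_of_b_id(cows_per_pasture=50, pastures=10))  # (10, 500)
```

The same ratio applies to writes if a pasture's type-B id ever changes: one row under possibility 1, versus every cow row in that pasture under possibility 2.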