jordanlewis · December 12, 2015 12:48
diff --git a/modeling-cows.txt b/modeling-cows.txt
 Entity model:
 - There exist pastures and cows
 - Pastures have many cows
 - Every cow has exactly one pasture
 - Pastures have two unique ids apiece: one that rancher A identifies them by, and one that rancher B identifies them by

 Queries required:
 - given a cow id, look up its type-A pasture id.
 - given a type-A pasture id, look up its type-B pasture id.


 Schema possibility 1: 2 column families.
  1. row key = cow id, one static column "a", column value = type-A pasture id
  2. row key = type-A pasture id, one static column "b", column value = type-B pasture id
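
 A minimal sketch of possibility 1, using plain Python dicts as stand-ins for the two column families. The names (cows, pastures, add_cow) and the sample ids are hypothetical and not part of any real driver API; the point is only that each required query is a single key lookup, while a write has to touch both families.

```python
# Hypothetical in-memory stand-ins for the two column families.
# Column family 1: row key = cow id, static column "a" = type-A pasture id.
cows = {
    "cow-1": {"a": "A-10"},
    "cow-2": {"a": "A-10"},
    "cow-3": {"a": "A-20"},
}
# Column family 2: row key = type-A pasture id, static column "b" = type-B pasture id.
pastures = {
    "A-10": {"b": "B-77"},
    "A-20": {"b": "B-88"},
}


def pasture_a_for_cow(cow_id):
    """Query 1: given a cow id, look up its type-A pasture id."""
    return cows[cow_id]["a"]


def pasture_b_for_pasture_a(pasture_a_id):
    """Query 2: given a type-A pasture id, look up its type-B pasture id."""
    return pastures[pasture_a_id]["b"]


def add_cow(cow_id, pasture_a_id, pasture_b_id):
    """A write touches both column families; this is the sync burden."""
    cows[cow_id] = {"a": pasture_a_id}
    pastures[pasture_a_id] = {"b": pasture_b_id}
```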

 Schema possibility 2: 1 column family with a secondary index.
  row key = cow id, two static columns "a" and "b", whose values are the pasture's type-A and type-B ids respectively
  Make a secondary index on the "a" column, so that you can ask for rows whose type-A pasture id is "x" and read the type-B pasture id directly from any matching row.
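
 A rough sketch of possibility 2 under the same assumptions: one dict plays the column family and a second dict plays the secondary index on "a". In a real store the index would be maintained by the database; here put_cow updates it by hand purely for illustration, and all names are hypothetical.

```python
from collections import defaultdict

# Single column family: row key = cow id, static columns "a" and "b".
rows = {}
# Stand-in for the secondary index on column "a":
# type-A pasture id -> set of cow ids whose "a" column has that value.
index_on_a = defaultdict(set)


def put_cow(cow_id, pasture_a_id, pasture_b_id):
    """Write one cow row and keep the simulated index consistent."""
    old = rows.get(cow_id)
    if old is not None:
        index_on_a[old["a"]].discard(cow_id)
    rows[cow_id] = {"a": pasture_a_id, "b": pasture_b_id}
    index_on_a[pasture_a_id].add(cow_id)


def pasture_a_for_cow(cow_id):
    """Query 1: direct row lookup by cow id."""
    return rows[cow_id]["a"]


def pasture_b_for_pasture_a(pasture_a_id):
    """Query 2: find any row via the index on "a", then read its "b" column."""
    cow_ids = index_on_a.get(pasture_a_id)
    if not cow_ids:
        raise KeyError(pasture_a_id)
    any_cow = next(iter(cow_ids))
    return rows[any_cow]["b"]
```

 Note that query 2 only works for pastures that currently have at least one cow, since the pasture's ids live only on cow rows.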

 Which is better? Possibility 2 is quite denormalized: every cow row repeats both ids of its pasture, so a pasture with many cows stores the same (type-A, type-B) pair many times. However, a secondary index is easier to maintain than a second column family that must be kept in sync.
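
 To put a rough number on the denormalization, the sketch below counts how many pasture-id column values each possibility stores. It is a simplification: it ignores row keys, index storage, and any per-row overhead, and the function name is hypothetical.

```python
def stored_pasture_ids(num_cows, num_pastures):
    """Return (possibility_1, possibility_2) counts of stored pasture-id values."""
    # Possibility 1: one "a" value per cow row, plus one "b" value per pasture row.
    possibility_1 = num_cows + num_pastures
    # Possibility 2: every cow row carries both an "a" and a "b" value.
    possibility_2 = 2 * num_cows
    return possibility_1, possibility_2


# Example: 1000 cows spread over 10 pastures.
print(stored_pasture_ids(1000, 10))  # (1010, 2000)
```

 The gap grows with the number of cows per pasture; with one cow per pasture the two possibilities store the same number of values.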