@VarunVats9
Last active February 24, 2020 08:31
Cassandra
Reference:
https://opensourceconnections.com/blog/2013/07/24/understanding-how-cql3-maps-to-cassandras-internal-data-structure-sets-lists-and-maps/
// Column-Family
-----------------------------------------------------------------------------------------------
ID  Last   First  Bonus
1   Doe    John   8000
2   Smith  Jane   4000
3   Beck   Sam    1000
Cassandra is a partitioned row store. Rows are organized into tables with a required primary key.
Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent manner.
Cassandra will automatically repartition as machines are added to and removed from the cluster.
Row store means that, like relational databases, Cassandra organizes data by rows and columns.
In a row-oriented database management system, the data would be stored like this:
1,Doe,John,8000;2,Smith,Jane,4000;3,Beck,Sam,1000;
In a column-oriented database management system, the data would be stored like this:
1,2,3;Doe,Smith,Beck;John,Jane,Sam;8000,4000,1000;
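The two layouts above can be reproduced with a small sketch. The function names `row_oriented` and `column_oriented` are made up for illustration, and real storage engines use binary formats rather than these strings.

```python
# Illustrative only: mirrors the two text layouts shown above.
rows = [
    (1, "Doe", "John", 8000),
    (2, "Smith", "Jane", 4000),
    (3, "Beck", "Sam", 1000),
]

def row_oriented(records):
    """Serialize record by record: all fields of row 1, then row 2, and so on."""
    return "".join(",".join(str(v) for v in rec) + ";" for rec in records)

def column_oriented(records):
    """Serialize column by column: all IDs, then all last names, and so on."""
    columns = zip(*records)  # transpose rows into columns
    return "".join(",".join(str(v) for v in col) + ";" for col in columns)

print(row_oriented(rows))
print(column_oriented(rows))
```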
Cassandra is basically a column-family store.
Cassandra would store the above data as:
"Bonuses" : {
row1 : { "ID":1, "Last":"Doe", "First":"John", "Bonus":8000},
row2 : { "ID":2, "Last":"Smith", "First":"Jane", "Bonus":4000}
...
}
// [ BASIC PRIMARY KEY ] RowKey: PARTITION_KEY_VALUE column = FIELD_NAME value = FIELD_VALUE
-----------------------------------------------------------------------------------------------
RowKey: 1
=> (column=, value=, timestamp=1374546754299000)
=> (column=field2, value=00000002, timestamp=1374546754299000)
=> (column=field3, value=00000003, timestamp=1374546754299000)
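A rough sketch of the basic-primary-key mapping above. It assumes a table whose partition key is a hypothetical `field1` and whose other fields are 32-bit integers shown as big-endian hex. `to_cells` is an illustrative helper, not a Cassandra API.

```python
import struct

def to_cells(row, partition_key):
    """Flatten one CQL-style row (a dict) into (row_key, [(column, value_hex)])."""
    row_key = row[partition_key]
    cells = [("", "")]  # empty-named marker cell, as in the first line of the dump
    for name in sorted(row):
        if name == partition_key:
            continue  # the partition key becomes the RowKey, not a column
        # Assumption: every non-key field is a 32-bit int, encoded big-endian
        cells.append((name, struct.pack(">i", row[name]).hex()))
    return row_key, cells

row_key, cells = to_cells({"field1": 1, "field2": 2, "field3": 3}, "field1")
print("RowKey:", row_key)
for column, value in cells:
    print(f"=> (column={column}, value={value})")
```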
// [ COMPOSITE PRIMARY KEY ] RowKey: PARTITION_KEY_VALUE
// column = CLUSTERING_KEY_VALUE:FIELD_NAME value = FIELD_VALUE
-----------------------------------------------------------------------------------------------
RowKey: softwaredoug
=> (column=2013-07-13 08:21:54-0400:, value=, timestamp=1374673155373000)
=> (column=2013-07-13 08:21:54-0400:lat, value=4218a5e3, timestamp=1374673155373000)
=> (column=2013-07-13 08:21:54-0400:long, value=c29d1917, timestamp=1374673155373000)
=> (column=2013-07-13 08:21:54-0400:tweet, value=486176696e67206368657374207061696e2e, timestamp=1374673155373000)
=> (column=2013-07-21 12:15:27-0400:, value=, timestamp=1374673155407000)
=> (column=2013-07-21 12:15:27-0400:lat, value=42185f3b, timestamp=1374673155407000)
=> (column=2013-07-21 12:15:27-0400:long, value=c29d2560, timestamp=1374673155407000)
=> (column=2013-07-21 12:15:27-0400:tweet, value=53706565646f2073656c662073686f742e, timestamp=1374673155407000)
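Going the other way, the cells in this dump can be grouped back into logical rows. This is only a sketch: it assumes `lat` and `long` are 4-byte big-endian floats and `tweet` is UTF-8 text, and it treats the `:` separator as it is displayed here (the on-disk composite encoding is binary). `decode` and `to_rows` are illustrative names.

```python
import struct

# Cells copied from the dump above: (column name, hex value)
cells = [
    ("2013-07-13 08:21:54-0400:", ""),
    ("2013-07-13 08:21:54-0400:lat", "4218a5e3"),
    ("2013-07-13 08:21:54-0400:long", "c29d1917"),
    ("2013-07-13 08:21:54-0400:tweet", "486176696e67206368657374207061696e2e"),
    ("2013-07-21 12:15:27-0400:", ""),
    ("2013-07-21 12:15:27-0400:lat", "42185f3b"),
    ("2013-07-21 12:15:27-0400:long", "c29d2560"),
    ("2013-07-21 12:15:27-0400:tweet", "53706565646f2073656c662073686f742e"),
]

def decode(field, value_hex):
    raw = bytes.fromhex(value_hex)
    if field in ("lat", "long"):
        return struct.unpack(">f", raw)[0]  # assumption: 4-byte big-endian float
    return raw.decode("utf-8")  # assumption: text column

def to_rows(cells):
    """Group cells by their clustering-key prefix into one dict per logical row."""
    rows = {}
    for column, value_hex in cells:
        # The timestamp itself contains colons, so split on the last one only.
        clustering, field = column.rsplit(":", 1)
        row = rows.setdefault(clustering, {})
        if field:  # skip the empty-named marker cell
            row[field] = decode(field, value_hex)
    return rows

rows = to_rows(cells)
for clustering, fields in rows.items():
    print(clustering, fields)
```

Each distinct clustering value becomes one logical row inside the single `softwaredoug` partition.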
// [ MORE COMPOSITE PRIMARY KEY ] RowKey: PARTITION_KEY[1]_VALUE:PARTITION_KEY[2]_VALUE
// column = CLUSTERING_KEY[1]_VALUE:CLUSTERING_KEY[2]_VALUE:FIELD_NAME value = FIELD_VALUE
-----------------------------------------------------------------------------------------------
RowKey: partitionVal1:partitionVal2
=> (column=clusterVal1:clusterVal2:, value=, timestamp=1374630892473000)
=> (column=clusterVal1:clusterVal2:normalfield1, value=6e6f726d616c56616c31, timestamp=1374630892473000)
=> (column=clusterVal1:clusterVal2:normalfield2, value=6e6f726d616c56616c32, timestamp=1374630892473000)
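The same idea generalizes to several partition and clustering keys. A minimal sketch, where the key names (`partitionKey1`, `clusterKey1`, and so on) are hypothetical, values are assumed to be UTF-8 text shown as hex, and `storage_layout` is an illustrative helper:

```python
def storage_layout(row, partition_keys, clustering_keys):
    """Return (row_key, cells) for one CQL-style row given as a dict."""
    row_key = ":".join(str(row[k]) for k in partition_keys)
    prefix = ":".join(str(row[k]) for k in clustering_keys)
    key_columns = set(partition_keys) | set(clustering_keys)
    cells = [(prefix + ":", "")]  # empty marker cell for the logical row
    for name, value in row.items():
        if name not in key_columns:
            # Assumption: non-key fields are text, displayed as UTF-8 hex
            cells.append((f"{prefix}:{name}", value.encode("utf-8").hex()))
    return row_key, cells

row = {
    "partitionKey1": "partitionVal1",
    "partitionKey2": "partitionVal2",
    "clusterKey1": "clusterVal1",
    "clusterKey2": "clusterVal2",
    "normalfield1": "normalVal1",
    "normalfield2": "normalVal2",
}
row_key, cells = storage_layout(
    row, ["partitionKey1", "partitionKey2"], ["clusterKey1", "clusterKey2"]
)
print("RowKey:", row_key)
for column, value in cells:
    print(f"=> (column={column}, value={value})")
```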
// [ MAP ] column = MAP_FIELD_NAME:KEY value = VALUE_OF_KEY
-----------------------------------------------------------------------------------------------
RowKey: scott
=> (column=, value=, timestamp=1374684062860000)
=> (column=phonenumbers:bill, value='555-7382', timestamp=1374684062860000)
=> (column=phonenumbers:jane, value='555-8743', timestamp=1374684062860000)
=> (column=phonenumbers:patricia, value='555-4326', timestamp=1374684062860000)
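A small sketch of the map layout: each map entry becomes its own cell, so changing one key only touches one column. `map_to_cells` is an illustrative name, and sorting by key stands in for the ordered column names seen in the dump.

```python
def map_to_cells(field_name, mapping):
    """One cell per map entry: column = FIELD:KEY, value = VALUE."""
    # Columns within a row are kept in sorted order, hence sorted() here.
    return [(f"{field_name}:{key}", value) for key, value in sorted(mapping.items())]

phonenumbers = {"patricia": "555-4326", "jane": "555-8743", "bill": "555-7382"}
cells = map_to_cells("phonenumbers", phonenumbers)
for column, value in cells:
    print(f"=> (column={column}, value='{value}')")
```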
// [ LIST ] column = LIST_FIELD_NAME:UUID value = FIELD_VALUE
// UUIDs keep the entries in order; inserts are fast, deletes are slow (they require a scan).
-----------------------------------------------------------------------------------------------
RowKey: john
=> (column=, value=, timestamp=1374687324950000)
=> (column=friends:26017c10f48711e2801fdf9895e5d0f8, value='doug', timestamp=1374687206993000)
=> (column=friends:26017c11f48711e2801fdf9895e5d0f8, value='patricia', timestamp=1374687206993000)
=> (column=friends:26017c12f48711e2801fdf9895e5d0f8, value='scott', timestamp=1374687206993000)
=> (column=friends:6c504b60f48711e2801fdf9895e5d0f8, value='matt', timestamp=1374687324950000)
=> (column=friends:6c504b61f48711e2801fdf9895e5d0f8, value='eric', timestamp=1374687324950000)
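To see why the UUIDs preserve order, here is a sketch using the cell names from the dump. It assumes they are time-based (version 1) UUIDs, which their version digit suggests, and orders them by the embedded timestamp. `read_list` and `delete_value` are illustrative helpers, not Cassandra calls.

```python
import uuid

# Cell names (UUID hex) and values copied from the dump, deliberately shuffled.
cells = {
    "6c504b61f48711e2801fdf9895e5d0f8": "eric",
    "26017c10f48711e2801fdf9895e5d0f8": "doug",
    "6c504b60f48711e2801fdf9895e5d0f8": "matt",
    "26017c12f48711e2801fdf9895e5d0f8": "scott",
    "26017c11f48711e2801fdf9895e5d0f8": "patricia",
}

def read_list(cells):
    """Return the list values ordered by the timestamp embedded in each UUID."""
    ordered = sorted(cells, key=lambda h: uuid.UUID(hex=h).time)
    return [cells[h] for h in ordered]

def delete_value(cells, value):
    """Deleting by value scans every cell, since the value is not part of the column name."""
    return {h: v for h, v in cells.items() if v != value}

print(read_list(cells))
print(read_list(delete_value(cells, "scott")))
```

Appending only needs a fresh UUID for the new cell, which is why inserts are cheap compared with deletes.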
// [ SET ] column = SET_FIELD_NAME:VALUE value = EMPTY
-----------------------------------------------------------------------------------------------
RowKey: john
=> (column=, value=, timestamp=1374688135443000)
=> (column=friends:'doug', value=, timestamp=1374688108307000)
=> (column=friends:'eric', value=, timestamp=1374688135443000)
=> (column=friends:'matt', value=, timestamp=1374688135443000)
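A sketch of the set layout: the element itself is part of the column name and the cell value stays empty, so duplicates collapse and elements come back sorted. `set_to_cells` and `contains` are illustrative names.

```python
def set_to_cells(field_name, values):
    """One cell per distinct element: column = FIELD:'VALUE', value = empty."""
    # set() drops duplicates; sorted() mirrors the ordered column names.
    return [(f"{field_name}:'{v}'", "") for v in sorted(set(values))]

def contains(cells, field_name, value):
    """Membership is a lookup on the column name, not on the cell value."""
    return (f"{field_name}:'{value}'", "") in cells

cells = set_to_cells("friends", ["matt", "doug", "eric", "doug"])
for column, value in cells:
    print(f"=> (column={column}, value={value})")
```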