@Randgalt
Created May 1, 2015 18:11
randgalt1
Greetings C* people
I have a table that I need a secondary index for. I need to be able to search via a hash. So, high cardinality. I understand that that's not considered best practice for a C* secondary index. What are the pros/cons? Performance isn't really a concern with this.
Alternatively I could manage my own lookup table but that would be a lot more work
1:06
hanine [[email protected]] entered the room.
1:06
rcoli
in general secondary indexes sorta suck unless you really need the atomic update property
1:06
thobbs
well, if performance isn't a concern, you can use them
1:06
rcoli
but if you're motivated to save the other work...
1:07
thobbs
but it's going to be ~100x more expensive than querying your own lookup table
1:07
rcoli
^^^
1:07
randgalt1
100x - really? Wow. That is a consideration
1:07
stantonk [[email protected]] entered the room.
1:07
thobbs
yup, it's a scatter-gather operation, basically
1:07
randgalt1
Hmm - let me rethink
1:08
jeffj
randgalt1: each node indexes content on that node, so if you don't have a partition key in the WHERE clause, it probably asks every node
1:09
gonzaloserrano left the room (quit: Quit: Leaving.).
1:09
rcoli
(so you pay more in an absolute sense when you have more nodes...)
1:09
jeffj
is it really 100x though? it's all in parallel, but it is asking all N nodes instead of (probably 2-3)
1:09
rcoli
(which is not usually how horizontal scalability works)
1:09
randgalt1
Thanks folks - you've convinced me
1:09
jeffj
if you only have 10 nodes, it's probably not a big hit.
1:09
rcoli
jeffj: it's asking all N
oh, sorry, mis-parsed
1:09
clickcs [4cbab2fe@gateway/web/freenode/ip.76.186.178.254] entered the room.
1:09
jeffj
rcoli: <3
1:09
rcoli
it is vs is it
1:09
thobbs
jeffj: yeah, not necessarily 100x latency due to parallel requests
but the overhead in terms of processing, etc is probably 100x
1:10
jeffj
also if N is only 3-4x RF, then it's only 3-4x
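The arithmetic behind jeffj's point can be sketched as below. This is a deliberately simplified model taken from the discussion above, not a description of how Cassandra actually plans a query: it assumes a read with a partition key touches at most RF replicas, and a secondary-index read without one touches all N nodes. The function names are hypothetical.

```python
def nodes_contacted(cluster_size: int, replication_factor: int,
                    has_partition_key: bool) -> int:
    """Rough count of nodes one read may touch (illustrative model only)."""
    if has_partition_key:
        # With a partition key, only the replicas owning that key are asked.
        return min(replication_factor, cluster_size)
    # Assumption from the chat: without a partition key, every node is asked.
    return cluster_size


def fanout_ratio(cluster_size: int, replication_factor: int) -> float:
    """How many times more nodes an index scan touches than a keyed read."""
    keyed = nodes_contacted(cluster_size, replication_factor, True)
    scattered = nodes_contacted(cluster_size, replication_factor, False)
    return scattered / keyed


if __name__ == "__main__":
    for n in (3, 12, 300):
        print(n, "nodes, RF=3 ->", fanout_ratio(n, 3), "x")
```

Under this model a 12-node cluster with RF=3 gives a 4x fan-out, and the ratio only reaches 100x at around 300 nodes, which is why the cost depends on cluster size.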
1:10
jeffj
randgalt1: how many nodes in your cluster?
1:10
randgalt1
We don't know yet. But, I'd prefer to not have to worry about that. It's not a big deal to write my own lookup table
1:10
clickcs left the room (quit: Client Quit).
1:11
jeffj
ah, ok. do that then :D that's what most of us do
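The "own lookup table" approach discussed above might look roughly like the following sketch. It uses in-memory dictionaries to stand in for two tables, one keyed by record id and one keyed by hash, so that a search by hash becomes two single-key reads instead of a scatter-gather. The class and method names are hypothetical, the hash is assumed to be unique per record, and nothing here talks to a real cluster. In a real deployment the two writes are not atomic by default, which is the trade-off rcoli mentions.

```python
class ManualIndexStore:
    """Toy model of a main table plus a hand-maintained lookup table."""

    def __init__(self):
        self.records_by_id = {}  # stands in for the main table
        self.id_by_hash = {}     # stands in for the lookup table

    def put(self, record_id, content_hash, payload):
        # If the record already exists under a different hash,
        # drop the stale lookup entry so it cannot be found by the old hash.
        old = self.records_by_id.get(record_id)
        if old is not None and old["hash"] != content_hash:
            self.id_by_hash.pop(old["hash"], None)
        # Two separate writes: main table first, then the lookup table.
        self.records_by_id[record_id] = {"hash": content_hash,
                                         "payload": payload}
        self.id_by_hash[content_hash] = record_id

    def get_by_hash(self, content_hash):
        # First keyed read: hash -> record id.
        record_id = self.id_by_hash.get(content_hash)
        if record_id is None:
            return None
        # Second keyed read: record id -> record.
        return self.records_by_id.get(record_id)


if __name__ == "__main__":
    store = ManualIndexStore()
    store.put("row-1", "abc123", "first payload")
    print(store.get_by_hash("abc123"))
```

The extra work is in `put`: every write has to keep both structures consistent, including cleaning up the old hash when a record changes.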