Created
May 1, 2015 18:11
randgalt1
Greetings C* people
I have a table that I need a secondary index for. I need to be able to search via a hash. So, high cardinality. I understand that that's not considered best practice for a C* secondary index. What are the pros/cons? Performance isn't really a concern with this.
Alternatively I could manage my own lookup table but that would be a lot more work
1:06
hanine [[email protected]] entered the room.
1:06
rcoli
in general secondary indexes sorta suck unless you really need the atomic update property
1:06
thobbs
well, if performance isn't a concern, you can use them
1:06
rcoli
but if you're motivated to save the other work...
1:07
thobbs
but it's going to be ~100x more expensive than querying your own lookup table
1:07
rcoli
^^^
1:07
randgalt1
100x - really? Wow. That is a consideration
1:07
stantonk [[email protected]] entered the room.
1:07
thobbs
yup, it's a scatter-gather operation, basically
1:07
randgalt1
Hmm - let me rethink
1:08
jeffj
randgalt1: each node indexes content on that node, so if you dont have a partition key in the WHERE clause, it probably asks every node
1:09
gonzaloserrano left the room (quit: Quit: Leaving.).
1:09
rcoli
(so you pay more in an absolute sense when you have more nodes...)
1:09
jeffj
is it really 100x though? it's all in parallel, but it is asking all N nodes instead of (probably 2-3)
1:09
rcoli
(which is not usually how horizontal scalability works)
1:09
randgalt1
Thanks folks - you've convinced me
1:09
jeffj
if you only have 10 nodes, it's probably not a big hit.
1:09
rcoli
jeffj: it's asking all N
oh, sorry, mis-parsed
1:09
clickcs [4cbab2fe@gateway/web/freenode/ip.76.186.178.254] entered the room.
1:09
jeffj
rcoli: <3
1:09
rcoli
it is vs is it
1:09
thobbs
jeffj: yeah, not necessarily 100x latency due to parallel requests
but the overhead in terms of processing, etc is probably 100x
1:10
jeffj
also if N is only 3-4x RF, then it's only 3-4x
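
[Editor's sketch] The fan-out arithmetic in the exchange above can be written down as a rough model. It takes the chat's simplification at face value: a query with a partition key touches about RF replicas, while a secondary-index query without one asks all N nodes. The function names are hypothetical and real coordinator behaviour may differ.

```python
def nodes_contacted(cluster_size: int, replication_factor: int,
                    has_partition_key: bool) -> int:
    """Rough count of nodes that take part in one read (chat's simplified model)."""
    if has_partition_key:
        # Only the replicas that own the partition are candidates.
        return min(replication_factor, cluster_size)
    # Assumption from the chat: without a partition key, every node is asked.
    return cluster_size


def fanout_ratio(cluster_size: int, replication_factor: int) -> float:
    """How many times more nodes the index query touches than a keyed lookup."""
    keyed = nodes_contacted(cluster_size, replication_factor, True)
    scattered = nodes_contacted(cluster_size, replication_factor, False)
    return scattered / keyed


if __name__ == "__main__":
    # Cluster sizes chosen only for illustration.
    for n in (3, 10, 12, 300):
        print(n, fanout_ratio(n, 3))
```

Under this model the ratio grows linearly with cluster size, which matches rcoli's point that the absolute cost rises as nodes are added.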
1:10
jeffj
randgalt1: how many nodes in your cluster?
1:10
randgalt1
We don't know yet. But, I'd prefer to not have to worry about that. It's not a big deal to write my own lookup table
1:10
clickcs left the room (quit: Client Quit).
1:11
jeffj
1:11
ah, ok. do that then :D that's what most of us do
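
[Editor's sketch] The "own lookup table" approach the conversation settles on might look roughly like the following. This is an in-memory stand-in rather than Cassandra driver code: two dicts play the role of a main table keyed by record id and a second table keyed by the hash. The class, field names, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from typing import Optional


class RecordStore:
    """Toy model: a main table plus a hand-maintained hash -> id lookup table."""

    def __init__(self) -> None:
        self.records_by_id = {}  # stands in for the main table (key: record id)
        self.id_by_hash = {}     # stands in for the lookup table (key: hash)

    def put(self, record_id: str, payload: bytes) -> str:
        content_hash = hashlib.sha256(payload).hexdigest()
        previous = self.records_by_id.get(record_id)
        # Drop the stale lookup row, but only if it still points at this record.
        if previous is not None and self.id_by_hash.get(previous["hash"]) == record_id:
            del self.id_by_hash[previous["hash"]]
        # Two writes per update: this is the extra work the index would have hidden.
        self.records_by_id[record_id] = {"hash": content_hash, "payload": payload}
        self.id_by_hash[content_hash] = record_id
        return content_hash

    def get_by_hash(self, content_hash: str) -> Optional[dict]:
        # Two single-key reads instead of one scatter-gather query.
        record_id = self.id_by_hash.get(content_hash)
        if record_id is None:
            return None
        return self.records_by_id.get(record_id)
```

The trade-off rcoli mentions shows up in `put`: the two writes are separate, so the application has to keep them consistent itself. In Cassandra, one option may be to group them in a logged batch, though that is a suggestion and not something the chat discusses.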