Randgalt · May 1, 2015 18:11
 randgalt1
 Greetings C* people
 I have a table that I need a secondary index for. I need to be able to search via a hash. So, high cardinality. I understand that that's not considered best practice for a C* secondary index. What are the pros/cons? Performance isn't really a concern with this.
 Alternatively I could manage my own lookup table but that would be a lot more work
 1:06
 rcoli
 in general secondary indexes sorta suck unless you really need the atomic update property

 1:06
 thobbs
 well, if performance isn't a concern, you can use them

 1:06
 rcoli
 but if you're motivated to save the other work...

 1:07
 thobbs
 but it's going to be ~100x more expensive than querying your own lookup table

 1:07
 rcoli
 ^^^

 1:07
 randgalt1
 100x - really? Wow. That is a consideration
 1:07
 thobbs
 yup, it's a scatter-gather operation, basically

 1:07
 randgalt1
 Hmm - let me rethink
 1:08
 jeffj
 randgalt1: each node indexes content on that node, so if you don't have a partition key in the WHERE clause, it probably asks every node
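Here is a toy Python model of jeffj's point. It is not Cassandra's actual code path. Each `Node` keeps a local index over only its own rows, and the hypothetical `owner()` function places each partition on exactly one node by hashing its key. Real Cassandra uses a token ring and replication, which this model ignores. A lookup by partition key asks one node, but a lookup by an indexed column has to ask all of them. With a high-cardinality column like randgalt1's hash, most of those nodes answer with nothing.

```python
import hashlib


class Node:
    """Toy stand-in for one Cassandra node with a *local* secondary index."""

    def __init__(self, name):
        self.name = name
        self.rows = {}         # partition key -> row stored on this node
        self.local_index = {}  # indexed value -> partition keys on THIS node only

    def write(self, pk, row, indexed_col):
        self.rows[pk] = row
        self.local_index.setdefault(row[indexed_col], set()).add(pk)


def owner(pk, nodes):
    # Hypothetical placement: hash the partition key onto one node (RF=1 for simplicity).
    h = int(hashlib.sha256(pk.encode()).hexdigest(), 16)
    return nodes[h % len(nodes)]


def query_by_partition_key(pk, nodes):
    # The partition key tells the coordinator exactly which node to ask.
    node = owner(pk, nodes)
    return node.rows.get(pk), 1


def query_by_index(value, nodes):
    # No partition key in the WHERE clause: any node might hold a match,
    # so the coordinator asks every node and merges the answers (scatter-gather).
    results = []
    for node in nodes:
        for pk in sorted(node.local_index.get(value, ())):
            results.append(node.rows[pk])
    return results, len(nodes)


# Demo: 10 nodes, 100 rows, one unique hash per row (high cardinality).
nodes = [Node(f"node{i}") for i in range(10)]
for i in range(100):
    pk = f"user{i}"
    owner(pk, nodes).write(pk, {"id": pk, "hash": f"h{i}"}, "hash")

rows, asked = query_by_index("h42", nodes)
row, asked_pk = query_by_partition_key("user42", nodes)
```

In this model the index query returns a single row after asking all ten nodes, while the partition-key read asks one.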
 1:09
 rcoli
 (so you pay more in an absolute sense when you have more nodes...)

 1:09
 jeffj
 is it really 100x though? it's all in parallel, but it is asking all N nodes instead of (probably 2-3)

 1:09
 rcoli
 (which is not usually how horizontal scalability works)

 1:09
 randgalt1
 Thanks folks - you've convinced me

 1:09
 jeffj
 if you only have 10 nodes, it's probably not a big hit.

 1:09
 rcoli
 jeffj: it's asking all N
 oh, sorry, mis-parsed
 1:09
 jeffj
 rcoli: <3

 1:09
 rcoli
 it is vs is it

 1:09
 thobbs
 jeffj: yeah, not necessarily 100x latency due to parallel requests
 but the overhead in terms of processing, etc is probably 100x
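The distinction thobbs draws between latency and overall cost can be shown with a toy model. The per-node timings are made up, and coordinator merge cost is ignored. With parallel requests, the client waits roughly for the slowest contacted node, but the cluster as a whole does the sum of every node's work.

```python
def fanout_cost(per_node_ms):
    """Toy model of a parallel scatter-gather read.

    per_node_ms: time each contacted node spends answering the request.
    Returns (latency_ms, total_work_ms).
    """
    if not per_node_ms:
        raise ValueError("at least one node must be contacted")
    latency_ms = max(per_node_ms)     # requests run in parallel: wait for the slowest node
    total_work_ms = sum(per_node_ms)  # but every contacted node still does its share
    return latency_ms, total_work_ms


index_query = fanout_cost([2.0] * 30)  # hypothetical: 30 nodes, 2 ms each
lookup_read = fanout_cost([2.0])       # hypothetical: one replica, 2 ms
```

In this model, asking 30 nodes gives the same latency as asking one but 30 times the total work. In practice, the slowest of many nodes tends to be slower than a typical one, so latency also creeps up as fan-out grows.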

 1:10
 jeffj
 also if N is only 3-4x RF, then it's only 3-4x
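jeffj's rule of thumb reduces to simple arithmetic. Assume, as in his estimate, that an index query asks all N nodes while a lookup-table read asks the RF replicas. The overhead is then roughly N / RF, and it grows linearly with cluster size, which is rcoli's point about horizontal scalability. The function name below is hypothetical.

```python
def index_overhead_ratio(n_nodes, rf):
    """Nodes touched by an index query vs. a single-partition lookup-table read.

    Toy model: the index query asks all n_nodes, the lookup read asks rf replicas.
    """
    if rf < 1 or n_nodes < rf:
        raise ValueError("need 1 <= rf <= n_nodes")
    return n_nodes / rf


for n in (6, 10, 30, 100):
    print(f"{n:>3} nodes, RF=3: ~{index_overhead_ratio(n, 3):.1f}x")
```

With RF=3 this prints roughly 2.0x for 6 nodes, 3.3x for 10, 10.0x for 30 and 33.3x for 100. The same query gets more expensive every time the cluster grows.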
 1:10
 jeffj
 randgalt1: how many nodes in your cluster?

 1:10
 randgalt1
 We don't know yet. But, I'd prefer to not have to worry about that. It's not a big deal to write my own lookup table
 1:11
 jeffj
 ah, ok. do that then :D that's what most of us do
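Here is a toy in-memory sketch of the lookup-table approach the thread settles on. The table names `items` and `items_by_hash` and the use of SHA-256 are hypothetical. In CQL, the lookup table would be something like a table with `PRIMARY KEY (hash, item_id)`, so finding rows by hash is a single-partition read. The sketch also shows the extra work randgalt1 mentions. The application has to keep both tables in step itself, including removing the stale lookup entry when an item's hash changes. It also gives up the atomic update that rcoli says a built-in index provides.

```python
import hashlib


class LookupTableStore:
    """Toy in-memory model of a main table plus a hand-maintained lookup table.

    items         : item_id -> row           (hypothetical main table, partition key item_id)
    items_by_hash : hash -> set of item_ids  (hypothetical lookup table, PRIMARY KEY (hash, item_id))
    """

    def __init__(self):
        self.items = {}
        self.items_by_hash = {}

    def put(self, item_id, content):
        new_hash = hashlib.sha256(content).hexdigest()
        old = self.items.get(item_id)
        # The application, not the database, keeps the lookup table in step:
        # drop the stale entry if this item's hash changed.
        if old is not None and old["hash"] != new_hash:
            ids = self.items_by_hash.get(old["hash"], set())
            ids.discard(item_id)
            if not ids:
                self.items_by_hash.pop(old["hash"], None)
        # Two separate writes. Against a real cluster, a logged BATCH is one
        # common way to ensure both eventually apply, though readers may
        # briefly see one table updated without the other.
        self.items[item_id] = {"item_id": item_id, "hash": new_hash, "content": content}
        self.items_by_hash.setdefault(new_hash, set()).add(item_id)
        return new_hash

    def find_by_hash(self, h):
        # One lookup-table partition yields the ids; each id is then a
        # single-partition read on the main table. No scatter-gather.
        return [self.items[i] for i in sorted(self.items_by_hash.get(h, ()))]
```

For example, `store.put("a", b"hello")` returns the hash, and `store.find_by_hash(that_hash)` returns the single matching row without touching any other partition.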