Partitions vs. Topics for Sharding
Changing topics:
I’m considering using 64 different topics, listings0 through listings63, instead of 64 partitions. This would allow us to parallelize multiple ML processors, running as consumer groups for the same shard.
We can have the KS apps actually running on the Solr machines themselves, along with the reducer for a particular shard.
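A rough sketch of what the topic-per-shard layout could look like from the producer side. The topic names listings0 through listings63 come from the message above; the broker address, listing id, payload, and hash-based shard routing are placeholder assumptions, not anything decided in this discussion.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ShardedTopicProducer {
    private static final int NUM_SHARDS = 64;

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String listingId = "listing-12345";                    // hypothetical key
            int shard = Math.abs(listingId.hashCode()) % NUM_SHARDS;
            // One topic per shard: listings0 .. listings63
            producer.send(new ProducerRecord<>("listings" + shard, listingId, "{...}"));
        }
    }
}
```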
hyungoo [5:50 PM]
@cryptogoth would we need more than 64-way parallelism?
[5:50]
also, if we want to parallelize, we can keep it a single topic and have 64*n partitions
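For contrast, the single-topic alternative hyungoo describes is just a matter of creating one topic with 64*n partitions up front. A minimal sketch using the Kafka AdminClient, assuming n = 4 (256 partitions) and a replication factor of 3; the topic name "listings" and broker address are placeholders.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateListingsTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // 64 shards * 4 consumers per shard = 256 partitions, replication factor 3
            NewTopic listings = new NewTopic("listings", 256, (short) 3);
            admin.createTopics(Collections.singleton(listings)).all().get();
        }
    }
}
```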
cryptogoth
[6:23 PM]
@hyungoo we would benefit from more than 64-way parallelism. For example, 4-way parallelism within each of the 64 shards (256-way)
[6:23]
if we are running a backfill of ML processor jobs, that could be beneficial
hyungoo [6:23 PM]
i see
[6:23]
i’m fine with having 256 partitions
cryptogoth
[6:23 PM]
The design in the roadmap and design document uses 64 partitions, but there are some difficulties (edited)
hyungoo [6:23 PM]
but i don’t see why we’d need 64 topics, though
[6:24]
i’d have 1 topic with 256 partitions instead
cryptogoth
[6:24 PM]
* We cannot auto-balance within a partition. Consumers outside of a group would have to hardcode / subscribe to that partition, and Kafka could not help distribute load between them
hyungoo [6:25 PM]
oh
[6:25]
yeah
[6:25]
having 1 topic with 256 partitions would not have any issue with that?
cryptogoth
[6:25 PM]
So if we have 4 ML processor jobs for shard 1, they couldn’t help consume partition 1 in an auto-balanced way
hyungoo [6:26 PM]
why can’t we have 4 partitions for each shard/column?
[6:26]
hm actually
[6:26]
back to a more important question
[6:27]
why do we think 64 partitions is not parallel enough?
[6:27]
your points are true, but i’d like to avoid that unless we actually need to
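One sketch of hyungoo's "4 partitions for each shard/column" idea above: in a single 256-partition topic, the producer can pick the partition explicitly so that each shard owns a contiguous block of 4 partitions. The topic name, key, and hash-based spreading below are assumptions for illustration only.

```java
import org.apache.kafka.clients.producer.ProducerRecord;

public class ShardPartitioning {
    static final int PARTITIONS_PER_SHARD = 4;   // n = 4 -> 64 * 4 = 256 partitions

    // Map a (shard, listingId) pair onto one of that shard's 4 partitions.
    static ProducerRecord<String, String> recordFor(int shard, String listingId, String value) {
        int partition = shard * PARTITIONS_PER_SHARD
                + (Math.abs(listingId.hashCode()) % PARTITIONS_PER_SHARD);
        return new ProducerRecord<>("listings", partition, listingId, value);
    }
}
```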
cryptogoth
[6:27 PM]
64 partitions would probably be parallel enough.
hyungoo [6:27 PM]
is it the indexing part? reducing KS part? or ML inference part?
cryptogoth
[6:28 PM]
the ML inference part is where I think it would be useful to have a separate topic for each shard
hyungoo [6:29 PM]
hm
[6:29]
that’d actually be one of the biggest bottlenecks
[6:30]
but we could tackle it in different ways
[6:30]
we could have just one consumer for the ML inference part and scale up/down cicero servers
[6:30]
or we could let one consumer talk to a single cicero server (possibly a local one)
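The "consumer talks to a local cicero server" option could look roughly like the sketch below. cicero's actual interface is not shown anywhere in this discussion, so the localhost URL, port, path, and JSON payload here are placeholders, not cicero's real API.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class LocalInferenceClient {
    // Assumption: each consumer is co-located with a cicero instance exposing
    // some HTTP inference endpoint on localhost; path and payload are made up.
    private static final HttpClient HTTP = HttpClient.newHttpClient();

    static String infer(String listingJson) throws Exception {
        HttpRequest req = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8500/infer")) // hypothetical endpoint
                .POST(HttpRequest.BodyPublishers.ofString(listingJson))
                .build();
        return HTTP.send(req, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```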
cryptogoth
[6:30 PM]
true. interesting
hyungoo [6:30 PM]
and scale the number of consumers - in this case, we may need 64+ partitions
[6:31]
I have a feeling 64 cicero servers would be good for our first implementation, though
[6:32]
unless we move on to cloud or k8s soon
cryptogoth
[6:33 PM]
in my current KTables demo i’ll stick with partitions for now
[6:33]
right now we only have 4 prod cicero servers (blackbird05-blackbird08) that i’m aware of
hyungoo [6:34 PM]
sounds good to me
[6:36]
keeping things simple would help us get our first n-to-n version working quickly
cryptogoth
[7:43 PM]
:thumbsup: