Last active
September 17, 2022 10:23
Split a large array (inlist) into sublists (shards) of length (shard_size). Good for batch-jobbing large lists of data.
def _shard_array(inlist, shard_size):
    # inlist = 150-element list
    # shard_size = 40
    # Round up so the partial final shard (30 items here) is not dropped.
    num_shards = (len(inlist) + shard_size - 1) // shard_size
    # num_shards == 4
    shards = []
    for i in range(num_shards):
        # i == 0, then 1, then 2...
        start = shard_size * i  # start == 0, then 40, then 80...
        # Slice ends are exclusive, so no "- 1" here.
        end = shard_size * (i + 1)  # end == 40, then 80, then 120...
        shards.append(inlist[start:end])
    return shards
Wait... app.net is Python? I thought Dalton Caldwell was a Ruby guy.
Anyway, yeah, yield is definitely the thing to use when dealing with huge lists of stuff. And ideally, generators all the way down so that's not all being stored in RAM, if that becomes a constraint.
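For what it's worth, here is a rough sketch of what the yield version might look like. This is not the App.net code or the gist author's; the name iter_shards is made up for illustration, and it assumes any iterable as input, not only a list.

```python
from itertools import islice


def iter_shards(iterable, shard_size):
    """Yield lists of up to shard_size items, one shard at a time.

    Hypothetical helper for illustration; only one shard is held in
    memory at once, and the input can be any iterable.
    """
    if shard_size < 1:
        raise ValueError("shard_size must be at least 1")
    iterator = iter(iterable)
    while True:
        # islice pulls at most shard_size items from the shared iterator.
        shard = list(islice(iterator, shard_size))
        if not shard:
            return  # input exhausted
        yield shard


if __name__ == "__main__":
    for shard in iter_shards(range(150), 40):
        print(len(shard))
```

The final shard is simply shorter when the input length is not a multiple of shard_size, so nothing is dropped.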
Sure, that'll work. Here's what we use in the App.net codebase (it's more concise, but that's not necessarily a good thing):