@dela3499
Created February 11, 2016 02:36
Apply a function to each line of a file, saving the results in chunks.
import fileinput
import numpy as np
try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3

# (String -> a) -> Filename -> Int -> String -> SideEffect[FileSystem]
def chunkapply(f, filename, chunksize, savefile):
    """ Apply f to each line of filename. After every chunksize lines, save
        the array of results to a file whose name is savefile, followed by
        an underscore, the chunk number, and '.pkl'. """
    data = []
    chunk_index = 0

    def save_chunk():
        savefilename = '{}_{}.pkl'.format(savefile, chunk_index)
        with open(savefilename, 'wb') as out:
            pickle.dump(np.array(data), out)

    for line in fileinput.input(filename):
        data.append(f(line))
        if len(data) >= chunksize:
            save_chunk()
            chunk_index += 1  # increment chunk index
            del data[:]       # free memory
    if data:  # save the final, partial chunk
        save_chunk()
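
The gist only writes the chunk files; it does not show how to read them back. Below is a minimal sketch of one way to do that, assuming the files follow the `<savefile>_<index>.pkl` naming used above. The name `iter_chunks` is hypothetical and not part of the original gist. Chunks are sorted by their numeric index so that `_10` comes after `_2`. As with any pickle, only load files you created yourself.

```python
import glob
import pickle
import re


def iter_chunks(savefile):
    """Yield the unpickled contents of savefile_0.pkl, savefile_1.pkl, ...
    in numeric chunk order. (Hypothetical helper, not from the gist.)"""
    # Match exactly '<savefile>_<digits>.pkl' and capture the chunk index.
    pattern = re.compile(re.escape(savefile) + r'_(\d+)\.pkl$')
    found = []
    for path in glob.glob(glob.escape(savefile) + '_*.pkl'):
        match = pattern.match(path)
        if match:
            found.append((int(match.group(1)), path))
    # Sort by the integer index, not by the file name string.
    for _, path in sorted(found):
        with open(path, 'rb') as handle:
            yield pickle.load(handle)
```

For example, `np.concatenate(list(iter_chunks('results')))` would rebuild a single array, assuming the chunks were saved with the prefix `'results'` and hold arrays of compatible shape.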