Created
November 18, 2014 11:41
A tiny Python script to split big JSON files into smaller chunks.
#!/usr/bin/env python
# based on http://stackoverflow.com/questions/7052947/split-95mb-json-array-into-smaller-chunks
# usage: python json-split filename.json
# produces filename.json_0.json, filename.json_1.json, ... (1.49 MB each for the original input)
import json
import sys

# Load the whole file; the top-level JSON value must be an array (a Python list).
with open(sys.argv[1], 'r') as infile:
    o = json.load(infile)

# Number of array elements (not bytes or lines) written to each output file.
chunkSize = 4550
for i in range(0, len(o), chunkSize):  # range works on both Python 2 and 3
    with open(sys.argv[1] + '_' + str(i // chunkSize) + '.json', 'w') as outfile:
        json.dump(o[i:i + chunkSize], outfile)
What if I don't know what chunkSize value to use?
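One possible way to pick a value, as a rough sketch rather than part of the original script: serialize a sample of the loaded array, take the average element size, and divide a target file size by it. The helper name `estimate_chunk_size` and its `target_bytes` and `sample` parameters are hypothetical, and the estimate is only reasonable when the elements are of similar size.

```python
import json

def estimate_chunk_size(items, target_bytes, sample=100):
    """Guess how many array elements fit in roughly target_bytes of JSON.

    Hypothetical helper: assumes `items` is the list returned by json.load
    and that the first `sample` elements are representative of the rest.
    """
    head = items[:sample]
    if not head:
        return 1
    # Average serialized size per element, plus 2 bytes for the ", "
    # separator that json.dump writes between elements by default.
    avg = sum(len(json.dumps(x)) + 2 for x in head) / float(len(head))
    return max(1, int(target_bytes // avg))
```

For example, `chunkSize = estimate_chunk_size(o, 1500000)` would aim for output files of about 1.5 MB each.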
Hi there, thanks for the script! I tried it on an 800 MB JSON file, but unfortunately it didn't work. Do you have any idea why it only produced two files, filename_0 at about 700 MB and filename_1 at about 200 MB? Might it have something to do with len(o), since it isn't specified whether it counts bytes, lines, etc.?
And I don't have to include #!/usr/bin/env python if I want to run it in the terminal, right?
Also, I believe the terminal command needs ".py" after json-split, right?
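On the len(o) question: for a list returned by json.load, len(o) counts top-level array elements, not bytes or lines, so a file with fewer than 9100 large elements yields only two outputs at chunkSize = 4550. A minimal sketch of splitting by approximate serialized size instead is below; the generator `split_by_size` and its `max_bytes` parameter are hypothetical names, not part of the gist, and the size accounting assumes json.dump's default separators.

```python
import json

def split_by_size(items, max_bytes):
    """Yield lists of items whose serialized JSON stays near max_bytes.

    Hypothetical helper: a single element larger than max_bytes still
    gets a chunk of its own, so a chunk can exceed the limit.
    """
    chunk, size = [], 2  # 2 bytes for the enclosing "[]"
    for item in items:
        item_size = len(json.dumps(item)) + 2  # +2 for the ", " separator
        if chunk and size + item_size > max_bytes:
            yield chunk
            chunk, size = [], 2
        chunk.append(item)
        size += item_size
    if chunk:
        yield chunk
```

Each yielded list could then be written with json.dump to its own numbered file, the same way the script writes each slice.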