@sagarjhaa
Created November 11, 2019 16:53
How to split a large JSON file into multiple files.
import datetime
import json
import math
import multiprocessing

SPLIT_SIZE = 100000
OUTPUT_FILE_NAME = 'output_file_'


def worker(chunk, start, filenumber):
    # Write one slice of records to its own numbered JSON file.
    filename = OUTPUT_FILE_NAME + str(filenumber) + '.json'
    with open(filename, 'w') as new_file:
        json.dump(chunk, new_file)
    print(str(datetime.datetime.now()) + ' Written ' + str(start) + ' to '
          + str(start + len(chunk)) + ' in file ' + filename)


if __name__ == '__main__':
    print('started reading file....' + str(datetime.datetime.now()))
    # The input must be a single top-level JSON array; it is loaded fully into memory.
    with open('event_export_customer.json', 'r') as file:
        data = json.load(file)
    print('finish reading file....' + str(datetime.datetime.now()))

    jobs = []
    # Round up so the final, partial chunk is written too.
    for i in range(math.ceil(len(data) / SPLIT_SIZE)):
        start = i * SPLIT_SIZE
        p = multiprocessing.Process(
            target=worker, args=(data[start:start + SPLIT_SIZE], start, i))
        jobs.append(p)
        p.start()
    # Wait for every writer to finish before printing the end time.
    for p in jobs:
        p.join()
    print(datetime.datetime.now())
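
The slicing step on its own may be easier to follow without the process handling. What follows is a minimal single-process sketch, not the gist's method. The helper names `split_records` and `write_chunks` are hypothetical, and it assumes the records are already loaded as a Python list.

```python
import json
import os
import tempfile


def split_records(records, split_size):
    """Return consecutive slices of at most split_size records (hypothetical helper)."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    # Stepping by split_size covers the final partial slice as well.
    return [records[i:i + split_size] for i in range(0, len(records), split_size)]


def write_chunks(records, split_size, out_dir, prefix="output_file_"):
    """Write each slice to <prefix><n>.json inside out_dir and return the paths."""
    paths = []
    for n, chunk in enumerate(split_records(records, split_size)):
        path = os.path.join(out_dir, prefix + str(n) + ".json")
        with open(path, "w") as f:
            json.dump(chunk, f)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Demo on ten integers in a temporary directory; real input would come from json.load.
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_chunks(list(range(10)), 4, tmp):
            print(os.path.basename(path))
```

Stepping through the list with `range(0, len(records), split_size)` avoids computing the number of chunks up front, so a record count that is not a multiple of the split size needs no special handling.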