@mattbillenstein
Last active December 6, 2018 18:53
#!/usr/bin/env python3
import json
import time

start = time.time()

# Read the line-delimited JSON into memory, one parsed dict per line.
L = []
i = 0
with open('in.json') as f:
    for line in f:
        L.append(json.loads(line))
        i += 1
        if i % 100000 == 0:
            print(i)  # progress every 100k records

print('read', time.time() - start)

# Sort all records by their 'id' field.
L.sort(key=lambda x: x['id'])
print('sort', time.time() - start)

# Write records back out with keys sorted within each record, so the
# output is in a canonical, line-comparable form.
i = 0
with open('out.json', 'w') as f:
    for d in L:
        f.write(json.dumps(d, sort_keys=True) + '\n')
        i += 1
        if i % 100000 == 0:
            print(i)

print('write', time.time() - start)
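
A memory-leaner variant is possible: serialize each record to its canonical string while reading and sort (id, line) pairs instead, so the parsed dicts don't have to be held through the sort. A minimal sketch, assuming the same in.json layout as above:

#!/usr/bin/env python3
# Sketch only: same line-delimited in.json assumed, one record per line.
import json

pairs = []
with open('in.json') as f:
    for line in f:
        d = json.loads(line)
        # Keep only (id, canonical string); the dict is dropped right away.
        pairs.append((d['id'], json.dumps(d, sort_keys=True)))

pairs.sort()  # tuples compare by id first

with open('out.json', 'w') as f:
    for _, s in pairs:
        f.write(s + '\n')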
@mattbillenstein (Author):
It's part of a db table dump - just the largest line-delimited JSON I had lying around. I was curious what a plain Python script could do compared to https://genius.engineering/faster-and-simpler-with-the-command-line-deep-comparing-two-5gb-json-files-3x-faster-by-ditching-the-code/
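
Once two dumps have been normalized this way (records sorted by id, keys sorted within each record), a deep compare reduces to a line-by-line diff. A minimal sketch, using hypothetical file names a_out.json and b_out.json for two normalized dumps:

#!/usr/bin/env python3
# Sketch: report lines that differ between two normalized dumps.
import itertools

with open('a_out.json') as fa, open('b_out.json') as fb:
    # zip_longest pads the shorter file with None, so extra records
    # at the end of either file also show up as differences.
    for n, (a, b) in enumerate(itertools.zip_longest(fa, fb), 1):
        if a != b:
            print('line', n, 'differs')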
