#!/usr/local/bin/python3
'''
Convert a pkl file into a json file
'''
import sys
import os
import pickle
import json


def convert_dict_to_json(file_path):
    # Load the pickled object and write it next to the source as <file_path>.json
    with open(file_path, 'rb') as fpkl, open('%s.json' % file_path, 'w') as fjson:
        data = pickle.load(fpkl)
        json.dump(data, fjson, ensure_ascii=False, sort_keys=True, indent=4)


def main():
    # Check the argument count first so a missing argument prints the usage line
    # instead of raising IndexError
    if len(sys.argv) > 1 and os.path.isfile(sys.argv[1]):
        file_path = sys.argv[1]
        print("Processing %s ..." % file_path)
        convert_dict_to_json(file_path)
    else:
        print("Usage: %s abs_file_path" % (__file__))


if __name__ == '__main__':
    main()
Just wanted to say thanks for this. Helped a lot with ensuring my pickled data was as intended.
This will not work if the dict has tuples.
@jcopps, if the dict or set contains tuples, you need to add some custom code that traverses every dict to find the tuples and converts them to list type before calling the JSON dump.
For example, assume the following record is a set stored in the pickle file:
record = {(1,2,3,3), (1,2,3,4)}
type(record) # set
Trying to convert it to JSON with json.dump:
from io import StringIO
import json
io = StringIO()
json.dump(record, io, ensure_ascii=False, sort_keys=True, indent=0)
throws the following error:
TypeError: {(1, 2, 3, 3), (1, 2, 3, 4)} is not JSON serializable
To fix that, first convert the set and each tuple in it to a list:
record = list(record) # [(1, 2, 3, 3), (1, 2, 3, 4)]
record_index = 0
while record_index < len(record):
    record[record_index] = list(record[record_index])
    record_index += 1
print(record) # [[1, 2, 3, 3], [1, 2, 3, 4]]
Then call json.dump again:
io = StringIO()
json.dump(record, io, ensure_ascii=False, sort_keys=True, indent=0)
print(io.getvalue())
"""
[
[
1,
2,
3,
3
],
[
1,
2,
3,
4
]
]
"""
Now it succeeds :).
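The manual loop above handles one level of nesting. A recursive version of the same idea might look like the sketch below. The helper name `to_jsonable` is hypothetical, and stringifying tuple keys and sorting set members by their repr are assumptions made here for illustration, not something the gist does.

```python
import json


def to_jsonable(obj):
    """Recursively replace tuples and sets with lists (hypothetical helper)."""
    if isinstance(obj, dict):
        converted = {}
        for key, value in obj.items():
            # Assumption: keys JSON cannot represent (e.g. tuples) are stringified.
            if not (isinstance(key, (str, int, float, bool)) or key is None):
                key = str(key)
            converted[key] = to_jsonable(value)
        return converted
    if isinstance(obj, (set, frozenset)):
        # Sets are unordered, so sort by repr to get a stable output order.
        return [to_jsonable(item) for item in sorted(obj, key=repr)]
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(item) for item in obj]
    return obj


record = {(1, 2, 3, 3), (1, 2, 3, 4)}
converted = to_jsonable(record)
print(json.dumps(converted))  # [[1, 2, 3, 3], [1, 2, 3, 4]]
```

Calling `to_jsonable(data)` right after `pickle.load` and before `json.dump` would cover nested dicts as well.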
Yes, I agree with that. But the JSON can no longer be converted back into the original dictionary.
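One possible way around the reversibility problem is to tag tuples and sets when encoding and restore them with an `object_hook` when decoding. This is only a sketch: the `__tuple__` and `__set__` marker keys and the `encode`/`decode` names are made up for this example, and it assumes all dict keys are already strings.

```python
import json


def encode(obj):
    """Wrap tuples and sets in tagged dicts so they survive a JSON round trip."""
    if isinstance(obj, tuple):
        return {"__tuple__": [encode(item) for item in obj]}
    if isinstance(obj, (set, frozenset)):
        # Sorted by repr only to make the JSON text deterministic.
        return {"__set__": [encode(item) for item in sorted(obj, key=repr)]}
    if isinstance(obj, list):
        return [encode(item) for item in obj]
    if isinstance(obj, dict):
        return {key: encode(value) for key, value in obj.items()}
    return obj


def decode(obj):
    """object_hook: json calls this on every decoded JSON object, innermost first."""
    if "__tuple__" in obj:
        return tuple(obj["__tuple__"])
    if "__set__" in obj:
        return set(obj["__set__"])
    return obj


record = {(1, 2, 3, 3), (1, 2, 3, 4)}
text = json.dumps(encode(record))
restored = json.loads(text, object_hook=decode)
print(restored == record)  # True
```

The trade-off is that the resulting JSON is no longer plain data; any other consumer has to know about the marker keys. A frozenset also comes back as a regular set in this sketch.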
Or numpy arrays
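For numpy arrays, one option is the `default` hook of `json.dump`, which is called only for objects the encoder cannot serialize. The sketch below relies on the `tolist()` method that numpy arrays provide; to keep it runnable without numpy installed, it uses a small stand-in class (`FakeArray`, hypothetical) in place of a real array. `json_default` is likewise a name chosen for this example.

```python
import json


def json_default(obj):
    """Fallback for json.dump/json.dumps: convert array-like objects via tolist()."""
    tolist = getattr(obj, "tolist", None)
    if callable(tolist):
        return tolist()
    raise TypeError("Object of type %s is not JSON serializable" % type(obj).__name__)


class FakeArray:
    """Stand-in for numpy.ndarray; only mimics the tolist() method."""

    def __init__(self, rows):
        self._rows = rows

    def tolist(self):
        return [list(row) for row in self._rows]


data = {"weights": FakeArray([(1, 2), (3, 4)])}
text = json.dumps(data, default=json_default)
print(text)  # {"weights": [[1, 2], [3, 4]]}
```

In the gist's script this would mean passing `default=json_default` to the existing `json.dump` call. As with tuples, the array type is lost on the way back.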