@notionparallax
Last active July 2, 2020 00:51
This is how I'd read a folder full of JSON files and turn them into a single dataframe. Pandas probably has a more efficient built-in way to do this, but this approach gives fine control over each step.
{
"name": "A",
"chats": [
{
"dt":"2020-07-02 10:32:54.876422",
"from":"me",
"message":"yo"
},{
"dt":"2020-07-02 10:35:54.876422",
"from":"them",
"message":"🍆⁉"
},{
"dt":"2020-07-02 10:36:54.876422",
"from":"me",
"message":"no"
}
]
}
{
"name": "B",
"chats": [
{
"dt":"2020-07-02 10:32:54.876422",
"from":"me",
"message":"yo"
},{
"dt":"2020-07-02 10:35:54.876422",
"from":"them",
"message":"🍳🥪"
},{
"dt":"2020-07-02 10:36:54.876422",
"from":"me",
"message":"👍 yes 👍"
}
]
}
import os
import json

import pandas as pd

dataframes_of_chats = []
for json_file in os.listdir("."):
    print(json_file)
    # Only read files whose name ends in .json
    if json_file.endswith(".json"):
        # The with block closes the file once it has been read
        with open(json_file, "r", encoding="utf-8") as the_file_object:
            file_contents = json.load(the_file_object)
        print(file_contents)
        messages = file_contents.get("chats")
        correspondent = file_contents.get("name")
        # One row per message, tagged with who the chat was with
        chats_with_x = pd.DataFrame(messages)
        chats_with_x["correspondent"] = correspondent
        dataframes_of_chats.append(chats_with_x)

# Stack the per-file dataframes into one
all_the_chats = pd.concat(dataframes_of_chats)
print(all_the_chats)
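
The description guesses that Pandas has a more direct route. One candidate is `pd.json_normalize`, which can expand a nested list into rows and copy top-level fields onto each row. The following is a minimal sketch, assuming every file has the `name`/`chats` layout of the sample files above. `load_chats` is a hypothetical helper name, not part of the original script.

```python
import json
from pathlib import Path

import pandas as pd


def load_chats(folder="."):
    """Read every .json file in `folder` into one dataframe of messages.

    Hypothetical helper: assumes each file is an object with a "name"
    string and a "chats" list of message objects.
    """
    records = []
    for path in sorted(Path(folder).glob("*.json")):
        with open(path, "r", encoding="utf-8") as f:
            records.append(json.load(f))

    if not records:
        # No JSON files found, so there is nothing to normalize
        return pd.DataFrame()

    # record_path turns each entry of "chats" into its own row;
    # meta copies the file-level "name" onto every one of those rows.
    chats = pd.json_normalize(records, record_path="chats", meta=["name"])
    return chats.rename(columns={"name": "correspondent"})
```

Calling `load_chats(".")` should give the same columns as the loop above (`dt`, `from`, `message`, `correspondent`), though the row index runs continuously across files instead of restarting for each one. The explicit loop is still the better fit when individual files need special handling.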