This gist shows how to convert a nested JSON file to an R data.frame, using the jsonlite and data.tree packages.
The gist contains two examples: a simpler one and a more advanced one.
In the first example, we download Hadley Wickham's repos from the GitHub API at https://api.github.com/users/hadley/repos (the API paginates its results, returning 30 repos per page by default). In this JSON, each repo contains a nested owner object. The code shows how to convert it into a flat data.frame in three statements:
- line 5: download
- line 8: convert to data.tree
- line 12: convert to data.frame
The basic idea is as follows:
- convert the JSON to nested lists (a list of lists of lists) using jsonlite, without simplifying them to vectors or data.frames
- convert the nested lists to a data.tree, a structure that closely mirrors the hierarchy of the JSON
- flatten the tree into a data.frame, using the various features of the data.tree package.
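The three steps can be sketched roughly as follows. This is not the gist's own code. To avoid a network call, it parses a small hypothetical inline JSON with made-up values and only a few of the real fields, rather than downloading from the GitHub API. It also calls ToDataFrameTable as a function, as suggested in the reply at the end. The column names ownerId, fullName and repoName are illustrative.

```r
library(jsonlite)
library(data.tree)

# Hypothetical inline JSON standing in for the API response (made-up values)
json <- '[
  {"name": "repoA", "full_name": "alice/repoA",
   "owner": {"login": "alice", "id": 100}},
  {"name": "repoB", "full_name": "alice/repoB",
   "owner": {"login": "alice", "id": 100}}
]'

# Step 1: parse into nested lists, avoiding simplification
reposLoL <- fromJSON(json, simplifyVector = FALSE)

# Step 2: convert the nested lists to a data.tree
repos <- as.Node(reposLoL)

# Step 3: flatten; each leaf (here, each owner node) becomes one row
reposdf <- ToDataFrameTable(
  repos,
  ownerId = "id",                        # renamed column: the owner leaf's own id
  "login",                               # found directly on the owner leaf
  fullName = "full_name",                # not on the leaf, so taken from the repo node
  repoName = function(x) x$parent$name   # computed from the leaf's parent (the repo)
)
reposdf
```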
The main function in step 3 is ToDataFrameTable, which (conceptually) does two things:
- it traverses the leaves of the tree
- it converts each leaf to a row in the data.frame. In more detail:
  - fields of a node are mapped to columns in the data.frame
  - if a field is not available in a leaf node, its ancestors are searched
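To make these two steps concrete, here is a base-R sketch of the same idea. It is not data.tree's implementation. flatten_leaves is a hypothetical helper that works directly on nested lists like those jsonlite returns without simplification. It treats list elements as child nodes and atomic elements as fields.

```r
# Hypothetical helper illustrating the idea behind ToDataFrameTable:
# one row per leaf; a field missing on the leaf is looked up in its ancestors.
flatten_leaves <- function(tree, fields) {
  rows <- list()
  walk <- function(node, path) {
    path <- c(path, list(node))        # nodes from the root down to this one
    kids <- Filter(is.list, node)      # list elements are child nodes
    if (length(kids) == 0) {
      # Leaf: map each requested field to a column value
      row <- lapply(fields, function(f) {
        for (anc in rev(path)) {       # the leaf first, then its ancestors
          v <- anc[[f]]
          if (!is.null(v) && !is.list(v)) return(v)
        }
        NA                             # not found anywhere on the path
      })
      names(row) <- fields
      rows[[length(rows) + 1]] <<- as.data.frame(row, stringsAsFactors = FALSE)
    } else {
      for (k in kids) walk(k, path)
    }
  }
  walk(tree, list())
  do.call(rbind, rows)
}

# Made-up data shaped like the repos JSON: each repo has a nested owner
repos <- list(
  list(name = "repoA", full_name = "alice/repoA",
       owner = list(login = "alice", id = 100)),
  list(name = "repoB", full_name = "alice/repoB",
       owner = list(login = "alice", id = 100))
)
flat <- flatten_leaves(repos, c("login", "id", "full_name"))
flat
```

Here the owner objects are the leaves. login and id come from the leaf itself, and full_name is inherited from the enclosing repo.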
There are a few bells and whistles you can add to this, as shown in the second example. It creates a data.frame containing all the contributors to the repos owned by hadley. To do this, we extend the tree structure created in the first example: for each repo, we add a nested contributors node, so that after executing line 36 the tree is structured as follows:
- root
  - repo 1
    - owner
    - contributors
      - contributor 1
      - contributor 2
      - etc.
  - repo 2
    - owner
    - contributors
  - etc.
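One way to graft such a contributors node onto each repo is sketched below. It is a hypothetical stand-in for the gist's code up to line 36, not that code itself. Instead of downloading each repo's contributors from the API, it uses made-up inline lists. The names reposLoL and contributorsByRepo are illustrative. Each contributor list is converted with as.Node and attached to its repo with AddChildNode.

```r
library(data.tree)

# Hypothetical parsed API responses (made-up values)
reposLoL <- list(
  list(name = "repoA", owner = list(login = "hadley")),
  list(name = "repoB", owner = list(login = "hadley"))
)
# Contributors per repo, in the same order as reposLoL
contributorsByRepo <- list(
  list(list(login = "alice", contributions = 12),
       list(login = "bob", contributions = 3)),
  list(list(login = "carol", contributions = 7))
)

repos <- as.Node(reposLoL)

# Attach a "contributors" subtree under each repo node
for (i in seq_along(repos$children)) {
  contributors <- as.Node(contributorsByRepo[[i]], nodeName = "contributors")
  repos$children[[i]]$AddChildNode(contributors)
}

print(repos, "login", "contributions")
```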
Specifically, this example shows that:
- you can rename a field (see line 45)
- instead of mapping a field of a Node to a column, you can compute the column with a function (see line 47)
- you can filter which leaves you want to include in your data.frame (see line 54)
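These three features can be illustrated on a small hand-built tree. This is a hypothetical sketch, not lines 45 to 54 of the gist. The node names and values are made up, and the tree is built with Node$new and AddChild rather than from JSON. The filter uses base-R row subsetting after the conversion, which may differ from how the gist filters leaves.

```r
library(data.tree)

# Hypothetical toy tree: root / repo / contributors / contributor (made-up values)
root <- Node$new("hadley")
repo <- root$AddChild("repoA")
repo$full_name <- "hadley/repoA"
contributors <- repo$AddChild("contributors")
alice <- contributors$AddChild("alice")
alice$contributions <- 12
bob <- contributors$AddChild("bob")
bob$contributions <- 3

df <- ToDataFrameTable(
  root,
  repo = "full_name",                      # renamed column; value inherited from the repo node
  contributor = function(node) node$name,  # column computed by a function
  "contributions"
)

# Keep only contributors with more than 5 contributions (base-R row filter)
frequent <- df[df$contributions > 5, ]
frequent
```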
Hi sensejoin,
You need to run
reposdf <- repos %>% ToDataFrameTable(ownerId = "id", ...
instead of
reposdf <- repos$ToDataFrameTable(ownerId = "id", ...