TSV means "tab-separated values". I prefer this format over CSV ("comma-separated values") because it doesn't require as much quoting. Many programs that can use CSV formatted data can also use TSV, although they may need to be explicitly told of the different format if it's not detected automatically.
However, in any of the jq scripts below, "@tsv
" can usually be replaced with "@csv
" to get CSV output instead.
-
jq --slurp --raw-output '(.[0] | keys_unsorted) as $keys | $keys, map([.[ $keys[] ]|tostring])[] | @tsv' input.jsonl > output.tsv
Problems with nested structures are avoided by turning them all to strings with "
tostring
". -
jq --slurp --raw-output '(map(keys) | add | unique) as $keys | $keys, map([.[ $keys[] ]|tostring])[] | @tsv'
This version builds a list of all keys from every row, which allows for variations in keys from row to row.
-
The problem with the previous two is that they are shallow. They only work with the immediate child keys of each row. If the row contains nested structures (arrays or objects), they are expressed as strings.
To allow working with nested structures, this filter is an attempt to make an array of all unique paths to keys in the input:
jq --slurp --raw-output '[paths(scalars)[1:] | map(if type=="number" then "[\(.)]" else ".\(.)" end) | join("")] | unique'
Maybe combining this with one of the others will let us produce TSV with column headings for all keys, no matter how complex the JSON?
-
Like the parent of this item, except the paths are kept in the internal array structure, not converted to a selector string:
jq --slurp --compact-output --raw-output '[paths(scalars)[1:] | tojson] | unique[] | fromjson'
-
Using the path arrays from the previous item, get the value found at that path:
jq --slurp --raw-output '[["tar", 1, "max" ], ["tar", 2]] as $p | [($p[]|map(if type=="number" then "[\(.)]" else ".\(.)" end) | join(""))],(.[] | [getpath($p[])]) | @tsv'
https://jqplay.org/s/Qk1TTRjqJd
The values in the "
$p
" array are examples from the output of the previous item.
-
This is close:
jq --slurp --raw-output '[[paths(scalars)[1:] | tojson] | unique[] | fromjson] as $p | [($p[]|map(if type=="number" then "[\(.)]" else ".\(.)" end) | join(""))],(.[] | [getpath($p[])]) | @tsv'
https://jqplay.org/s/rxyRtfjwqA
In the example above, fu
is a number in most rows, but fu.x
exists in the last row. When the rows are processed, the
try…catch
block prevents errors about fu.x
on rows where fu
is a number, but no other output appears for the row after
that column, either. Also, output for the row containing fu.x
also contains the value {"x":42}
for the column fu
.
Maybe two things are needed:
- When a value is not a simple type (is that
scalar
?), it should not give any output. - When a row doesn't contain a nested key, there needs to be some way to process the remaining columns. Perhaps that
can be accomplished by calling
getpath()
multiple times per row, rather than one time with all path names.