Grab a data file. "American movies scraped from Wikipedia" is a nice condensed set, about 3.4M in size.
wget https://raw.githubusercontent.com/prust/wikipedia-movie-data/master/movies.json -O movies.json
Select the movies made from 1920 through 1930 (inclusive) and output them one JSON object per line
jq -cr '.[] | select(.year >= 1920 and .year <= 1930)' movies.json
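To see what the filter does without downloading the whole file, here is a minimal sketch that runs it on three made-up records (the titles and years are illustrative, not taken from movies.json). Note that -c prints each match as one compact line, while -r only affects string outputs and so changes nothing for these objects.

```shell
# Hypothetical three-record sample that mimics the shape of movies.json
sample='[{"title":"A","year":1919},{"title":"B","year":1920},{"title":"C","year":1930}]'

# Same filter as above, reading from stdin instead of the file
filtered=$(printf '%s' "$sample" | jq -cr '.[] | select(.year >= 1920 and .year <= 1930)')
printf '%s\n' "$filtered"
# prints:
# {"title":"B","year":1920}
# {"title":"C","year":1930}
```

Both bounds are inclusive, so the 1920 and 1930 records pass and the 1919 one is dropped.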
Pipe the filtered output into a reducer that groups by year, counts the movies per year, and accumulates them into a "movies" array under each year
jq -cn 'reduce inputs as $line ({}; $line.year as $year | .[($year | tostring)].total as $total | .[($year | tostring)].movies as $movies | . + {($year | tostring): {"total": (1 + $total), "movies": ($movies + [{"title": $line.title, "cast": $line.cast, "genres": $line.genres}])}})'
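What's going on here is that the -n (--null-input) option tells jq not to read the input itself, which leaves every record for inputs to stream into reduce one at a time, so we don't have to slurp the output into memory. Then we set the accumulator to {}, bind the temp vars $total and $movies from .[($year | tostring)].total and .[($year | tostring)].movies on that accumulator, compose the final shape of that year's object, and merge it into the accumulator with the . + {($year... part. That + is a shallow merge: the year's entry is replaced wholesale by the new object, which already carries the previous total and movies. Wrapping the year as "\($year)" and piping it through | tostring do the same thing, and the conversion is important here because year is a number and object keys have to be strings.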
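The reducer can also be tried on a few inline records. The sketch below is an assumption-laden variant, not the command above verbatim: the three NDJSON records are made up, and the reducer is written with jq's |= update operator, which should produce the same shape with less repetition. The last command checks that the two ways of turning the year into a string key build the same object.

```shell
# Made-up NDJSON records (hypothetical titles), one JSON object per line,
# in the same shape the select step emits
records='{"title":"B","year":1920,"cast":["X"],"genres":["Drama"]}
{"title":"C","year":1920,"cast":[],"genres":[]}
{"title":"D","year":1921,"cast":["Y"],"genres":["Comedy"]}'

# -n keeps jq from consuming the first record as "."; inputs then streams
# every record into reduce. |= updates the per-year entry in place: on the
# first hit the entry is null, and null + 1 = 1, null + [..] = [..].
grouped=$(printf '%s\n' "$records" | jq -cn '
  reduce inputs as $m ({};
    .[($m.year | tostring)] |= {
      total:  (.total + 1),
      movies: (.movies + [{title: $m.title, cast: $m.cast, genres: $m.genres}])
    })')

# Per-year totals
printf '%s\n' "$grouped" | jq -c 'map_values(.total)'
# prints: {"1920":2,"1921":1}

# Both ways of stringifying the key build the same object
same=$(jq -n '1920 as $year | {($year | tostring): 1} == {"\($year)": 1}')
printf '%s\n' "$same"
# prints: true
```

Dropping the string conversion and indexing the accumulator with the raw number makes jq fail with a "Cannot index object with number" error, which is why the conversion matters.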