@eiri
Created February 28, 2020 17:31
Slash and transform json file with jq

Grab a data file. "American movies scraped from Wikipedia" is a nice condensed set, about 3.4M in size:

wget https://raw.githubusercontent.com/prust/wikipedia-movie-data/master/movies.json -O movies.json

Slice movies made from 1920 till 1930 and output them line by line

jq -cr '.[] | select(.year >= 1920 and .year <= 1930)' movies.json
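To see what the slice emits without downloading the full file, here is a minimal sketch on a made-up three-record sample. The titles and the `sample` and `sliced` variables are hypothetical and not taken from movies.json.

```shell
# Hypothetical sample in the same shape as movies.json (an array of objects).
sample='[{"title":"A","year":1919},{"title":"B","year":1920},{"title":"C","year":1931}]'

# .[] unpacks the array; select keeps only objects inside the year range.
# -c prints each surviving object compactly on its own line.
sliced=$(printf '%s' "$sample" | jq -c '.[] | select(.year >= 1920 and .year <= 1930)')
echo "$sliced"
```

Only the 1920 record passes the filter, so a single line is printed.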

Pass the output to a reducer that groups the movies by year, counts the total per year and accumulates them into a "movies" array in each year's block

jq -cn 'reduce inputs as $line ({}; $line.year as $year | .[($year | tostring)].total as $total | .[($year | tostring)].movies as $movies | . + {($year | tostring): {"total": (1 + $total), "movies": ($movies + [{"title": $line.title, "cast": $line.cast, "genres": $line.genres}])}})'

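What's going on here: the -n option stops jq from reading its input automatically, so the inputs builtin can stream the lines one by one and we don't have to slurp the whole output into memory. The accumulator starts as {}. For each line we bind the temporary variables $total and $movies to .[($year | tostring)].total and .[($year | tostring)].movies on that accumulator; both are null the first time a year is seen, which works because 1 + null is 1 and null + [...] is [...]. Writing the key as "\($year)" or as ($year | tostring) does the same thing, and the conversion is important here because year is a number and object keys have to be strings. Finally we compose the block for that year and merge it into the accumulator with the . + {($year | tostring): ...} part, which replaces that year's entry with the updated total and movies array.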
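The reducer can be tried on a few hand-written lines. This is a sketch under assumptions: the three records and the `lines` and `grouped` variables are invented for illustration, and the movie block is trimmed to the title to keep the output short.

```shell
# Hypothetical line-by-line input, in the shape the slicing step produces.
lines='{"title":"A","year":1920}
{"title":"B","year":1920}
{"title":"C","year":1921}'

# -n: do not consume input automatically, let `inputs` stream every line.
grouped=$(printf '%s\n' "$lines" | jq -cn 'reduce inputs as $line ({};
  ($line.year | tostring) as $key   # object keys must be strings
  | .[$key].total as $total         # null the first time a year is seen
  | .[$key].movies as $movies       # null + [...] yields [...]
  | . + {($key): {"total": (1 + $total), "movies": ($movies + [{"title": $line.title}])}})')
echo "$grouped"
```

With this input the "1920" block ends up with a total of 2 and two movies, and the "1921" block with a total of 1.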

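#!/bin/bash
# $1 is the JSON file; $2 and $3 are the optional start and end years.
start=${2:-1900}
end=${3:-1905}
# Slice by year range, then group the resulting stream by year.
# --argjson passes the years in as jq numbers instead of splicing them into the program text.
jq -cr --argjson start "$start" --argjson end "$end" '.[] | select(.year >= $start and .year <= $end)' "$1" | jq -cn 'reduce inputs as $line ({}; $line.year as $year | .[($year | tostring)].total as $total | .[($year | tostring)].movies as $movies | . + {($year | tostring): {"total": (1 + $total), "movies": ($movies + [{"title": $line.title, "cast": $line.cast, "genres": $line.genres}])}})' > "movies-${start}-${end}.json"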