The dataset consists of a huge directory of json files
- each file contains a snapshot of https://api.nextbike.net/maps/nextbike-live.json?countries=de
- filenames are ISO8601 timestamps, except the
:
are replaced with_
My goal was to create timeseries for the amount of available and booked bikes in Karlsruhe.
DuckDB (https://duckdb.org/) was used to convert data from the list of JSON files to a single csv file. Running the following query on ~60GB of files on a USB flash drive took about 5 minutes on a 2021 MacBook pro (M1 Pro CPU, 32 GB RAM)
COPY (
SELECT
REPLACE(REPLACE(REPLACE(filename, '/Volumes/64GB/latest_63gb/', ''), '.json', ''), '_', ':') as timestamp,
country.domain as country_domain,
country.name as country_name,
country.available_bikes as country_available_bikes,
country.booked_bikes as country_booked_bikes,
SUM(CASE WHEN city.name = 'Karlsruhe' THEN city.available_bikes ELSE NULL END) AS karlsruhe_available_bikes,
SUM(CASE WHEN city.name = 'Karlsruhe' THEN city.booked_bikes ELSE NULL END) AS karlsruhe_booked_bikes,
SUM(CASE WHEN city.name = 'Rastatt' THEN city.available_bikes ELSE NULL END) AS rastatt_available_bikes,
SUM(CASE WHEN city.name = 'Rastatt' THEN city.booked_bikes ELSE NULL END) AS rastatt_booked_bikes,
SUM(CASE WHEN city.name = 'Gaggenau' THEN city.available_bikes ELSE NULL END) AS gaggenau_available_bikes,
SUM(CASE WHEN city.name = 'Gaggenau' THEN city.booked_bikes ELSE NULL END) AS gaggenau_booked_bikes,
SUM(CASE WHEN city.name = 'Baden-Baden' THEN city.available_bikes ELSE NULL END) AS baden_baden_available_bikes,
SUM(CASE WHEN city.name = 'Baden-Baden' THEN city.booked_bikes ELSE NULL END) AS baden_baden_booked_bikes,
SUM(CASE WHEN city.name = 'Bruchsal' THEN city.available_bikes ELSE NULL END) AS bruchsal_available_bikes,
SUM(CASE WHEN city.name = 'Bruchsal' THEN city.booked_bikes ELSE NULL END) AS bruchsal_booked_bikes,
SUM(CASE WHEN city.name = 'Rheinstetten' THEN city.available_bikes ELSE NULL END) AS rheinstetten_available_bikes,
SUM(CASE WHEN city.name = 'Rheinstetten' THEN city.booked_bikes ELSE NULL END) AS rheinstetten_booked_bikes,
SUM(CASE WHEN city.name = 'Ettlingen' THEN city.available_bikes ELSE NULL END) AS ettlingen_available_bikes,
SUM(CASE WHEN city.name = 'Ettlingen' THEN city.booked_bikes ELSE NULL END) AS ettlingen_booked_bikes
FROM (
SELECT
filename,
country,
unnest(country.cities) AS city
FROM (
SELECT
filename,
unnest(data.countries) AS country
FROM read_json('/Volumes/64GB/latest_63gb/2025-03*.json', filename=true) AS data
)
)
WHERE country.domain = 'fg'
GROUP BY country, city, filename
)
TO 'output.csv' WITH (FORMAT CSV, HEADER TRUE);
csv data was then visualized using Grafana and Infinity data source plugin (https://grafana.com/docs/plugins/yesoreyeram-infinity-datasource/latest/)
here's a screenshot of the grafana dashboard