Skip to content

Instantly share code, notes, and snippets.

@phibo23
Last active March 15, 2025 20:46
Show Gist options
  • Save phibo23/cf20518ebfb3c95010e131d52244f27c to your computer and use it in GitHub Desktop.
Save phibo23/cf20518ebfb3c95010e131d52244f27c to your computer and use it in GitHub Desktop.
ODD2025 Nextbike

Analyzing Nextbike Data

The dataset consists of a huge directory of json files

My goal was to create timeseries for the amount of available and booked bikes in Karlsruhe.

DuckDB (https://duckdb.org/) was used to convert data from the list of JSON files to a single csv file. Running the following query on ~60GB of files on a USB flash drive took about 5 minutes on a 2021 MacBook pro (M1 Pro CPU, 32 GB RAM)

COPY (
  SELECT
    REPLACE(REPLACE(REPLACE(filename, '/Volumes/64GB/latest_63gb/', ''), '.json', ''), '_', ':') as timestamp, 
    country.domain as country_domain,
    country.name as country_name,
    country.available_bikes as country_available_bikes,
    country.booked_bikes as country_booked_bikes,
    SUM(CASE WHEN city.name = 'Karlsruhe' THEN city.available_bikes ELSE NULL END) AS karlsruhe_available_bikes,
    SUM(CASE WHEN city.name = 'Karlsruhe' THEN city.booked_bikes ELSE NULL END) AS karlsruhe_booked_bikes,
    SUM(CASE WHEN city.name = 'Rastatt' THEN city.available_bikes ELSE NULL END) AS rastatt_available_bikes,
    SUM(CASE WHEN city.name = 'Rastatt' THEN city.booked_bikes ELSE NULL END) AS rastatt_booked_bikes,
    SUM(CASE WHEN city.name = 'Gaggenau' THEN city.available_bikes ELSE NULL END) AS gaggenau_available_bikes,
    SUM(CASE WHEN city.name = 'Gaggenau' THEN city.booked_bikes ELSE NULL END) AS gaggenau_booked_bikes,
    SUM(CASE WHEN city.name = 'Baden-Baden' THEN city.available_bikes ELSE NULL END) AS baden_baden_available_bikes,
    SUM(CASE WHEN city.name = 'Baden-Baden' THEN city.booked_bikes ELSE NULL END) AS baden_baden_booked_bikes,
    SUM(CASE WHEN city.name = 'Bruchsal' THEN city.available_bikes ELSE NULL END) AS bruchsal_available_bikes,
    SUM(CASE WHEN city.name = 'Bruchsal' THEN city.booked_bikes ELSE NULL END) AS bruchsal_booked_bikes,
    SUM(CASE WHEN city.name = 'Rheinstetten' THEN city.available_bikes ELSE NULL END) AS rheinstetten_available_bikes,
    SUM(CASE WHEN city.name = 'Rheinstetten' THEN city.booked_bikes ELSE NULL END) AS rheinstetten_booked_bikes,
    SUM(CASE WHEN city.name = 'Ettlingen' THEN city.available_bikes ELSE NULL END) AS ettlingen_available_bikes,
    SUM(CASE WHEN city.name = 'Ettlingen' THEN city.booked_bikes ELSE NULL END) AS ettlingen_booked_bikes
  FROM (
    SELECT
      filename,
      country,
      unnest(country.cities) AS city
    FROM (
      SELECT
        filename,
        unnest(data.countries) AS country
      FROM read_json('/Volumes/64GB/latest_63gb/2025-03*.json', filename=true) AS data
    )
  )
  WHERE country.domain = 'fg'
  GROUP BY country, city, filename
)
TO 'output.csv' WITH (FORMAT CSV, HEADER TRUE);

csv data was then visualized using Grafana and Infinity data source plugin (https://grafana.com/docs/plugins/yesoreyeram-infinity-datasource/latest/)

@phibo23
Copy link
Author

phibo23 commented Mar 15, 2025

here's a screenshot of the grafana dashboard

Bildschirmfoto 2025-03-15 um 20 05 55

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment