The dataset consists of a huge directory of JSON files:
- each file contains a snapshot of https://api.nextbike.net/maps/nextbike-live.json?countries=de
- filenames are ISO 8601 timestamps, except that each `:` is replaced with `_`
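
To recover the snapshot time from a filename, the substitution can be reversed before parsing. The following is a minimal sketch, not part of the original pipeline: the helper name `timestamp_from_filename` and the example filename are hypothetical, and it assumes the filename stem is a complete ISO 8601 timestamp with a numeric UTC offset and no underscores other than the replaced colons.

```python
from datetime import datetime, timezone
from pathlib import Path


def timestamp_from_filename(filename: str) -> datetime:
    """Parse a snapshot filename back into a datetime.

    Assumes the stem is an ISO 8601 timestamp in which every ':' was
    replaced by '_' (hypothetical example: 2023-05-01T12_30_00+00_00.json).
    """
    stem = Path(filename).stem            # drop the ".json" extension
    iso_string = stem.replace("_", ":")   # undo the ':' -> '_' substitution
    return datetime.fromisoformat(iso_string)


# Hypothetical filename, used only for illustration.
example = timestamp_from_filename("2023-05-01T12_30_00+00_00.json")
print(example.isoformat())
```

Timestamps ending in `Z` instead of a numeric offset are only accepted by `datetime.fromisoformat` from Python 3.11 onwards, so older interpreters would need extra handling for that case.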
My goal was to create time series of the number of available and booked bikes in Karlsruhe.
DuckDB (https://duckdb.org/) was used to convert the data from the many JSON files into a single CSV file. Running the following query on ~60 GB of files stored on a USB flash drive took about 5 minutes on a 2021 MacBook Pro (M1 Pro CPU, 32 GB RAM).