Skip to content

Instantly share code, notes, and snippets.

@marklit
Created July 21, 2024 17:46
Show Gist options
  • Save marklit/3a0d57a0558a80cd387e5da7be4dce96 to your computer and use it in GitHub Desktop.
Save marklit/3a0d57a0558a80cd387e5da7be4dce96 to your computer and use it in GitHub Desktop.
Overture July 2024 Canadian Addresses

DuckDB v1.0.0 1f98600c2c

$ cd '/mnt/j/gis/Global/overture/2024_07/theme=addresses/type=address'
$ COUNTRY=CA
$ echo "COPY (
              SELECT h3_cell_to_boundary_wkt(
                          h3_latlng_to_cell(bbox.ymax, bbox.xmax, 5))::geometry geom,
                      COUNT(*)
              FROM read_parquet('*.parquet')
              WHERE country = '$COUNTRY'
              group by 1
          ) TO '../../addresses.$COUNTRY.gpkg'
              WITH (FORMAT GDAL,
                    DRIVER 'GPKG',
                    LAYER_CREATION_OPTIONS 'WRITE_BBOX=YES')" | ~/duckdb
Floating point exception
@jwass
Copy link

jwass commented Jul 22, 2024

I also just ran this without any issue:

D install h3 from community;
D load h3;
D COPY (
                SELECT h3_cell_to_boundary_wkt(
                            h3_latlng_to_cell(bbox.ymax, bbox.xmax, 5))::geometry geom,
                        COUNT(*)
                FROM read_parquet('s3://overturemaps-us-west-2/release/2024-07-22.0/theme=addresses/type=address/*.parquet')
                WHERE country = 'CA'
                group by 1
            ) TO 'addresses.ca.gpkg'
                WITH (FORMAT GDAL,
                      DRIVER 'GPKG',
                      LAYER_CREATION_OPTIONS 'WRITE_BBOX=YES');
100% ▕████████████████████████████████████████████████████████████▏

@marklit
Copy link
Author

marklit commented Jul 22, 2024

Using S3 as a source works fine, it's just with a local file that the issue comes up. I'll raise a ticket with DuckDB. duckdb/duckdb-spatial#368

For the JSON issue, scan the whole dataset, it'll be one rogue JSON record. It's unlikely it's in the first 10 records.

@jwass
Copy link

jwass commented Jul 22, 2024

Looks like direct from S3 also resolves the JSON issue also. That's weird.

D SELECT   country,
           COUNT(*) AS num_addrs,
           sources->0->'dataset' AS source
  FROM     READ_PARQUET('s3://overturemaps-us-west-2/release/2024-07-22.0/theme=addresses/type=address/*.parquet')
  WHERE country = 'US' GROUP BY 1, 3
  ORDER BY 1, 2 DESC;
100% ▕████████████████████████████████████████████████████████████▏
┌─────────┬───────────┬────────┐
│ country │ num_addrs │ source │
│ varchar │   int64   │  json  │
├─────────┼───────────┼────────┤
│ US      │  78078341 │ "NAD"  │
└─────────┴───────────┴────────┘

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment