Skip to content

Instantly share code, notes, and snippets.

@mvexel
Last active November 7, 2025 17:42
Show Gist options
  • Select an option

  • Save mvexel/fbd6e20b5edaaaddf30480c216f82039 to your computer and use it in GitHub Desktop.

Select an option

Save mvexel/fbd6e20b5edaaaddf30480c216f82039 to your computer and use it in GitHub Desktop.
overture places counts bucketed by confidence
D SELECT
confidence_bucket,
place_count,
ROUND(100.0 * place_count / SUM(place_count) OVER (), 2) AS percentage,
ROUND(100.0 * SUM(place_count) OVER (ORDER BY confidence_bucket DESC) / SUM(place_count) OVER (), 2) AS pct_at_or_above
FROM (
SELECT
FLOOR(confidence * 10) / 10 AS confidence_bucket,
COUNT(*) AS place_count
FROM read_parquet('s3://overturemaps-us-west-2/release/2025-10-22.0/theme=places/type=place/*')
WHERE confidence IS NOT NULL
AND addresses[1].country = 'US'
GROUP BY confidence_bucket
)
ORDER BY confidence_bucket DESC;
┌───────────────────┬─────────────┬────────────┬─────────────────┐
│ confidence_bucket │ place_count │ percentage │ pct_at_or_above │
│ double │ int64 │ double │ double │
├───────────────────┼─────────────┼────────────┼─────────────────┤
│ 1.0 │ 709 │ 0.0 │ 0.0 │
│ 0.9 │ 8141037 │ 53.55 │ 53.55 │
│ 0.8 │ 150010 │ 0.99 │ 54.54 │
│ 0.7 │ 3889141 │ 25.58 │ 80.12 │
│ 0.6 │ 1135877 │ 7.47 │ 87.59 │
│ 0.5 │ 276980 │ 1.82 │ 89.41 │
│ 0.4 │ 2453 │ 0.02 │ 89.43 │
│ 0.3 │ 1599692 │ 10.52 │ 99.95 │
│ 0.2 │ 2475 │ 0.02 │ 99.97 │
│ 0.1 │ 3022 │ 0.02 │ 99.99 │
│ 0.0 │ 1917 │ 0.01 │ 100.0 │
├───────────────────┴─────────────┴────────────┴─────────────────┤
│ 11 rows 4 columns │
└────────────────────────────────────────────────────────────────┘
@mvexel
Copy link
Author

mvexel commented Nov 6, 2025

For world:

D SELECT
      confidence_bucket,
      place_count,
      ROUND(100.0 * place_count / SUM(place_count) OVER (), 2) AS percentage,
      ROUND(100.0 * SUM(place_count) OVER (ORDER BY confidence_bucket DESC) / SUM(place_count) OVER (), 2) AS pct_at_or_above
  FROM (
      SELECT
          FLOOR(confidence * 10) / 10 AS confidence_bucket,
          COUNT(*) AS place_count
      FROM read_parquet('s3://overturemaps-us-west-2/release/2025-10-22.0/theme=places/type=place/*')
      WHERE confidence IS NOT NULL
      GROUP BY confidence_bucket
  )
  ORDER BY confidence_bucket DESC;
┌───────────────────┬─────────────┬────────────┬─────────────────┐
│ confidence_bucket │ place_count │ percentage │ pct_at_or_above │
│      double       │    int64    │   double   │     double      │
├───────────────────┼─────────────┼────────────┼─────────────────┤
│               1.0 │      122146 │       0.17 │            0.17 │
│               0.9 │    37047953 │      51.81 │           51.98 │
│               0.8 │      328378 │       0.46 │           52.44 │
│               0.7 │    10750893 │      15.04 │           67.48 │
│               0.6 │     9314607 │      13.03 │           80.51 │
│               0.5 │     2602241 │       3.64 │           84.15 │
│               0.4 │        2454 │        0.0 │           84.15 │
│               0.3 │    11323110 │      15.84 │           99.99 │
│               0.2 │        2476 │        0.0 │           99.99 │
│               0.1 │        3022 │        0.0 │           99.99 │
│               0.0 │        4500 │       0.01 │           100.0 │
├───────────────────┴─────────────┴────────────┴─────────────────┤
│ 11 rows                                              4 columns │
└────────────────────────────────────────────────────────────────┘

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment