Yuri Astrakhan nyurik

@nyurik
nyurik / benches_iters.rs
Last active May 9, 2023 02:37
Add “iterate with separators” iterator function
// Benchmarks for the Rust iterator extension discussion at
// https://internals.rust-lang.org/t/add-iterate-with-separators-iterator-function/18781/13
// Save this file as benches/iters.rs in a Rust project created with `cargo new itertest --lib`
// Add to Cargo.toml:
//
// [dependencies]
// itertools = "0.10"
//
@nyurik
nyurik / bench.rs
Created April 5, 2023 19:30
Benchmark to evaluate linear vs dup-indexer performance
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use dup_indexer::DupIndexer;
fn benchmark_strings(c: &mut Criterion) {
let mut group = c.benchmark_group("Strings");
group.bench_function("String", |b| {
b.iter(|| {
let mut di = DupIndexer::new();
for _ in 0..100 {
@nyurik
nyurik / query.sql
Created January 16, 2023 05:58
Statistics of MVT tile GIS errors when encoding/decoding with ST_AsMVTGeom
SELECT x,
y,
ST_Y(p_mid) mid_lat,
ST_Y(p_min) min_lat,
ST_Y(p_max) max_lat,
ST_Y(d_mid) mid_lat_decoded,
ST_Y(d_min) min_lat_decoded,
ST_Y(d_max) max_lat_decoded,
abs(ST_Y(p_mid) - ST_Y(d_mid)) mid_lat_error,
abs(ST_Y(p_min) - ST_Y(d_min)) min_lat_error,
@nyurik
nyurik / optimized-mbtiles.sql
Created June 10, 2022 19:25
Some ideas on optimizing mbtiles file storage size with a single 32-bit index instead of z/x/y
create table map
(
tile_index INTEGER,
tmp_zoom INTEGER GENERATED ALWAYS AS ((tile_index & 0xFC000000) >> 26) VIRTUAL,
tile_column INTEGER GENERATED ALWAYS AS (CASE
WHEN tmp_zoom <= 13 THEN (tile_index & 0x3FFE000) >> 13
ELSE (tile_index & 0x3FFF8000) >> 15 END) VIRTUAL,
tile_row INTEGER GENERATED ALWAYS AS (CASE
WHEN tmp_zoom <= 13 THEN tile_index & 0x1FFF
ELSE tile_index & 0x7FFF END) VIRTUAL,
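
The bit layout implied by the generated columns above can be sketched in Python. This is only an illustration: the function names are hypothetical, the decoder copies the masks and shifts from the excerpt as shown, and the encoder is an assumed inverse that covers zoom <= 13 only, because the excerpt does not show how higher-zoom indexes are built.

```python
def decode_tile_index(tile_index):
    """Return (zoom, column, row), mirroring the generated columns above."""
    zoom = (tile_index & 0xFC000000) >> 26          # top 6 bits hold the zoom
    if zoom <= 13:
        column = (tile_index & 0x3FFE000) >> 13     # 13-bit column field
        row = tile_index & 0x1FFF                   # 13-bit row field
    else:
        column = (tile_index & 0x3FFF8000) >> 15    # 15-bit column field
        row = tile_index & 0x7FFF                   # 15-bit row field
    return zoom, column, row


def encode_low_zoom(zoom, column, row):
    """Assumed inverse of the decoder for zoom <= 13 (not from the gist)."""
    if not 0 <= zoom <= 13:
        raise ValueError("this sketch only covers zoom 0..13")
    if not (0 <= column < 2 ** 13 and 0 <= row < 2 ** 13):
        raise ValueError("column and row must each fit in 13 bits")
    return (zoom << 26) | (column << 13) | row


index = encode_low_zoom(5, 17, 9)
print(decode_tile_index(index))
```

For zoom <= 13 the three fields occupy disjoint bit ranges, so a single integer column can replace the separate z/x/y columns while remaining decodable.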
@nyurik
nyurik / types.rs
Last active February 21, 2022 22:56
Multidimensional Geo-types with separate Metadata
use num_traits::{Float, Num, NumCast};
use std::fmt::Debug;
trait CoordinateType: Default + Num + Copy + NumCast + PartialOrd + Debug {}
impl<T: Default + Num + Copy + NumCast + PartialOrd + Debug> CoordinateType for T {}
trait CoordNum: CoordinateType {}
impl<T: CoordinateType + Debug> CoordNum for T {}
trait CoordFloat: CoordNum + Float {}
@nyurik
nyurik / types.rs
Last active February 21, 2022 16:21
Multidimensional Geo-types
use num_traits::{Float, Num, NumCast};
use std::fmt::Debug;
trait CoordinateType: Default + Num + Copy + NumCast + PartialOrd + Debug {}
impl<T: Default + Num + Copy + NumCast + PartialOrd + Debug> CoordinateType for T {}
trait CoordNum: CoordinateType {}
impl<T: CoordinateType + Debug> CoordNum for T {}
trait CoordFloat: CoordNum + Float {}
@nyurik
nyurik / denormalize_osm_data.md
Last active February 15, 2022 06:53
Convenient OSM data

OpenStreetMap data is heavily normalized, making it very hard to process. Modeled on a relational database, it seems to have missed the second part of the "Normalize until it hurts; denormalize until it works" proverb.

Each node has an ID, and every way and relation uses an ID to reference that node. This means that every data consumer must keep an enormous cache of 8 billion node IDs and corresponding lat,lng pairs while processing input data. In most cases, the node ID is discarded right after parsing.
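
To make the cost of this indirection concrete, here is a minimal Python sketch of what a consumer has to do today. The IDs, coordinates, and the `resolve_ways` helper are all made up for illustration; real OSM extracts would come from a parser and contain billions of nodes.

```python
# Hypothetical stand-in for parsed OSM input: (node_id, lat, lng) tuples
# and (way_id, [node_id, ...]) tuples. All values are invented.
nodes = [
    (1, 40.0, -74.0),
    (2, 40.1, -74.1),
    (3, 40.2, -74.2),
]
ways = [
    (100, [1, 2, 3]),
    (101, [3, 1]),
]


def resolve_ways(nodes, ways):
    """Replace node references in each way with actual coordinates."""
    # The whole node table must be held in memory (or on disk) before
    # any way can be resolved; this is the cache the text refers to.
    node_cache = {node_id: (lat, lng) for node_id, lat, lng in nodes}
    resolved = {}
    for way_id, refs in ways:
        # After this lookup the node IDs are no longer needed.
        resolved[way_id] = [node_cache[ref] for ref in refs]
    return resolved


print(resolve_ways(nodes, ways))
```

With three nodes the dictionary is trivial; the same lookup over the full planet is what forces every consumer to maintain a very large ID-to-coordinate cache.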

I would like to propose a new, easy-to-process data structure for both bulk-download and streaming-update use cases.

Target audience

  • YES -- Data consumers who transform OSM data into something else, e.g. tiles, shapes, analytical reports, etc.
@nyurik
nyurik / is_tf_in_pr.py
Created April 28, 2021 14:43
A script to detect when Terraform projects or the modules they depend on are changed in a Git pull request
#!/usr/bin/env python3
# A script to detect when Terraform projects or the modules they depend on are changed in a Git pull request
#
# Usage: python3 is_tf_in_pr.py <branch> <dir>...
#
# <branch> GIT branch to compare using git diff branch... shell call
# <dir> One or more directories to monitor, including all sub-dirs, relative to repo's root
#
# Set DEBUG env var to see additional debugging information
# If match is found, exitcode is 0, otherwise 1
@nyurik
nyurik / to-logical.py
Last active May 10, 2024 02:39
Script to convert SCSS files from physical to logical values for RTL and vertical languages
#
# This script converts margins, padding, and borders to logical values,
# allowing RTL and vertical languages to show correctly.
# Supports both *.css and *.scss files.
#
# Some renames are not yet widely implemented and may require the PostCSS plugin
# https://github.com/csstools/postcss-logical
# They have been commented out for now, but feel free to try them out.
#
# Full spec: https://drafts.csswg.org/css-logical/
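
The kind of rename described in the header above might look roughly like the following Python sketch. It is not the gist's implementation: the mapping is a partial list chosen for illustration, it assumes the usual correspondence for a horizontal left-to-right writing mode, and the `to_logical` helper name is hypothetical.

```python
import re

# Partial, illustrative mapping from physical to logical property names.
PHYSICAL_TO_LOGICAL = {
    "margin-left": "margin-inline-start",
    "margin-right": "margin-inline-end",
    "margin-top": "margin-block-start",
    "margin-bottom": "margin-block-end",
    "padding-left": "padding-inline-start",
    "padding-right": "padding-inline-end",
    "padding-top": "padding-block-start",
    "padding-bottom": "padding-block-end",
    "border-left": "border-inline-start",
    "border-right": "border-inline-end",
}

# Match a whole property name only: not preceded by a word character or
# hyphen, and followed by optional whitespace and a colon.
_PROPERTY = re.compile(
    r"(?<![\w-])("
    + "|".join(re.escape(name) for name in PHYSICAL_TO_LOGICAL)
    + r")(?=\s*:)"
)


def to_logical(css):
    """Rewrite known physical property names to their logical equivalents."""
    return _PROPERTY.sub(lambda m: PHYSICAL_TO_LOGICAL[m.group(1)], css)


print(to_logical(".a { margin-left: 4px; padding-right: 0 }"))
```

Longer names such as `border-left-width` are left untouched by this sketch because the pattern requires a colon directly after the matched name.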
@nyurik
nyurik / OptimizeLabelGrid.sql
Last active May 19, 2020 22:23
Optimizing LabelGrid - the result is worse than before??
------- Testing:
-- git clone https://github.com/openmaptiles/openmaptiles
-- git checkout upgrade-v5-pg12
-- place this file in the openmaptiles/ dir as "test-func.sql"
-- use make start-db to create a new database (in docker)
-- use make bash to start tools (another docker)
-- test with this command. The test call is taken from the openmaptiles-tools/tests/sql/LabelGrid.sql test. Note the "volatile" keyword - without it the query planner will optimize away multiple calls with the same value.
-- profile-pg-func --file test-func.sql "LabelGrid_pgsql(ST_GeomFromText('POINT(100 -100)',900913), 64*9.5546285343)" "LabelGrid_sql(ST_GeomFromText('POINT(100 -100)',900913), 64*9.5546285343)"
-- The results are not that great: