M4 Pro MBP w/ 48 GiB RAM
Postgres 17, shared_buffers=2GB, effective_cache_size=8GB
```python
import gzip
import hashlib
import sqlite3
import urllib.request
from dataclasses import asdict, dataclass


@dataclass
class Queries:
    insert_program: str = (
        ...  # query text not included in this excerpt
    )
```
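The excerpt cuts off mid-definition, but the asdict import hints at how Queries is consumed. A minimal continuation sketch; the SQL string, table schema, and values here are placeholders, not the real queries:

```python
# Placeholder SQL: the real insert_program text is not shown in the excerpt.
queries = Queries(insert_program="INSERT INTO program (name) VALUES (?)")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE program (name TEXT)")  # placeholder schema

# asdict() exposes every named query, e.g. for logging or sanity checks.
for field, sql in asdict(queries).items():
    print(field, "->", sql)

conn.execute(queries.insert_program, ("mysqld",))
conn.commit()
```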
```python
import configparser
import plistlib
import shutil
import subprocess
from enum import Enum
from pathlib import Path
from string import Template
from typing import NamedTuple, Optional

EXEC_START_TMPL = Template("/opt/homebrew/opt/mysql@${mysql_ver}/bin/mysqld_safe")
```
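Given plistlib (launchd) and configparser (systemd units) in the imports, the template presumably gets substituted into a service definition. A hedged sketch of the launchd side, continuing from the snippet above; the label, keys, and output path are assumptions:

```python
mysql_ver = "8.4"  # hypothetical version selector
exec_start = EXEC_START_TMPL.substitute(mysql_ver=mysql_ver)

# Minimal launchd job definition; real Homebrew plists carry more keys.
job = {
    "Label": f"custom.mysql@{mysql_ver}",
    "ProgramArguments": [exec_start],
    "RunAtLoad": True,
    "KeepAlive": True,
}

plist_path = Path(f"/tmp/custom.mysql@{mysql_ver}.plist")
with plist_path.open("wb") as fh:
    plistlib.dump(job, fh)
```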
```python
import itertools
import uuid
from functools import lru_cache
from typing import Dict, List, Optional, Union

from sqlalchemy import (
    Column,
    DateTime,
    ForeignKey,
    Integer,
    String,
    create_engine,
)
from sqlalchemy.dialects import mysql
from sqlalchemy.orm import Session, declarative_base, relationship, sessionmaker
```
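The excerpt stops at the imports, but the mysql dialect import suggests ORM models with MySQL-specific column types. Purely illustrative (the real models are not shown): an AUTO_INCREMENT-keyed analogue of the uuid_pk table described below, i.e. the kind of table the clustered-index hypothesis would be tested against:

```python
Base = declarative_base()


class AutoIncRow(Base):
    """Illustrative only: AUTO_INCREMENT analogue of the test table."""

    __tablename__ = "auto_inc_pk"

    id = Column(mysql.BIGINT(unsigned=True), primary_key=True, autoincrement=True)
    user_id = Column(Integer, nullable=False)
    lorem = Column(mysql.TEXT, nullable=False)
```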
MySQL's documentation states:

> When combining LIMIT row_count with DISTINCT, MySQL stops as soon as it finds row_count unique rows.

The question was asked: how is this early exit deterministic and accurate? My hypothesis is that it comes down to MySQL's clustered index, as opposed to Postgres' heap storage. Given a monotonic PK such as an AUTO_INCREMENT column (or perhaps any index), MySQL can walk the rows in index order, so the first row_count unique rows it encounters should be the same on every execution.
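To make the question concrete, this is the shape of query at issue, run through SQLAlchemy as in the imports above. The DSN is a placeholder, and auto_inc_pk refers to the illustrative model sketched earlier:

```python
from sqlalchemy import create_engine, text

engine = create_engine("mysql+pymysql://root@localhost/test")  # hypothetical DSN

with engine.connect() as conn:
    # No ORDER BY here: MySQL may stop scanning after 10 unique user_id
    # values. The question is whether those 10 are stable across runs.
    rows = conn.execute(text("SELECT DISTINCT user_id FROM auto_inc_pk LIMIT 10")).all()
    print(rows)
```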
On the Postgres side, the test table has 1,000,000 rows, each consisting of a UUIDv4 PK, a random integer in the range [1, 1000000], and ~1 KiB of Lorem Ipsum text:
```
postgres=# \d+ uuid_pk
                                         Table "public.uuid_pk"
 Column  |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
---------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 id      | uuid    |           | not null |         | plain    |             |              |
 user_id | integer |           | not null |         | plain    |             |              |
 lorem   | text    |           | not null |         | extended |             |              |
```
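For reproducibility, a sketch of populating such a table server-side. Only the row count and column shapes come from the description above; the DSN, lorem text, and generation method are assumptions:

```python
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://postgres@localhost/postgres")  # hypothetical DSN

# Server-side generation: gen_random_uuid() supplies the UUIDv4 PK and
# repeat() pads the lorem column to roughly 1 KiB per row.
populate = text("""
    INSERT INTO uuid_pk (id, user_id, lorem)
    SELECT gen_random_uuid(),
           (1 + floor(random() * 1000000))::int,
           repeat('Lorem ipsum dolor sit amet, ', 37)
    FROM generate_series(1, 1000000)
""")

with engine.begin() as conn:
    conn.execute(populate)
```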