M4 Pro MBP w/ 48 GiB RAM
Postgres 17, shared_buffers=2GB, effective_cache_size=8GB
```python
import gzip
import hashlib
import sqlite3
import urllib.request
from dataclasses import asdict, dataclass


@dataclass
class Queries:
    insert_program: str = (
        ...  # query text not included in this excerpt
    )
```
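The excerpt cuts off mid-definition, but the asdict import hints at how Queries is consumed. A minimal continuation sketch; the SQL string, table schema, and values here are placeholders, not the real queries:

```python
# Placeholder SQL: the real insert_program text is not shown in the excerpt.
queries = Queries(insert_program="INSERT INTO program (name) VALUES (?)")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE program (name TEXT)")  # placeholder schema

# asdict() exposes every named query, e.g. for logging or sanity checks.
for field, sql in asdict(queries).items():
    print(field, "->", sql)

conn.execute(queries.insert_program, ("mysqld",))
conn.commit()
```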
```python
import configparser
import plistlib
import shutil
import subprocess
from enum import Enum
from pathlib import Path
from string import Template
from typing import NamedTuple, Optional

EXEC_START_TMPL = Template("/opt/homebrew/opt/mysql@${mysql_ver}/bin/mysqld_safe")
```
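Given plistlib (launchd) and configparser (systemd units) in the imports, the template presumably gets substituted into a service definition. A hedged sketch of the launchd side, continuing from the snippet above; the label, keys, and output path are assumptions:

```python
mysql_ver = "8.4"  # hypothetical version selector
exec_start = EXEC_START_TMPL.substitute(mysql_ver=mysql_ver)

# Minimal launchd job definition; real Homebrew plists carry more keys.
job = {
    "Label": f"custom.mysql@{mysql_ver}",
    "ProgramArguments": [exec_start],
    "RunAtLoad": True,
    "KeepAlive": True,
}

plist_path = Path(f"/tmp/custom.mysql@{mysql_ver}.plist")
with plist_path.open("wb") as fh:
    plistlib.dump(job, fh)
```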
```python
import itertools
import uuid
from functools import lru_cache
from typing import Dict, List, Optional, Union

from sqlalchemy import (
    Column,
    DateTime,
    ForeignKey,
    Integer,
    String,
    create_engine,
)
from sqlalchemy.dialects import mysql
from sqlalchemy.orm import Session, declarative_base, relationship, sessionmaker
```
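The excerpt stops at the imports, but the mysql dialect import suggests ORM models with MySQL-specific column types. Purely illustrative (the real models are not shown): an AUTO_INCREMENT-keyed analogue of the uuid_pk table described below, i.e. the kind of table the clustered-index hypothesis would be tested against:

```python
Base = declarative_base()


class AutoIncRow(Base):
    """Illustrative only: AUTO_INCREMENT analogue of the test table."""

    __tablename__ = "auto_inc_pk"

    id = Column(mysql.BIGINT(unsigned=True), primary_key=True, autoincrement=True)
    user_id = Column(Integer, nullable=False)
    lorem = Column(mysql.TEXT, nullable=False)
```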
MySQL's documentation states:

> When combining LIMIT row_count with DISTINCT, MySQL stops as soon as it finds row_count unique rows.

The question was asked: how is this early exit deterministic and accurate? My hypothesis is that it comes down to MySQL's clustered index, as opposed to Postgres' heap storage. Given a monotonic PK such as an AUTO_INCREMENT column (or perhaps any index), MySQL can walk the rows in index order, so the first row_count unique rows it encounters should be the same on every execution.
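To make the question concrete, this is the shape of query at issue, run through SQLAlchemy as in the imports above. The DSN is a placeholder, and auto_inc_pk refers to the illustrative model sketched earlier:

```python
from sqlalchemy import create_engine, text

engine = create_engine("mysql+pymysql://root@localhost/test")  # hypothetical DSN

with engine.connect() as conn:
    # No ORDER BY here: MySQL may stop scanning after 10 unique user_id
    # values. The question is whether those 10 are stable across runs.
    rows = conn.execute(text("SELECT DISTINCT user_id FROM auto_inc_pk LIMIT 10")).all()
    print(rows)
```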
On the Postgres side, the test table has 1,000,000 rows, each consisting of a UUIDv4 PK, a random integer in the range [1, 1000000], and ~1 KiB of Lorem Ipsum text:
```
postgres=# \d+ uuid_pk
                                         Table "public.uuid_pk"
 Column  |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
---------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 id      | uuid    |           | not null |         | plain    |             |              |
 user_id | integer |           | not null |         | plain    |             |              |
 lorem   | text    |           | not null |         | extended |             |              |
```
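For reproducibility, a sketch of populating such a table server-side. Only the row count and column shapes come from the description above; the DSN, lorem text, and generation method are assumptions:

```python
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://postgres@localhost/postgres")  # hypothetical DSN

# Server-side generation: gen_random_uuid() supplies the UUIDv4 PK and
# repeat() pads the lorem column to roughly 1 KiB per row.
populate = text("""
    INSERT INTO uuid_pk (id, user_id, lorem)
    SELECT gen_random_uuid(),
           (1 + floor(random() * 1000000))::int,
           repeat('Lorem ipsum dolor sit amet, ', 37)
    FROM generate_series(1, 1000000)
""")

with engine.begin() as conn:
    conn.execute(populate)
```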