
Stephan Garland (stephanGarland)

stephanGarland / mysql_binary_truncation.sql
Created December 23, 2023 00:35
Testing silent truncation of zeros in BINARY columns in MySQL
mysql> CREATE TABLE foo(id TINYINT UNSIGNED AUTO_INCREMENT PRIMARY KEY, bar BINARY(16), baz CHAR(36));
Query OK, 0 rows affected (0.03 sec)
mysql> INSERT INTO foo (bar, baz) VALUES (UUID_TO_BIN('dc23a9b9-a129-11ee-95fb-0242ac110000'), 'dc23a9b9-a129-11ee-95fb-0242ac110000');
Query OK, 1 row affected (0.02 sec)
mysql> INSERT INTO foo (bar, baz) VALUES (UUID_TO_BIN('dc23a9b9-a129-11ee-95fb-0242ac110000'), 'dc23a9b9a12911ee95fb0242ac110000');
Query OK, 1 row affected (0.01 sec)
mysql> SHOW BINARY LOGS;
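The preview cuts off before the verification step. As a sketch of how the stored bytes could be checked from a client (this is not part of the gist; it assumes the pymysql driver used elsewhere in these gists and placeholder connection details), compare the hex of the BINARY(16) column against the CHAR(36) copy:

import pymysql

# Placeholder credentials; point this at the server holding the `foo` table above.
conn = pymysql.connect(host="127.0.0.1", user="root", password="", database="test")

with conn.cursor() as cur:
    # If trailing 0x00 bytes were silently dropped, LENGTH(bar) would be < 16
    # and the hex comparison against the dashed UUID in `baz` would fail.
    cur.execute("SELECT HEX(bar), LENGTH(bar), baz FROM foo WHERE id = %s", (1,))
    hex_bar, bar_len, baz = cur.fetchone()
    print(bar_len, hex_bar)
    print(hex_bar == baz.replace("-", "").upper())

conn.close()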
stephanGarland / indexing_bools_low_cardinality.md
Created October 5, 2023 11:40
Demonstrating how indexing booleans is often not helpful

System Specifications

  • OS: Debian Bullseye 5.10.0-23-amd64
  • Virtualized: Yes (Proxmox)
  • CPU: E5-2650 v2 @ 2.60GHz
  • Allocated Core Count: 16
  • Allocated RAM: 64 GiB PC3-12800R
  • Disk: Samsung PM983 1.92 TiB via Ceph
  • Filesystem: XFS
  • Mount Options: defaults,noatime
  • Postgres Version: 15.3
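The write-up itself is truncated in this preview. As a rough sketch of the kind of test it describes (the psycopg2 driver, database name, and table are assumptions, not taken from the gist), populate a skewed boolean column, index it, and check whether the planner even uses the index:

import psycopg2

# Placeholder DSN; the gist ran against a local Postgres 15.3 instance.
conn = psycopg2.connect("dbname=scratch")
conn.autocommit = True
cur = conn.cursor()

# One million rows where roughly 1% are TRUE: two distinct values, heavily skewed.
cur.execute("DROP TABLE IF EXISTS bool_demo")
cur.execute(
    "CREATE TABLE bool_demo AS "
    "SELECT g AS id, (random() < 0.01) AS is_flagged "
    "FROM generate_series(1, 1000000) AS g"
)
cur.execute("CREATE INDEX ON bool_demo (is_flagged)")
cur.execute("ANALYZE bool_demo")

# For the common value a sequential scan usually wins; only the rare value
# stands a chance of benefiting from the index.
for val in (False, True):
    cur.execute("EXPLAIN SELECT count(*) FROM bool_demo WHERE is_flagged = %s", (val,))
    for (line,) in cur.fetchall():
        print(line)

conn.close()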
stephanGarland / bench_mysql_innodb_myisam.py
Last active September 30, 2023 23:31
Benchmarking MySQL INSERTs into InnoDB and MyISAM
import csv
import json
import os
import pymysql
import statistics
import time
def prepare_values(row):
    new_row = []
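    # Sketch continuation; the gist preview stops above. Coerce each CSV field
    # into something pymysql can bind, then return a tuple for executemany().
    for field in row:
        new_row.append(None if field == "" else field)
    return tuple(new_row)

def bench(conn, table, rows, batch_size=1000):
    """Time batched INSERTs into `table` (column names here are placeholders)."""
    timings = []
    with conn.cursor() as cur:
        for i in range(0, len(rows), batch_size):
            batch = [prepare_values(r) for r in rows[i : i + batch_size]]
            start = time.perf_counter()
            cur.executemany(
                f"INSERT INTO {table} (user_id, user_email, created_at) VALUES (%s, %s, %s)",
                batch,
            )
            conn.commit()
            timings.append(time.perf_counter() - start)
    return statistics.mean(timings), statistics.stdev(timings)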
stephanGarland / bench_mysql_innodb_myisam.sh
Created September 30, 2023 18:06
Benchmarking MySQL bulk load into InnoDB and MyISAM
#!/usr/bin/env bash
function create_tables() {
  tbl_1=$(cat <<EOF
CREATE TABLE IF NOT EXISTS test_innodb (
  user_id CHAR(32) NOT NULL PRIMARY KEY,
  user_email VARCHAR(254) NOT NULL,
  created_at DATETIME NOT NULL,
  tags JSON NOT NULL DEFAULT ('{}'),
  shared_with JSON NOT NULL DEFAULT ('{}'),
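The heredoc (and the rest of the script) is cut off by the preview above. Separately from the script, a minimal sketch of timing the bulk load itself into each engine, in Python for consistency with the other sketches here; the CSV path, credentials, and the MyISAM table name are assumptions:

import time
import pymysql

# LOAD DATA LOCAL needs local_infile enabled on both the client and the server.
conn = pymysql.connect(
    host="127.0.0.1", user="root", password="", database="test", local_infile=True
)

for table in ("test_innodb", "test_myisam"):
    start = time.perf_counter()
    with conn.cursor() as cur:
        cur.execute(
            f"LOAD DATA LOCAL INFILE 'users.csv' INTO TABLE {table} "
            "FIELDS TERMINATED BY ','"
        )
    conn.commit()
    print(f"{table}: {time.perf_counter() - start:.3f} s")

conn.close()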
stephanGarland / mkjup.sh
Last active July 1, 2023 01:39
A zsh function to create an ephemeral or persistent venv and a Jupyter Notebook kernel.
# Requirements:
# jupyter-notebook: https://jupyter.org/install
# Python3: https://www.python.org/downloads
# virtualenv: https://pypi.org/project/virtualenv
# virtualenvwrapper: https://virtualenvwrapper.readthedocs.io/en/latest/install.html
# zsh
# Tested with Jupyter Notebook 6.5.3, Python 3.11.4, and zsh 5.8
mkjup() {
# This is a base64-encoded skeleton of an .ipynb file, with two variables which are filled via envsubst later
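The rest of the function is truncated here. Stripped of the zsh and virtualenvwrapper plumbing, the core steps it automates look roughly like this (a sketch using the stdlib venv module instead of virtualenvwrapper; the base path and kernel name are placeholders):

import subprocess
import sys
from pathlib import Path

def make_kernel(name: str, base: Path = Path.home() / ".virtualenvs") -> None:
    """Create a venv and register it as a Jupyter kernel."""
    venv_dir = base / name
    venv_py = venv_dir / "bin" / "python"
    subprocess.run([sys.executable, "-m", "venv", str(venv_dir)], check=True)
    subprocess.run([str(venv_py), "-m", "pip", "install", "ipykernel"], check=True)
    subprocess.run(
        [str(venv_py), "-m", "ipykernel", "install", "--user",
         "--name", name, "--display-name", f"Python ({name})"],
        check=True,
    )

make_kernel("scratch")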

I ran some experiments with varying recordsizes, file sizes, and compression settings. The files were CSVs representative of a simple schema:

full_name,external_id,last_modified
'Past, Gabrielle',40605,'2006-07-09 23:17:20' 
'Vachil, Corry',44277,'1996-09-05 05:12:44'

The files were all generated on an ext4 filesystem. There were three sets of five files, with 75, 100,000, and 1,000,000 rows each, resulting in the following sizes:

❯ find . -name '*small*.csv' -exec du -bc {} + | \
    awk 'END {printf "%s %.2f %s\n", "Average file size:", ($1 / (NR-1) / 1024), "KiB"}'
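For reference, a sketch of one way to produce files of that shape (the name pool, ID range, and timestamp window below are placeholders, not the generator actually used):

import csv
import random
from datetime import datetime, timedelta

# Placeholder name pool; the real files used randomly generated names.
NAMES = ["Past, Gabrielle", "Vachil, Corry", "Smith, Jane", "Doe, John"]
EPOCH = datetime(1990, 1, 1)

def write_rows(path: str, n_rows: int) -> None:
    with open(path, "w", newline="") as f:
        f.write("full_name,external_id,last_modified\n")
        writer = csv.writer(f, quotechar="'", quoting=csv.QUOTE_NONNUMERIC)
        for _ in range(n_rows):
            ts = EPOCH + timedelta(seconds=random.randrange(10**9))
            writer.writerow(
                [random.choice(NAMES), random.randrange(1, 100000),
                 ts.strftime("%Y-%m-%d %H:%M:%S")]
            )

write_rows("small_01.csv", 75)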

stephanGarland / zfs_file_life.txt
Last active April 1, 2023 15:30
Examining ZFS storage of files and snapshots with ZDB
# create a simple dataset with a small record size, and chown it
❯ sudo zfs create -o recordsize=512 tank/foobar && sudo chown $YOUR_USER:$YOUR_GROUP tank/foobar
❯ cd tank/foobar
# make a 1K file filled with hex FF (pull from /dev/zero, then use tr to translate to FF, which is 377 in octal)
# if it's just zeros, there isn't much to look at with zdb
❯ dd if=/dev/zero bs=1k count=1 | tr "\000" "\377" >file.txt
1+0 records in
1+0 records out
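The preview stops before the ZDB step the description promises. A sketch of one way to continue (in Python for consistency with the other sketches; it assumes sudo/zdb access and the dataset created above):

import os
import subprocess

# Commit pending writes so zdb sees the file's on-disk blocks.
subprocess.run(["sync"], check=True)

# On ZFS, a file's inode number is its object number in the dataset.
obj = os.stat("file.txt").st_ino

# Repeating -d raises verbosity; at this level zdb dumps the object's dnode,
# indirect blocks, and block pointers.
subprocess.run(["sudo", "zdb", "-dddddd", "tank/foobar", str(obj)], check=True)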
stephanGarland / mysql_1E6_rows_sp
Last active January 6, 2023 22:08
MySQL stored procedure to generate 1E6 rows from another table - horrible performance, would not recommend
DELIMITER // -- This is needed so that the individual commands don't end the stored procedure
CREATE PROCEDURE insert_zaps(IN num_rows int, IN pct_shared float) -- Two input args are needed
BEGIN
    DECLARE loop_count bigint; -- Variables are declared with a type
    DECLARE len_table bigint;
    DECLARE rand_base float;
    DECLARE rand_offset float;
    DECLARE rand_ts timestamp;
    DECLARE rand_user bigint;
    DECLARE shared_with_user bigint;
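The body of the procedure is truncated above. For completeness, a sketch of invoking it once created (connection details are placeholders, and the two arguments map to num_rows and pct_shared; the values here are arbitrary examples):

import pymysql

conn = pymysql.connect(host="127.0.0.1", user="root", password="", database="test")
with conn.cursor() as cur:
    # num_rows=1000000, pct_shared=0.1; both values are arbitrary examples.
    cur.callproc("insert_zaps", (1000000, 0.1))
conn.commit()
conn.close()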
stephanGarland / gen_sql_example.py
Last active January 6, 2023 23:04
Example of rapidly generating a file that can be loaded into a SQL database - here, generating random names from two input files and a monotonically increasing integer column
import argparse
from collections import deque
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, wait
from math import floor
from multiprocessing import Value
from os import urandom
import random
class Allocator:
    def __init__(self, id_max: int):
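        # Sketch continuation; the gist preview stops above. A guess at the shape:
        # a monotonically increasing counter shared across worker processes,
        # backed by the multiprocessing.Value imported above ('q' = signed 64-bit).
        self.id_max = id_max
        self._counter = Value("q", 0)

    def allocate(self) -> int:
        """Return the next ID, raising once id_max is exhausted."""
        with self._counter.get_lock():
            if self._counter.value >= self.id_max:
                raise RuntimeError("ID space exhausted")
            self._counter.value += 1
            return self._counter.value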
stephanGarland / mysql_8_0_23_investigation.md
Created November 19, 2022 21:51
Investigating why doing a SELECT COUNT(*) followed by a TRUNCATE or DROP is far faster than those DDL operations alone

Examining MySQL 8.0.23 with strace during SELECT COUNT(*), TRUNCATE TABLE, and DROP TABLE operations

Notes

  • Ran on a Proxmox VM on a Dell R620 with dual E5-2650 v2, all 32 cores given to the VM, and 64 GiB of RAM allocated
  • Disk was an ext4 filesystem on an HDD-based virtual disk, presented via a ZFS array in another physical server
  • Captured with strace -T -ff -o /usr/local/mysql/trace /usr/local/mysql/bin/mysqld

Tests

This is running a SELECT COUNT(*), then a TRUNCATE, then a DROP, with FOREIGN_KEY_CHECKS=OFF
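The traces themselves are truncated in this preview. A sketch of the statement sequence being traced, driven from Python with pymysql (the table name and connection details are placeholders):

import time
import pymysql

conn = pymysql.connect(host="127.0.0.1", user="root", password="", database="test")

with conn.cursor() as cur:
    cur.execute("SET FOREIGN_KEY_CHECKS = 0")
    for stmt in (
        "SELECT COUNT(*) FROM big_table",
        "TRUNCATE TABLE big_table",
        "DROP TABLE big_table",
    ):
        start = time.perf_counter()
        cur.execute(stmt)
        print(f"{stmt}: {time.perf_counter() - start:.2f} s")

conn.close()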