Skip to content

Instantly share code, notes, and snippets.

View stephanGarland's full-sized avatar

Stephan Garland stephanGarland

View GitHub Profile
@stephanGarland
stephanGarland / tree_insert.sql
Created February 6, 2024 22:13
Example of a tree-like structure in SQL (specifically Postgres)
SET search_path TO 'example';
INSERT INTO genre (name) VALUES ('Rock'), ('Classic Rock'), ('Metal'), ('Progressive Metal'), ('Arena Rock'), ('Alternative Metal'), ('Grunge');
INSERT INTO category (name) VALUES ('Book'), ('Music');
INSERT INTO format (name) VALUES ('Compact Disc'), ('Vinyl');
INSERT INTO band (year_formed, name) VALUES (1985, 'Majesty'), (1988, 'Dream Theater'), (1990, 'Tool'), (1970, 'Queen'), (1987, 'Alice in Chains');
INSERT INTO artist (first_name, last_name) VALUES ('John', 'Petrucci'), ('John', 'Myung'), ('James', 'LaBrie'), ('Jordan', 'Ruddess'), ('Mike', 'Portnoy'), ('Mike', 'Mangini');
INSERT INTO artist (first_name, last_name, prefix, suffix) VALUES ('Brian', 'May', 'Sir', 'CBE');
@stephanGarland
stephanGarland / calc_innodb_buf_pool.py
Last active May 8, 2025 13:53
Designed for use with Ansible, but works on its own as well – calculate the tricky parts of InnoDB's buffer pool
#!/usr/bin/env python3
# LICENSE
# This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0.
# If a copy of the MPL was not distributed with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
# Copyright 2024 Stephan Garland
"""
Calculates various parameters for the InnoDB buffer pool based on a simple input.
@stephanGarland
stephanGarland / mysql_binary_truncation.sql
Created December 23, 2023 00:35
Testing silent truncation of zeros in BINARY columns in MySQL
mysql> CREATE TABLE foo(id TINYINT UNSIGNED AUTO_INCREMENT PRIMARY KEY, bar BINARY(16), baz CHAR(36));
Query OK, 0 rows affected (0.03 sec)
mysql> INSERT INTO foo (bar, baz) VALUES (UUID_TO_BIN('dc23a9b9-a129-11ee-95fb-0242ac110000'), 'dc23a9b9-a129-11ee-95fb-0242ac110000');
Query OK, 1 row affected (0.02 sec)
mysql> INSERT INTO foo (bar, baz) VALUES (UUID_TO_BIN('dc23a9b9-a129-11ee-95fb-0242ac110000'), 'dc23a9b9a12911ee95fb0242ac110000');
Query OK, 1 row affected (0.01 sec)
mysql> SHOW BINARY LOGS;
@stephanGarland
stephanGarland / indexing_bools_low_cardinality.md
Created October 5, 2023 11:40
Demonstrating how indexing booleans is often not helpful

System Specifications

  • OS: Debian Bullseye 5.10.0-23-amd64
  • Virtualized: Yes (Proxmox)
  • CPU: E5-2650 v2 @ 2.60GHz
  • Allocated Core Count: 16
  • Allocated RAM: 64 GiB PC3-12800R
  • Disk: Samsung PM983 1.92 TiB via Ceph
  • Filesystem: XFS
  • Mount Options: defaults,noatime
  • Postgres Version: 15.3
@stephanGarland
stephanGarland / bench_mysql_innodb_myisam.py
Last active September 30, 2023 23:31
Benchmarking MySQL INSERTs into InnoDB and MyISAM
import csv
import json
import os
import pymysql
import statistics
import time
def prepare_values(row):
new_row = []
@stephanGarland
stephanGarland / bench_mysql_innodb_myisam.sh
Created September 30, 2023 18:06
Benchmarking MySQL bulk load into InnoDB and MyISAM
#!/usr/bin/env bash
function create_tables() {
tbl_1=$(cat <<EOF
CREATE TABLE IF NOT EXISTS test_innodb (
user_id CHAR(32) NOT NULL PRIMARY KEY,
user_email VARCHAR(254) NOT NULL,
created_at DATETIME NOT NULL,
tags JSON NOT NULL DEFAULT ('{}'),
shared_with JSON NOT NULL DEFAULT ('{}'),
@stephanGarland
stephanGarland / mkjup.sh
Last active July 1, 2023 01:39
A zsh function to create an ephemeral or persistent venv and and Jupyter Notebook kernel.
# Requirements:
# jupyter-notebook: https://jupyter.org/install
# Python3: https://www.python.org/downloads
# virtualenv: https://pypi.org/project/virtualenv
# virtualenvwrapper: https://virtualenvwrapper.readthedocs.io/en/latest/install.html
# zsh
# Tested with Jupyter Notebook 6.5.3, Python 3.11.4, and zsh 5.8
mkjup() {
# This is a base64-encoded skeleton of an .ipynb file, with two variables which are filled via envsubst later

I ran some experiments with varying recordsizes, filesizes, and compression. The files were .csv, representative of a simple schema:

full_name,external_id,last_modified
'Past, Gabrielle',40605,'2006-07-09 23:17:20' 
'Vachil, Corry',44277,'1996-09-05 05:12:44'

The files were all generated on an ext4 filesystem. There were three sets of five files, with 75, 100,000, and 1,000,000 rows each, resulting in the following sizes:

❯ find . -name '*small*.csv' -exec du -bc {} + | \

awk 'END {printf "%s %.2f %s\n", "Average file size:", ($1 / (NR-1) / 1024), "KiB"}'

@stephanGarland
stephanGarland / zfs_file_life.txt
Last active April 1, 2023 15:30
Examining ZFS storage of files and snapshots with ZDB
# create a simple dataset with a small record size, and chown it
❯ sudo zfs create -o recordsize=512 tank/foobar && sudo chown $YOUR_USER:$YOUR_GROUP tank/foobar
❯ cd tank/foobar
# make a 1K file filled with hex FF (pull from /dev/zero, then use tr to translate to FF, which is 377 in octal)
# if it's just zeros, there isn't much to look at with zdb
❯ dd if=/dev/zero bs=1k count=1 | tr "\000" "\377" >file.txt
1+0 records in
1+0 records out
@stephanGarland
stephanGarland / mysql_1E6_rows_sp
Last active January 6, 2023 22:08
MySQL stored procedure to generate 1E6 rows from another table - horrible performance, would not recommend
DELIMITER // -- This is needed so that the individual commands don't end the stored procedure
CREATE PROCEDURE insert_zaps(IN num_rows int, IN pct_shared float) -- Two input args are needed
BEGIN
DECLARE loop_count bigint; -- Variables are initialized with a type
DECLARE len_table bigint;
DECLARE rand_base float;
DECLARE rand_offset float;
DECLARE rand_ts timestamp;
DECLARE rand_user bigint;
DECLARE shared_with_user bigint;