Stephan Garland stephanGarland

System Specifications

OS: Debian Bullseye 5.10.0-23-amd64
Virtualized: Yes (Proxmox)
CPU: E5-2650 v2 @ 2.60GHz
Allocated Core Count: 16
Allocated RAM: 64 GiB PC3-12800R
Disk: Samsung PM983 1.92 TiB via Ceph
Filesystem: XFS
Mount Options: defaults,noatime
Postgres Version: 15.3

I ran some experiments with varying recordsizes, filesizes, and compression. The files were .csv, representative of a simple schema:

full_name,external_id,last_modified
'Past, Gabrielle',40605,'2006-07-09 23:17:20' 
'Vachil, Corry',44277,'1996-09-05 05:12:44'

The files were all generated on an ext4 filesystem. There were three sets of five files, with 75, 100,000, and 1,000,000 rows each, resulting in the following sizes:

❯ find . -name '*small*.csv' -exec du -bc {} + | \

awk 'END {printf "%s %.2f %s\n", "Average file size:", ($1 / (NR-1) / 1024), "KiB"}'

Examining MySQL 8.0.23 with strace during SELECT COUNT(*), TRUNCATE TABLE, and DROP TABLE operations

Notes

Ran on a Proxmox VM on a Dell R620 with dual E5-2650 v2, all 32 cores given to the VM, and 64 GiB of RAM allocated
Disk was an ext4 filesystem on an HDD-based virtual disk, presented via a ZFS array in another physical server
Captured with strace -T -ff -o /usr/local/mysql/trace /usr/local/mysql/bin/mysqld

	mysql> CREATE TABLE foo(id TINYINT UNSIGNED AUTO_INCREMENT PRIMARY KEY, bar BINARY(16), baz CHAR(36));
	Query OK, 0 rows affected (0.03 sec)

	mysql> INSERT INTO foo (bar, baz) VALUES (UUID_TO_BIN('dc23a9b9-a129-11ee-95fb-0242ac110000'), 'dc23a9b9-a129-11ee-95fb-0242ac110000');
	Query OK, 1 row affected (0.02 sec)

	mysql> INSERT INTO foo (bar, baz) VALUES (UUID_TO_BIN('dc23a9b9-a129-11ee-95fb-0242ac110000'), 'dc23a9b9a12911ee95fb0242ac110000');
	Query OK, 1 row affected (0.01 sec)

	mysql> SHOW BINARY LOGS;

	import csv
	import json
	import os
	import pymysql
	import statistics
	import time


	def prepare_values(row):
	new_row = []

	#!/usr/bin/env bash

	function create_tables() {
	tbl_1=$(cat <<EOF
	CREATE TABLE IF NOT EXISTS test_innodb (
	user_id CHAR(32) NOT NULL PRIMARY KEY,
	user_email VARCHAR(254) NOT NULL,
	created_at DATETIME NOT NULL,
	tags JSON NOT NULL DEFAULT ('{}'),
	shared_with JSON NOT NULL DEFAULT ('{}'),

	# Requirements:
	# jupyter-notebook: https://jupyter.org/install
	# Python3: https://www.python.org/downloads
	# virtualenv: https://pypi.org/project/virtualenv
	# virtualenvwrapper: https://virtualenvwrapper.readthedocs.io/en/latest/install.html
	# zsh
	# Tested with Jupyter Notebook 6.5.3, Python 3.11.4, and zsh 5.8

	mkjup() {
	# This is a base64-encoded skeleton of an .ipynb file, with two variables which are filled via envsubst later

	# create a simple dataset with a small record size, and chown it

	❯ sudo zfs create -o recordsize=512 tank/foobar && sudo chown $YOUR_USER:$YOUR_GROUP tank/foobar
	❯ cd tank/foobar

	# make a 1K file filled with hex FF (pull from /dev/zero, then use tr to translate to FF, which is 377 in octal)
	# if it's just zeros, there isn't much to look at with zdb
	❯ dd if=/dev/zero bs=1k count=1 \| tr "\000" "\377" >file.txt
	1+0 records in
	1+0 records out

Stephan Garland stephanGarland

System Specifications

Examining MySQL 8.0.23 with strace during SELECT COUNT(*), TRUNCATE TABLE, and DROP TABLE operations

Notes

Tests

This is running a SELECT COUNT(*), then a TRUNCATE, then a DROP, with FOREIGN_KEY_CHECKS=OFF

	DELIMITER // -- This is needed so that the individual commands don't end the stored procedure
	CREATE PROCEDURE insert_zaps(IN num_rows int, IN pct_shared float) -- Two input args are needed
	BEGIN
	DECLARE loop_count bigint; -- Variables are initialized with a type
	DECLARE len_table bigint;
	DECLARE rand_base float;
	DECLARE rand_offset float;
	DECLARE rand_ts timestamp;
	DECLARE rand_user bigint;
	DECLARE shared_with_user bigint;

	import argparse
	from collections import deque
	from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, wait
	from math import floor
	from multiprocessing import Value
	from os import urandom
	import random

	class Allocator:
	def __init__(self, id_max: int):