bsidhom / counting.py
Last active May 17, 2024 00:07
Generate power sets or subset combinations from (indexed) sequences
#!/usr/bin/env python3

def main():
    print("powerset:")
    for subset in powerset([0, 1, 2, 3]):
        print(subset)
    print()
    print("combinations:")
    for subset in combinations([0, 1, 2, 3, 4, 5], 3):
        print(subset)
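The preview cuts off before the generator definitions. A minimal sketch of what powerset and combinations might look like, assuming the bitmask/index-based approach the "(indexed) sequences" description suggests (not necessarily the gist's exact code; the standard library's itertools.combinations covers the second case):

def powerset(seq):
    # Each bitmask over len(seq) bits selects a subset by index.
    for mask in range(1 << len(seq)):
        yield [x for i, x in enumerate(seq) if mask & (1 << i)]

def combinations(seq, k):
    # Choose or skip the first element, recursively.
    if k == 0:
        yield []
        return
    if len(seq) < k:
        return
    head, rest = seq[0], seq[1:]
    for tail in combinations(rest, k - 1):
        yield [head] + tail
    yield from combinations(rest, k)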
bsidhom / parse_csv.py
Created December 6, 2022 08:06
A toy CSV parser combinator from scratch
#!/usr/bin/env python3

from abc import ABC, abstractmethod
from enum import Enum
from typing import Any, Callable, ForwardRef, Generic, TypeAlias, TypeVar, cast

class RecordSeparator(Enum):
    VALUE = "RecordSeparator"
bsidhom / seq.py
Last active December 4, 2022 00:04
Generate Necklace, Lyndon, and de Bruijn sequences
#!/usr/bin/env python3

import argparse
from typing import Callable, Generator, TypeVar

def main():
    parser = argparse.ArgumentParser(
        description='Generates necklace-related sequences using the FKM '
        'algorithm as described in "Combinatorial Generation" by Frank Ruskey. '
        'See https://web.archive.org/web/20221203232527/https://page.math.tu-berlin.de/~felsner/SemWS17-18/Ruskey-Comb-Gen.pdf')
bsidhom / log_data.lua
Last active October 29, 2022 01:01
Write wrk latency data to a CSV file and visualize it
-- To use this, invoke wrk with the lua script option, e.g.:
-- env WRK_OUTPUT_CSV='timing.csv' wrk -s log_data.lua 'http://localhost:8000/'
done = function(summary, latency, requests)
  -- Unfortunately, writing output to stdout is not a good option because it
  -- gets intermingled with other `wrk` output. The maintainer is not amenable
  -- to suppressing this (see https://github.com/wg/wrk/issues/245).
  -- Similarly, we cannot write to stderr safely because _actual_ error
  -- messages are also logged there. Instead, we write to a file (or stdout by
  -- default, for human-readability purposes).
  filename = os.getenv("WRK_OUTPUT_CSV")
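The gist's title mentions visualizing the CSV. A hedged companion sketch in Python: the column names "percentile" and "latency_ms" are assumptions about what log_data.lua writes, and matplotlib is assumed to be available.

import csv

import matplotlib.pyplot as plt

percentiles, latencies = [], []
with open("timing.csv") as f:
    for row in csv.DictReader(f):
        # Column names here are assumptions, not taken from the gist.
        percentiles.append(float(row["percentile"]))
        latencies.append(float(row["latency_ms"]))

plt.plot(percentiles, latencies)
plt.xlabel("percentile")
plt.ylabel("latency (ms)")
plt.title("wrk latency distribution")
plt.show()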
bsidhom / spark_eventlog_schema.sql
Created October 21, 2022 21:47
Extract Spark eventlog schema from a duckdb shell
-- Load each line of the eventlog as a single column (the NUL delimiter
-- prevents field splitting), parsed as JSON.
create table j as select json(column0) as j from read_csv_auto('eventlog.json', delim='\0');
-- Write the next query's result to schema.json as a bare value.
.mode list
.header off
.once 'schema.json'
-- Infer one JSON structure per Spark event type, keyed by event name.
select json_group_object(event, structure) from (select j->>'Event' as event, json_group_structure(j) as structure from j group by event);
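The same pipeline can be driven from Python rather than the duckdb shell. A sketch assuming the duckdb Python package with its JSON extension; the shell's .mode/.header/.once directives are replaced by an ordinary file write.

import duckdb

con = duckdb.connect()
con.execute("""
    create table j as
    select json(column0) as j
    from read_csv_auto('eventlog.json', delim='\\0')
""")
schema = con.execute("""
    select json_group_object(event, structure)
    from (
        select j->>'Event' as event, json_group_structure(j) as structure
        from j
        group by event
    )
""").fetchone()[0]
with open("schema.json", "w") as f:
    f.write(schema)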
bsidhom / client.py
Created October 4, 2022 02:21
Simple Python asyncio server and client with a JSON protocol
#!/usr/bin/env python3

import asyncio
import json
import sys

async def main():
    reader, writer = await asyncio.open_connection("127.0.0.1", 8888)
    messages = ["One", 2, {"thr": "ee"}, ["4"], [5, 6]]
bsidhom / read_json.py
Last active October 4, 2022 04:11
Read stream of concatenated JSON contents
#!/usr/bin/env python3
import argparse
import asyncio
import codecs
import contextlib
import json
import sys
import threading
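Only the imports survive in the preview. The standard-library primitive for concatenated JSON is json.JSONDecoder.raw_decode, which decodes one document and reports where it stopped. A minimal synchronous sketch (the gist itself appears to be async and incremental, judging by the asyncio/codecs/threading imports):

import json

def iter_concatenated_json(text):
    # Repeatedly decode one document, skipping whitespace between them.
    decoder = json.JSONDecoder()
    i, n = 0, len(text)
    while i < n:
        while i < n and text[i].isspace():
            i += 1
        if i >= n:
            break
        value, i = decoder.raw_decode(text, i)
        yield value

for value in iter_concatenated_json('{"a": 1} [2, 3] "four"'):
    print(value)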
bsidhom / range.js
Created September 30, 2021 01:02
Generate a range or interpolated section with generators
// Lazily yields start, start + step, ... while the value is less than end.
let range = (start, end, step) => {
  return {
    *[Symbol.iterator]() {
      for (let n = start; n < end; n += step) {
        yield n;
      }
    },
  };
};
// Usage: for (let x of range(0, 10, 2)) console.log(x); // 0 2 4 6 8
bsidhom / dump-window.js
Created September 14, 2021 00:05
JSON serialize the window object by removing cycles
// Taken from https://stackoverflow.com/a/9382383
let decycle;
decycle = (obj, stack = []) => {
  // Primitives pass through; an object already on the stack is a cycle.
  if (!obj || typeof obj !== 'object') {
    return obj;
  }
  if (stack.includes(obj)) {
    return null;
  }
  let s = stack.concat([obj]);
  // Recurse into arrays and objects, replacing back-references with null.
  return Array.isArray(obj)
      ? obj.map((x) => decycle(x, s))
      : Object.fromEntries(
            Object.entries(obj).map(([k, v]) => [k, decycle(v, s)]));
};
// Usage: JSON.stringify(decycle(window))
bsidhom / stream_array.jq
Last active March 11, 2021 00:18
Turn a top-level JSON array into a stream of objects using jq
# Usage: jq --stream --null-input --from-file stream_array.jq
#
# Or, as a one-liner:
# jq --stream -n 'def end_array_or_object: length == 1 and (first | length == 1); def outer_primitive: length == 2 and (first | length == 0); def end_element: length > 0 and (last | (end_array_or_object or outer_primitive)); foreach (inputs | del(.[0][0])) as $path ([]; if end_element then [$path] else . + [$path] end; if end_element then fromstream(.[]) else empty end)'
# This is useful for processing large JSON files/indefinite streams of elements
# that happen to be wrapped in an outer array.
# The process is:
# - Stream the path elements (via `jq --stream`)
# - Remove the leading path element from all entries. This effectively drops
#   the outer array wrapper.
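The remaining steps are cut off by the preview. As a rough Python analogue of what the jq program accomplishes (an assumed approach, not the gist's): lazily yield the elements of a huge top-level JSON array without loading the whole document.

import json

def stream_array(fp, chunk_size=1 << 16):
    # Incrementally yield elements of a top-level JSON array. Caveat:
    # bare numbers as elements are ambiguous at chunk boundaries;
    # objects, arrays, and strings are safe.
    decoder = json.JSONDecoder()
    buf = fp.read(chunk_size).lstrip()
    assert buf.startswith("["), "expected a top-level array"
    buf = buf[1:]
    while True:
        buf = buf.lstrip().lstrip(",").lstrip()
        if buf.startswith("]"):
            return
        try:
            value, end = decoder.raw_decode(buf)
        except json.JSONDecodeError:
            # The next element may be split across chunks; read more.
            more = fp.read(chunk_size)
            if not more:
                raise
            buf += more
            continue
        yield value
        buf = buf[end:]

# Usage: for obj in stream_array(open("big.json")): ...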