💡
lights, camera, action

Louis Maddox lmmx

lmmx / 0_prompt.md
Created January 19, 2026 12:59
A design report written by Claude Opus 4.5 (in Claude Code) on a Rust query planner for Python pathlib Path handling

I wrote this blog post, https://cog.spin.systems/future-paths-template-strings, which describes how to use t-strings and pathlib Paths together to do symbolic path manipulation. I want to take the idea further, though (see the code at https://gist.github.com/lmmx/f5d1b07d266f160f9a431c1f6bdc8a17): I want to use PyO3 to optimise the path operations the way Polars optimises its queries, and to avoid allocations. Right now it is not genuinely deferred, only symbolic (deferred resolution). Polars builds a lazy logical plan, optimizes it, then executes it. You can do the same: separate the expression tree (what you have now) from a compiled plan that's cheap to execute repeatedly. Take a look under the hood in the Polars source code and write me a report on the specific analogues (as in analogies) between the two code bases and how I would design it. Write an extensive report and make sure you cite sources for factual claims; don't just imply something is true without giving me the ability to verify it. Write your report in Markdown.
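A minimal pure-Python sketch of the split described in the prompt: a tree of path-expression nodes (the symbolic "logical plan") is compiled once into a flat plan with adjacent literal segments pre-joined, and that plan is then executed cheaply for each new set of bindings. The names `Lit`, `Param`, `Join`, `compile_plan` and `execute` are hypothetical illustrations, not the API of the linked gist or of Polars; a PyO3 version would presumably hold the compiled plan in a Rust struct rather than a Python tuple.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import PurePosixPath


# Hypothetical expression-tree nodes: this is the symbolic "logical plan".
@dataclass(frozen=True)
class Lit:
    value: str  # a fixed path segment known when the expression is built


@dataclass(frozen=True)
class Param:
    name: str  # a late-bound slot filled in at execution time


@dataclass(frozen=True)
class Join:
    parts: tuple  # child nodes (Lit, Param or nested Join), joined with "/"


def flatten(expr):
    """Inline nested Join nodes into a flat stream of Lit/Param leaves."""
    if isinstance(expr, Join):
        for part in expr.parts:
            yield from flatten(part)
    else:
        yield expr


def compile_plan(expr) -> tuple:
    """Optimise once: fold runs of adjacent literals into a single string segment."""
    plan: list = []
    for leaf in flatten(expr):
        if isinstance(leaf, Lit):
            if plan and isinstance(plan[-1], str):
                plan[-1] = f"{plan[-1]}/{leaf.value}"  # constant folding
            else:
                plan.append(leaf.value)
        else:
            plan.append(leaf)  # keep the Param slot for execution time
    return tuple(plan)


def execute(plan: tuple, **bindings: str) -> PurePosixPath:
    """Run the compiled plan: one join over pre-folded segments plus bound params."""
    segments = (step if isinstance(step, str) else bindings[step.name] for step in plan)
    return PurePosixPath("/".join(segments))


expr = Join((Lit("data"), Lit("raw"), Param("date"), Join((Lit("chunks"), Param("name")))))
plan = compile_plan(expr)
print(plan)  # the two leading literals are folded into a single "data/raw" segment
for date in ("2026-01-18", "2026-01-19"):
    print(execute(plan, date=date, name="part-0.parquet"))
```

The design choice mirrors the lazy, optimise, execute sequence from the prompt: optimisation passes (here only literal folding and flattening) run once at compile time, so each repeated execution only substitutes parameters and performs a single join.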

lmmx / deferred_path_exprs.py
Created January 18, 2026 23:34
Deferred templated pathlib Path-like type with t-strings
from __future__ import annotations
from dataclasses import dataclass
from pathlib import Path
from string.templatelib import Interpolation, Template
from typing import Any
# ============================================================
# Layer 1: Parameters (late binding slots)
# ============================================================
lmmx / 0_view_base_df.py
Last active January 17, 2026 22:12
Eliminating a Cartesian product in Polars dataframe handling https://cog.spin.systems/wikidata-digging-avoiding-combinatorial-explosion
import polars as pl
from pathlib import Path
claims_file = Path("results/claims/chunk_0-00400-of-00546.parquet")
lf = pl.scan_parquet(claims_file)
base = (
    lf.select(pl.col("claims").explode().struct.unnest())
    .drop("key")
lmmx / chatgpt_review.md
Created January 13, 2026 16:42
ChatGPT review of the Polars implementation of CDC at https://github.com/lmmx/polars/tree/parquet-cdc

This is a very good write-up, and your intuition (“doing it correctly should fix perf”) is understandable — but the disappointing result actually makes sense once you zoom in on what exactly PyArrow’s CDC is optimizing for vs what Polars already does better than PyArrow.

I’ll be blunt up front, then unpack it carefully:

You implemented CDC correctly, but you applied it to a system that was already more dedup-friendly than PyArrow, and your CDC implementation is actively destroying the accidental alignment properties that Xet’s deduper was exploiting.

So this isn’t “CDC is broken”, and it isn’t “your implementation is wrong”. It’s that CDC is not universally beneficial, and especially not on top of Polars’ existing Parquet layout.
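To make the trade-off concrete, here is a minimal sketch of content-defined chunking with a gear-style rolling hash, compared against fixed-size chunking. It only illustrates the general mechanism CDC optimises for (boundaries are chosen by content, so they re-synchronise after an insertion). It is not PyArrow's, Polars' or Xet's implementation, and the function names, gear table, mask width and size limits are hypothetical parameters chosen for a small demo.

```python
import random

# Hypothetical gear table: 256 pseudo-random 64-bit integers, seeded for reproducibility.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def cdc_chunks(data: bytes, mask_bits: int = 6, min_size: int = 16, max_size: int = 256) -> list[bytes]:
    """Cut where the top `mask_bits` bits of a rolling gear hash are all zero."""
    mask = ((1 << mask_bits) - 1) << (64 - mask_bits)
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Each shift pushes older bytes out, so h depends only on the last 64 bytes.
        h = ((h << 1) + GEAR[byte]) & MASK64
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start : i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def fixed_chunks(data: bytes, size: int = 64) -> list[bytes]:
    """Cut every `size` bytes regardless of content."""
    return [data[i : i + size] for i in range(0, len(data), size)]


def shared_fraction(before: list[bytes], after: list[bytes]) -> float:
    """Fraction of the original chunks that reappear unchanged after the edit."""
    return len(set(before) & set(after)) / len(set(before))


original = random.Random(1).randbytes(20_000)
edited = b"hello" + original  # a small insertion at the very start

print("fixed-size shared:", shared_fraction(fixed_chunks(original), fixed_chunks(edited)))
print("content-defined shared:", shared_fraction(cdc_chunks(original), cdc_chunks(edited)))
```

On random bytes you should see the fixed-size chunks almost all change after the five-byte insertion while most content-defined chunks survive. The review's argument is that this property does not automatically pay off when the existing layout already deduplicates well.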


lmmx / TOP_100_HN_BLOGS_2025.md
Created January 3, 2026 19:12
HN Popularity Contest: the highest-ranking personal blogs of Hacker News, via https://refactoringenglish.com/tools/hn-popularity/?start=2025-01-01&end=2025-12-31

lmmx / INSTRUCTION.md
Last active January 3, 2026 19:08
‘AI tells’ rubric for detection of LLM-generated text

You are a text evaluator. You will be given a piece of text and an AI Tells Rubric. Use the rubric to judge the text objectively. Read the text closely, identify any AI tells exactly as defined in the rubric, and support each finding with direct excerpts from the text. Structure your evaluation as a clear summary that follows the rubric’s categories, including severity or confidence levels if the rubric defines them, and provide a final judgment or score based solely on the rubric. Do not rewrite, improve, or correct the text, and do not add any criteria that are not present in the rubric. If a rubric item is unclear or absent, mark it as Not Applicable. If no AI tells are detected, state that explicitly and justify briefly. Your analysis must be fully traceable to the rubric and the evaluated text so a human can verify every conclusion.

lmmx / gist:d80cbed62296dcec8188ef4350db6166
Created December 31, 2025 15:33
Inverse problems gallery gist
lmmx / gist:8ad1911dc81842193828721ea9395446
Created December 31, 2025 12:40
Example topics from arxiv_explorer
via https://www3.cs.stonybrook.edu/~cvl/docunet.html
lmmx / plot_eval_reductions.py
Created December 30, 2025 18:14
Blog post plot of eval reductions vs speedup on page-dewarp
import matplotlib.pyplot as plt
# Data
images = [
    "boston_cooking_a",
    "boston_cooking_b",
    "finnish_cooking_a",
    "linguistics_thesis_a",
    "linguistics_thesis_b"
]