Daniel Lemire (lemire)
lemire / compile-time-gperf-cpp.md
Created March 3, 2026 20:57
compile-time-gperf-cpp

Problem

Often, when parsing, you reach a point where you need to check, as quickly as possible, whether a string is one of a fixed set of strings. For example, if you are parsing URLs, the string must start with a protocol, and there is a finite list of protocols. There are many such problems: think of YAML files where a parameter must take one of 3 or 5 values.

How do we solve this classically?

If you have 5 strings (say http:, https:, file:, sftp:, ftp:), you may compare the input against each one in turn, doing up to 5 comparisons. This is obviously wasteful.
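To make the contrast concrete, here is a small Python sketch. It is an illustration of the general idea only, not the C++ compile-time approach the gist's title suggests, and the function names are hypothetical. For this particular set of five prefixes, the pair (length, first character) happens to be different for every string, so it can act as a hand-built perfect hash: one cheap key lookup narrows the input to a single candidate, and at most one full string comparison follows. Tools such as GNU gperf generate this kind of function automatically for C and C++; in Python the dictionary lookup has its own cost, so this only shows the principle.

```python
# The five protocol prefixes from the example above.
PROTOCOLS = ("http:", "https:", "file:", "sftp:", "ftp:")


def is_protocol_naive(s: str) -> bool:
    """Classic approach: compare against every candidate (up to 5 comparisons)."""
    for p in PROTOCOLS:
        if s == p:
            return True
    return False


# Hand-built perfect hash for this particular set: (length, first character)
# is distinct for every protocol, so it identifies at most one candidate.
_BY_KEY = {(len(p), p[0]): p for p in PROTOCOLS}
assert len(_BY_KEY) == len(PROTOCOLS), "two protocols share a key"


def is_protocol_keyed(s: str) -> bool:
    """Look up the single possible candidate, then do one full comparison."""
    if not s:
        return False
    candidate = _BY_KEY.get((len(s), s[0]))
    return candidate is not None and s == candidate
```

For example, "ftps:" maps to the key (5, "f"), whose only candidate is "file:", so a single comparison rejects it.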

lemire / simdjson_example.cpp
Created February 24, 2026 00:11
simdjson example
/*
This program serves as a comprehensive test suite for the consume_array and
consume_object functions, which are designed to parse specific JSON structures
using the simdjson library. The primary goal is to validate the parsing of
arrays and objects that contain a string, a double, and an optional integer
value. The test generates a large JSON array consisting of 200 elements: the
first 100 are sub-arrays, and the next 100 are objects. This allows for thorough
testing of both parsing paths under various conditions.
The consume_array function takes a simdjson ondemand::array and extracts three
lemire / two_sum.py
Created February 10, 2026 18:00
Find two elements that sum to a given target in Python
import numpy as np
np.seterr(over='ignore')
def set_two_sum(nums, target):
    """
    Fast O(n) approach using a set to store seen numbers.
    """
    seen = set()
    for num in nums:
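The preview is cut off after the loop header. Below is a minimal sketch of the set-based technique the docstring describes, assuming the function should return the first matching pair (or None when there is none). The name `two_sum_pair` and that return convention are hypothetical, since the original's are not visible.

```python
def two_sum_pair(nums, target):
    """Return a pair (a, b) from nums with a + b == target, or None.

    One pass: for each number, check whether its complement was already seen.
    Average O(n) time, since set lookups and insertions are O(1) on average.
    """
    seen = set()
    for num in nums:
        complement = target - num
        if complement in seen:  # an earlier element completes the sum
            return (complement, num)
        seen.add(num)  # remember this number for later elements
    return None
```

Checking for the complement before inserting the current number ensures an element is never paired with itself: `two_sum_pair([3], 6)` returns None, while `two_sum_pair([3, 3], 6)` returns (3, 3).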
lemire / onlinepython.html
Created February 9, 2026 18:25
online python
<div style="max-width: 1000px; margin: 0 auto; background-color: #ffffff; padding: 24px; border-radius: 8px; box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);">
<h1 style="font-size: 24px; font-weight: bold; margin-bottom: 16px; color: #1f2937;">Online Python Lab</h1>
<p style="margin-bottom: 16px; color: #4b5563;">
Edit or use the Python code below, then click "Exécuter" (Run) to display the results. Example:
</p>
<script src="https://cdn.jsdelivr.net/pyodide/v0.27.6/full/pyodide.js"></script>
<div style="margin-bottom: 24px;">
<h2 style="font-size: 18px; font-weight: bold; color: #374151; margin-bottom: 8px;">Python example:</h2>
<pre style="background-color: #e6f4ea; padding: 12px; border: 1px solid #15803d; border-radius: 4px; font-family: monospace; font-size: 14px; color: #374151;">etudiants = [
lemire / github.py
Created December 31, 2025 22:45
Get your GitHub activity for 2025
#!/usr/bin/env python3
# uv run github.py --token <your_github_token> --user <github_username>
#
# To generate a GitHub Personal Access Token:
# 1. Go to https://github.com/settings/tokens
# 2. Click "Generate new token (classic)"
# 3. Give it a name, e.g., "GitHub Search Script"
# 4. Select scopes: For public repositories, select "public_repo". For private, select "repo".
# 5. Click "Generate token"
# 6. Copy the token and use it as the --token argument.
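The script itself is not shown beyond these comments. As one hypothetical illustration of how such a token is typically used, the sketch below builds (but does not send) a request to GitHub's REST search endpoint for pull requests a user opened in a given year. Counting pull requests is an assumption about what "activity" means here; the function name `build_activity_request` and the query are not taken from the script.

```python
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.github.com/search/issues"  # REST search for issues and pull requests


def build_activity_request(token: str, user: str, year: int = 2025) -> urllib.request.Request:
    """Build (without sending) a search request for PRs authored by `user` in `year`."""
    # Search qualifiers: author:, type:pr, and an inclusive created: date range.
    query = f"author:{user} type:pr created:{year}-01-01..{year}-12-31"
    url = SEARCH_URL + "?" + urllib.parse.urlencode({"q": query, "per_page": 100})
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",  # the personal access token from step 6
            "Accept": "application/vnd.github+json",
        },
    )
```

Sending it with `urllib.request.urlopen(req)` and decoding the JSON body yields a `total_count` field with the number of matches; the items themselves are paginated, at most 100 per page.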
lemire / gabarit.md
Created November 14, 2025 16:18
INF 1220 template

Template for graded assignments 2, 3, 4, and 5

Graded assignments 3, 4, and 5 include code that you must explain. We ask you to submit reports containing code that is easy to read and well explained. To help you, we offer the following template.

Graded assignment 3 – INF 1220

Name: Your name

lemire / check.py
Created October 28, 2025 20:41
random generation test
import random
def unbiased_random(s, L=32):
    pow2 = 1 << L
    if not (0 <= s <= pow2):
        raise ValueError("s must be in [0, 2**L]")
    if s == 0:
        return 0
    x = random.randrange(pow2)
    m = x * s
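The preview stops right after computing `m = x * s`. Those first steps match the multiply-shift method with rejection for drawing an unbiased integer in [0, s), often called Lemire's nearly divisionless method. Assuming the rest of the function follows that method, a complete version might look like the sketch below. `unbiased_random_sketch` and the helper `accepted_counts` are hypothetical names; the helper exists only to check, by enumerating every L-bit word, that each output is reached equally often.

```python
import random


def unbiased_random_sketch(s, L=32):
    """Return a uniformly distributed integer in [0, s) using L-bit random words."""
    pow2 = 1 << L
    if not (0 <= s <= pow2):
        raise ValueError("s must be in [0, 2**L]")
    if s == 0:
        return 0
    x = random.randrange(pow2)
    m = x * s
    l = m & (pow2 - 1)  # low L bits of the product
    if l < s:  # t < s, so any l >= s is accepted without computing t
        t = (pow2 - s) % s  # equals 2**L mod s: how many low values to reject
        while l < t:  # reject and redraw
            x = random.randrange(pow2)
            m = x * s
            l = m & (pow2 - 1)
    return m >> L  # high bits of the product: the result, in [0, s)


def accepted_counts(s, L):
    """Enumerate all L-bit words and count how often each output is accepted."""
    pow2 = 1 << L
    t = pow2 % s
    counts = [0] * s
    for x in range(pow2):
        m = x * s
        # Accepting iff the low part is >= t matches the two-step test above, since t < s.
        if (m & (pow2 - 1)) >= t:
            counts[m >> L] += 1
    return counts
```

For instance, `accepted_counts(5, 4)` returns [3, 3, 3, 3, 3]: of the 16 possible 4-bit words, only x = 0 is rejected, and the remaining 15 are split evenly among the 5 outputs. The `%` operation is only evaluated when `l < s`, which is rare when s is much smaller than 2**L.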
lemire / perfregress.py
Created October 8, 2025 19:16
Python script to explore a potential performance regression in CRoaring
import subprocess
import json
import os
import sys
from datetime import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from collections import defaultdict
import tempfile
## written with AI, but it looks good enough for a demo.
import re
def generate_bitsets_for_block(block, block_size, state):
    # Initialize bitsets for this block (0s)
    block_length = len(block)
    comment_bits = [0] * block_length
    line_ending_bits = [0] * block_length
    semicolon_bits = [0] * block_length
    whitespace_bits = [0] * block_length
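The preview only shows the bitsets being initialized as lists of zeros. As a rough illustration of the idea, the sketch below fills three of them for a block of bytes, packing each into a Python integer (bit i set when byte i belongs to the class) instead of a list; that representation is a swapped-in choice. The name `classify_block_sketch` is hypothetical, the character classes are assumptions, and comment detection is left out because the comment syntax and the meaning of `state` are not visible.

```python
def classify_block_sketch(block: bytes):
    """Return (line_ending_mask, semicolon_mask, whitespace_mask) as integers.

    Bit i of each mask is set when block[i] belongs to that class.
    Assumed classes: line ending = newline; whitespace = space, tab, CR, LF.
    """
    line_endings = semicolons = whitespace = 0
    for i, byte in enumerate(block):
        bit = 1 << i
        if byte == ord("\n"):
            line_endings |= bit
        if byte == ord(";"):
            semicolons |= bit
        if byte in b" \t\r\n":  # membership test of an int in a bytes object
            whitespace |= bit
    return line_endings, semicolons, whitespace
```

Integer masks can then be combined with bitwise operators; for instance, `(mask & -mask).bit_length() - 1` gives the index of the lowest set bit, so for the semicolon mask of `b"a;\nb ;"`, which is 34 (0b100010), it yields 1.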
#include <array>
#include <format>
#include <iostream>
#include <optional>
#include <ranges>
#include <tuple>
#include <vector>
template <typename T, std::size_t N>
void print_array(const std::array<T, N> &arr) {