
@ebetica
ebetica / lance_query_bench.py
Last active March 23, 2026 19:29
Benchmark lance scalar index lookups: sequential, batched, and async
"""
Benchmark lance scalar index lookups: sequential, batched, and async.
Lance (https://lancedb.github.io/lance/) is a columnar format that supports
BTREE scalar indices, making point lookups fast even over S3 — no local copy
or database server needed. This script benchmarks three lookup patterns against
a 120M-row dataset stored on S3:
- Sequential: one filter query at a time (baseline)
- Batched IN: single query with WHERE protein_hash IN (...)
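The batched pattern hinges on folding a whole batch of keys into one SQL-style `IN` filter. A minimal sketch of that filter construction, assuming the keys are plain strings without embedded quotes (the `protein_hash` column name comes from the gist; the `lance` calls in the comment are an assumption about the public API, not code from the benchmark):

```python
def build_in_filter(column, values):
    """Build a SQL-style IN filter string for a batch of string keys.

    Assumes keys contain no single quotes; escape them if yours can.
    """
    quoted = ", ".join("'{}'".format(v) for v in values)
    return "{} IN ({})".format(column, quoted)

# Assumed usage against a Lance dataset on S3 (untested sketch):
# import lance
# ds = lance.dataset("s3://bucket/proteins.lance")
# tbl = ds.to_table(filter=build_in_filter("protein_hash", batch))
```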
@ebetica
ebetica / lance_mapper_example.py
Created March 19, 2026 22:40
lance_mapper: parallel map over Lance datasets on SLURM (usage example)
"""
lance_mapper: parallel map over Lance datasets on SLURM
========================================================
This example shows how to use LanceMapper to run an embarrassingly parallel
computation over a Lance dataset using SLURM job arrays.
The pattern:
1. Subclass LanceMapper
2. Set key_column (unique ID column) and rows_per_shard
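The sharding step implied by `rows_per_shard` reduces to range arithmetic over the dataset's rows, with one range per SLURM array task. LanceMapper itself is internal to the gist, so this is only the arithmetic, assuming contiguous row offsets:

```python
def shard_ranges(total_rows, rows_per_shard):
    """Yield (start, stop) half-open row ranges, one per SLURM array task."""
    for start in range(0, total_rows, rows_per_shard):
        yield (start, min(start + rows_per_shard, total_rows))

# A SLURM array task would pick its shard by index, e.g.:
# task_id = int(os.environ["SLURM_ARRAY_TASK_ID"])
# start, stop = list(shard_ranges(n_rows, rows_per_shard))[task_id]
```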
@ebetica
ebetica / fold_val_report.md
Last active March 18, 2026 22:26
ESMCFold on val_filtered.lance (600K sequences) — lance_mapper benchmark

ESMCFold on val_filtered.lance (600K sequences)

  • Date: 2026-03-18
  • Branch: zeming/lance-mapper
  • Script: claude_scratchpad/fold_val_test.py
  • Dataset: /bio/projects/es/zlin/esmc2_datasets/260312_uniref_seqonly/val_filtered.lance
  • Model: January trainout of ESMCFold hero medium (24blk, 12 diffusion steps, no MSA, confidence-trained)
  • Checkpoint: conf_esmcfold_hero_medium_24blk_12diffu_no_msa_bs128_ctx512_mult2_noise1.1_step1.0_nodiffcond/epoch-0000-step-7000_cleaned.ckpt

Dataset

@ebetica
ebetica / parity.py
Created February 16, 2018 20:36
LSTMs suck
import torch
from torch import nn
from torch.nn import functional as F
from torch.autograd import Variable
import sys
nlen = 5
model_type = nn.LSTM
running_loss = 1
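The parity task behind "LSTMs suck" is simple to state: the label for a bit sequence is the XOR of all its bits. The gist's own data generation is truncated out of the preview; a sketch of what it plausibly looks like (`nlen` matches the snippet above, everything else is illustrative):

```python
import random

def parity_batch(batch_size, nlen):
    """Random bit sequences and their parity labels (XOR of all bits)."""
    xs = [[random.randint(0, 1) for _ in range(nlen)]
          for _ in range(batch_size)]
    ys = [sum(bits) % 2 for bits in xs]  # parity == popcount mod 2
    return xs, ys
```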
@ebetica
ebetica / PKGBUILD
Created June 3, 2017 15:04
Singularity container PKGBUILD
pkgname='singularity-container'
pkgver='2.3'
pkgrel='0'
pkgdesc='Container platform focused on supporting "Mobility of Compute".'
arch=('i686' 'x86_64')
url='http://singularity.lbl.gov'
license=('BSD')
depends=('bash' 'python')
source=("https://github.com/singularityware/singularity/releases/download/${pkgver}/singularity-${pkgver}.tar.gz")
md5sums=('dbc02b17f15680c378c1ec9e4d80956d')
#include <ctime>
#include <iostream>
#include "replayer.h"
using namespace std;
using namespace torchcraft::replayer;
int main() {
std::clock_t start;
double duration;
@ebetica
ebetica / rep_info.py
Created February 16, 2017 18:29
Example script to go through some StarCraft replays, grab information about them, and dump it into a CSV
# This script tries as best as possible to filter out bad replays
# Pass it a subdir, and it will read all '.rep' files, and spit out a list
# of the corrupt files in stdout
from __future__ import print_function
from pyreplib import replay
from itertools import repeat
from multiprocessing import Pool, Process, Pipe
from multiprocessing.pool import ThreadPool
from Queue import Queue
import os
@ebetica
ebetica / check_rep.py
Created February 9, 2017 22:59
Runs through a directory of StarCraft replays and outputs all the corrupt ones
# This script tries as best as possible to filter out bad replays
# Pass it a subdir, and it will read all '.rep' files, and spit out a list
# of the corrupt files in stdout
from __future__ import print_function
from pyreplib import replay # https://github.com/HearthSim/pyreplib/
from itertools import repeat
from multiprocessing import Pool, Process, Pipe
from multiprocessing.pool import ThreadPool
import os
import sys
@ebetica
ebetica / snippet.py
Last active January 23, 2017 19:42
PyTorch REINFORCE function
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
class Policy(nn.Module):
def __init__(self):
super(Policy, self).__init__()
self.affine1 = nn.Linear(4, 128)
import argparse
import gym
import numpy as np
from itertools import count
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.autograd as autograd
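REINFORCE weights each action's log-probability by the discounted return from that step onward. The return computation itself is plain arithmetic, shown here as a standalone sketch independent of the truncated gist code (`gamma` is the usual discount factor):

```python
def discounted_returns(rewards, gamma=0.99):
    """Backward recurrence G_t = r_t + gamma * G_{t+1} over an episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns
```

In the REINFORCE update, each return (often normalized to zero mean and unit variance first) multiplies the negative log-probability of the action taken at that step.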