Alán F. Muñoz afermg

@afermg
afermg / cosine_duckdb.org
Created December 4, 2025 04:54
Calculate cosine similarity using only duckdb.

We will compute the cosine similarity of genes whose symbols start with CDK or PPARG, using only DuckDB.

Get the ids

CREATE OR REPLACE TABLE jcp_symbol AS (
  SELECT Metadata_JCP2022, Metadata_Symbol
  FROM read_csv([
    'https://github.com/jump-cellpainting/datasets/raw/99c34b66f51c5971c85417c02e191f43057c22a8/metadata/crispr.csv.gz',
    'https://github.com/jump-cellpainting/datasets/raw/99c34b66f51c5971c85417c02e191f43057c22a8/metadata/orf.csv.gz'
  ])
  WHERE starts_with(Metadata_Symbol, 'CDK')
     OR starts_with(Metadata_Symbol, 'PPARG')
);
SELECT * FROM jcp_symbol LIMIT 2;
SELECT COUNT(*) AS nrows FROM jcp_symbol;
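
For reference, cosine similarity is the dot product of two vectors divided by the product of their Euclidean norms. The following is a minimal pure-Python sketch of that definition, independent of DuckDB and of the JUMP data. The function name `cosine_similarity` and the toy vectors are illustrative assumptions, not part of the gist.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy feature vectors (made up for illustration, not real profiles).
parallel = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(parallel, orthogonal)
```

Parallel vectors give a value close to 1 and orthogonal vectors give 0, which is a quick sanity check for any SQL implementation of the same quantity.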
@afermg
afermg / fast_duckdb.py
Last active December 3, 2025 22:12
Run fast duckdb queries where python is the bottleneck
from threading import Thread
import duckdb
def thread_duckdb(fn, iterable, *args, con=None):
    """
    Maximise performance using DuckDB threading.
    The first argument of `fn` must be a DuckDB connection.
@afermg
afermg / mxroute_instructions.txt
Created November 24, 2025 23:43
mxroute instructions
READ EVERY. SINGLE. PART. OF. THIS. EMAIL.
PLEASE, WE BEG YOU.
IMPORTANT INFORMATION:
1. If there are service outages, they will be reported here: https://status.mxroute.com
2. Get support here: https://mxroute.com/support/
3. Documentation can be found here: https://docs.mxroute.com/ and our Policy here: https://mxroute.com/policy.html
4. If you skip reading this email, the entire service will be much more difficult to use.
5. CONFIGURE YOUR SPAM FILTERS BEFORE COMPLAINING ABOUT SPAM, PLEASE AND THANK YOU. Find a tutorial for that here: https://docs.mxroute.com/docs/spam-filter.html
@afermg
afermg / skimage_output_features.py
Created November 5, 2025 00:08
Find the number of output features in scikit image
import numpy as np
from skimage.measure import regionprops_table
img = np.ones((5, 5), dtype=int)
mask = np.zeros((5, 5), dtype=int)
mask[0:3, 0:3] = 1
feat_names = (
    "area",
    "bbox",
@afermg
afermg / count_features_cellpainting_cnn.sh
Created November 4, 2025 22:31
Count the number of features in the output of the Cell Painting CNN
INSTALL httpfs;
LOAD httpfs;
SELECT COUNT(column_name) FROM (DESCRIBE SELECT COLUMNS('X_*') FROM read_parquet('https://celpainting-gallery.s3.amazonaws.com/cpg0042-chandrasekaran-jump/source_all/workspace/profiles_assembled/compound_DL_CPCNN/v1.0profiles.parquet'));
@afermg
afermg / track_aliby_output.sh
Created August 19, 2025 21:46
Track aliby's output
watch --interval 1 'find . -type d -exec bash -c \'echo "$1: $(find "$1" -maxdepth 1 -type f | wc -l)"\' _ {} \; | grep segment_cyto | cut -f4,5 -d"/" | sed "s/\/segment_cyto//" | sort'
@afermg
afermg / registry_a549_virtual_staining.txt
Created July 31, 2025 03:14
Pooch registry for virtual staining time series data
.zattrs eaa176fc87fe2baaba069856c2511b7cace6f07ea6dc6a77726c34eb22c1db81
.zgroup 2383746e67b4bcc2762b3f100f06c3fa2d5f149ab5a8e5da5d33521464a01959
A/.zgroup 2383746e67b4bcc2762b3f100f06c3fa2d5f149ab5a8e5da5d33521464a01959
A/1/.zattrs a44d4b2efe6344858f5bbd91a52ffbaec450a552f2e79abfe967726eb8e8d7b4
A/1/.zgroup 2383746e67b4bcc2762b3f100f06c3fa2d5f149ab5a8e5da5d33521464a01959
A/1/1/.zattrs ae19fc9113b70974a7e9827815901b240dee9d10d5c17ec4f256675e805b8eab
A/1/1/.zgroup 2383746e67b4bcc2762b3f100f06c3fa2d5f149ab5a8e5da5d33521464a01959
A/1/1/0/.zarray 2ec105822a9fb5f84ec19a0152be51bc0f4e7a75c593fd90a53f0745df695b91
A/1/1/0/0/0/0/0/0 a76987813876f33ff979f8c6f36c228d2ffd8f091b7b296f1330dd9f98339e95
A/1/1/0/0/1/0/0/0 27e700785f97735c373caff16bc16398f4c6e4aa59555ef16c58cce476c8285f
@afermg
afermg / cs2biorxiv.sh
Created June 30, 2025 19:32
Convert the Carpenter-Singh lab format for author details into the biorxiv format.
awk 'BEGIN {FS="\t"; OFS="\t";}; {if (NR > 4) {corr=""; if (NR==11) corr="X"; print $9,$6,$2,$3,$4,"",corr,"","",$7;}}' cp\ measure\ authors.xlsx\ -\ Authors.tsv | iconv -f utf8 -t ascii//TRANSLIT//IGNORE | cat (printf "Email\tInstitution\tFirst Name\tMiddle Name(s)/Initial(s)\tLast Name\tSuffix\tCorresponding Author\tHome Page URL\tCollaborative Group/Consortium\tORCiD\n" | psub) - > cp_measure_authors.tsv
@afermg
afermg / check_links.sh
Created June 19, 2025 19:33
Find broken links in markdown files
@afermg
afermg / csv2json.jq
Last active June 19, 2025 12:45
Convert csv to json
# From https://stackoverflow.com/a/32002086
# Run as `jq -R -s -f csv2json.jq csv.csv`
# objectify/1 takes an array of string values as inputs, converts
# numeric values to numbers, and packages the results into an object
# with keys specified by the "headers" array
def objectify(headers):
  # For jq 1.4, replace the following line by: def tonumberq: .;
  def tonumberq: tonumber? // .;
  . as $in
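
For comparison, the behaviour described in the comments above (one object per row, keyed by the header names, with numeric strings converted to numbers) can be sketched with Python's standard library. This is a rough analogue under stated assumptions, not a port of the jq script: the helper names `to_number` and `csv_to_records` and the sample data are hypothetical, and Python's number parsing differs from jq's `tonumber` in edge cases.

```python
import csv
import io
import json


def to_number(value):
    """Convert a numeric string to int or float; return anything else unchanged."""
    for cast in (int, float):
        try:
            return cast(value)
        except (TypeError, ValueError):
            continue
    return value


def csv_to_records(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: to_number(val) for key, val in row.items()} for row in reader]


# Sample input made up for illustration.
sample = "gene,score\nCDK1,0.5\nPPARG,2\n"
print(json.dumps(csv_to_records(sample)))
```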