Thomas Sandmann tomsing1

@sj-io
sj-io / CLAUDE.md
Created August 21, 2025 09:25
Claude R Tidyverse Expert

Modern R Development Guide

This document captures current best practices for R development, emphasizing modern tidyverse patterns, performance, and style. Last updated: August 2025

Core Principles

  1. Use modern tidyverse patterns - Prioritize dplyr 1.1+ features, native pipe, and current APIs
  2. Profile before optimizing - Use profvis and bench to identify real bottlenecks
  3. Write readable code first - Optimize only when necessary and after profiling
  4. Follow tidyverse style guide - Consistent naming, spacing, and structure
#!/usr/bin/env bash
###
# NB: You probably don't want this gist any more.
# Instead, use this version from `fastsetup`:
# https://github.com/fastai/fastsetup/blob/master/setup-conda.sh
###
set -e
cd
@wolfv
wolfv / github_actions.yaml
Last active August 15, 2025 11:04
micromamba usage
name: CI
on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main
@jennybc
jennybc / 2020-03-29_sane-legend.R
Created March 30, 2020 05:21
Make the legend order = data order, with forcats::fct_reorder2()
library(tidyverse)
library(patchwork)
dat_wide <- tibble(
  x = 1:3,
  top = c(4.5, 4, 5.5),
  middle = c(4, 4.75, 5),
  bottom = c(3.5, 3.75, 4.5)
)
suppressPackageStartupMessages(library(plyranges))
suppressPackageStartupMessages(library(AnnotationHub))
suppressPackageStartupMessages(library(TxDb.Hsapiens.UCSC.hg19.knownGene))
ah <- AnnotationHub()
query(ah, c("K562","CTCF","unipk"))
peaks <- ah[["AH22543"]]
peaks <- peaks %>% keepStandardChromosomes(pruning.mode="coarse")
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
library(purrr)
library(dplyr)
na_set <- function(x, p){
  p <- as_mapper(p)
  x[p(x)] <- NA
  x
}
# or something like this using case_when
@amorgun
amorgun / sample.py
Created November 10, 2017 10:51
SqlAlchemy postgres bulk upsert
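from typing import Any, Mapping, Sequence

from sqlalchemy.dialects import postgresql
from sqlalchemy.orm import Session

# MyModel is the mapped class being upserted (defined elsewhere).
def bulk_upsert(session: Session,
                items: Sequence[Mapping[str, Any]]):
    session.execute(
        postgresql.insert(MyModel.__table__)
        .values(items)
        .on_conflict_do_update(
            index_elements=[MyModel.id],
            set_={MyModel.my_field.name: 'new_value'},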
@tomsing1
tomsing1 / schemaSpy.md
Created October 25, 2017 00:25
Creating an html report for a Postgres database using schemaSpy

schemaSpy

Here's how to use it to generate schema relationship diagrams for PostgreSQL databases.

Prerequisites

  1. Download the jar file from here (the current version is schemaSpy_5.0.0.jar)
    • Note: There is a release candidate for version 6, but I couldn't get that to play nicely with graphviz on Mac OS 10.13
  2. Get the PostgreSQL JDBC driver (either the JDBC3 or JDBC4 jar file is fine)
  3. Install graphviz
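
With the three prerequisites in place, a schemaSpy 5 run is usually a single `java -jar` call. The sketch below only assembles and prints that command so it can be checked before running it. The database name, user, host, schema, driver file name, and output directory are all placeholders I made up for illustration; none of them come from the notes above.

```shell
#!/usr/bin/env bash
# Placeholder values: adjust to your environment (assumptions, not from the gist).
SCHEMASPY_JAR="schemaSpy_5.0.0.jar"
JDBC_JAR="postgresql-jdbc.jar"   # hypothetical file name for the downloaded JDBC driver
DB_HOST="localhost"
DB_NAME="mydb"
DB_USER="postgres"
DB_SCHEMA="public"
OUT_DIR="schemaspy-report"

# Assemble the command as an array so arguments containing spaces stay intact,
# then print it on one line instead of executing it (dry run).
schemaspy_cmd() {
  local cmd=(java -jar "$SCHEMASPY_JAR"
    -t pgsql            # database type: PostgreSQL
    -host "$DB_HOST"
    -db "$DB_NAME"
    -s "$DB_SCHEMA"
    -u "$DB_USER"
    -dp "$JDBC_JAR"     # path to the JDBC driver jar
    -o "$OUT_DIR")      # directory that receives the HTML report
  printf '%s ' "${cmd[@]}"
  printf '\n'
}

schemaspy_cmd
```

Once the printed command looks right, it can be pasted into a terminal (adding `-p` with the password if the database requires one). The report should then appear as `index.html` inside the output directory.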
@MarkEdmondson1234
MarkEdmondson1234 / massiveCPUonGCE.R
Last active April 22, 2018 23:46
Run massive parallel R jobs cheaply on Google Compute Engine with googleComputeEngineR and future
## see also http://blog.revolutionanalytics.com/2017/06/doazureparallel-updated.html on how to run on Azure
## and cloudyr project for AWS https://github.com/cloudyr/aws.ec2
# now also in docs: https://cloudyr.github.io/googleComputeEngineR/articles/massive-parallel.html
library(googleComputeEngineR)
library(future)
## auto auth to GCE via environment file arguments
@mfansler
mfansler / install-ascp.sh
Last active January 22, 2025 15:04
Install ascp on Linux
#!/bin/bash
## How to install ascp, in a gist.
## The URI below is not persistent!
## Check for latest link: https://www.ibm.com/aspera/connect/
wget -qO- https://ak-delivery04-mul.dhe.ibm.com/sar/CMA/OSA/0a07f/0/ibm-aspera-connect_4.1.0.46-linux_x86_64.tar.gz | tar xvz
## run it
chmod +x ibm-aspera-connect_4.1.0.46-linux_x86_64.sh