Bill Flynn wflynny

Single Cell & Spatial Biology | Scientific Core Leader | Lateral Productivity Specialist

wflynny / cellranger_count_scenarios.sh

Created January 23, 2019 19:55

Cellranger count snippets (version 2)

	# Some universal variables
	NCELLS=6000
	OUTPUT_NAME="nice-name"
	FASTQ_DIR="/path/to/fastqs"
	REFERENCE_GENOME="/path/to/reference_dir"
	[[ -z "${PBS_NUM_PPN}" ]] && NCORES=20 || NCORES=${PBS_NUM_PPN}

	# When reads look like:
	# sample-name_S?_L00?_R1_001.fastq.gz
	# sample-name_S?_L00?_R2_001.fastq.gz
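
File names of this shape can be split into their parts with a regular expression, which is handy for checking which sample prefixes a FASTQ directory actually contains. The sketch below is in Python rather than bash; the names `FASTQ_NAME`, `parse_fastq_name` and `sample_names` are hypothetical helpers and are not part of the gist.

```python
import re

# Matches names like <sample>_S<n>_L<lane>_<R1|R2|I1|I2>_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<sample_number>\d+)_L(?P<lane>\d{3})"
    r"_(?P<read>[RI][12])_001\.fastq\.gz$"
)


def parse_fastq_name(filename):
    """Return the name components as a dict, or None if the name does not match."""
    match = FASTQ_NAME.match(filename)
    return match.groupdict() if match else None


def sample_names(filenames):
    """Collect the distinct sample prefixes among matching file names, sorted."""
    parsed = (parse_fastq_name(name) for name in filenames)
    return sorted({p["sample"] for p in parsed if p is not None})
```

Files that do not follow the pattern are ignored by `sample_names` rather than raising an error.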

wflynny / gpfs_expiration_checker.sh

Created January 24, 2019 20:05

Check file lifetime stats on a GPFS

	# I usually put this in my ~/.bash_aliases
	# A portion of our GPFS storage removes files 21 days after creation.
	# `stat` does not show creation time, so we have to resort to parsing the
	# output of `mmlsattr`

	ftime() {
	# Usage:
	# ftime path/to/file
	#
	# Outputs:

wflynny / gist:d6c95deadf0c4d1cce4f01a729314dbb

Created January 24, 2019 21:20

Illumina sequencer identifiers in fastq read headers

	# Find myself referring to this thread a lot:
	# https://www.biostars.org/p/198143/
	# However, I am updating the codes with what I see at JAX
	@Mxxxx - MiSeq
	@Dxxxx - HiSeq 2500
	@Kxxxx - HiSeq 4000
	@NSxxx - NextSeq 500/550
	@Axxxxx - NovaSeq
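
The table above can be turned into a small lookup. This is a minimal sketch, assuming the usual Illumina header layout in which the instrument ID is the first colon-separated field after `@`; `SEQUENCER_PREFIXES` and `guess_sequencer` are hypothetical names, and the mapping contains only the prefixes listed above.

```python
# Instrument ID prefix -> sequencer model, copied from the list above.
SEQUENCER_PREFIXES = {
    "M": "MiSeq",
    "D": "HiSeq 2500",
    "K": "HiSeq 4000",
    "NS": "NextSeq 500/550",
    "A": "NovaSeq",
}


def guess_sequencer(header):
    """Guess the sequencer model from a fastq read header, or return None."""
    # The instrument ID is everything between the leading "@" and the first ":".
    instrument = header.lstrip("@").split(":", 1)[0]
    # Try longer prefixes first so a two-letter code wins over a one-letter code.
    for prefix in sorted(SEQUENCER_PREFIXES, key=len, reverse=True):
        if instrument.startswith(prefix):
            return SEQUENCER_PREFIXES[prefix]
    return None
```

A prefix that is not in the table returns `None` instead of a guess, since other instruments use codes that are not covered here.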

wflynny / scanpy_cluster_proportions.py

Last active October 13, 2023 17:42

Stacked barplot of scRNA-seq cluster proportions per sample

	import scanpy as sc
	import matplotlib.pyplot as plt
	import seaborn as sns

	def get_cluster_proportions(adata,
	                            cluster_key="cluster_final",
	                            sample_key="replicate",
	                            drop_values=None):
	    """
	    Input
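
The preview cuts off before the function body, so the following is not the gist's implementation. It is a minimal sketch of one way to compute per-sample cluster proportions with `pandas.crosstab`, assuming `obs` is a data frame like `adata.obs` with one row per cell; the name `cluster_proportions` is hypothetical, and the default column names are borrowed from the signature above.

```python
import pandas as pd


def cluster_proportions(obs, cluster_key="cluster_final", sample_key="replicate"):
    """Return a samples x clusters table in which each row sums to 1."""
    # normalize="index" divides each row (sample) by its total cell count.
    return pd.crosstab(obs[sample_key], obs[cluster_key], normalize="index")
```

The resulting table can be drawn as a stacked barplot with `cluster_proportions(adata.obs).plot(kind="bar", stacked=True)`.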

wflynny / build_10x_reference.sh

Last active February 5, 2019 20:01

Building 10X reference genomes from Ensembl

	# Visit the Ensembl ftp site.
	# ftp://ftp.ensembl.org/pub/release-95/
	#
	# You want to find data under the following two URLs:
	# 1. ftp://ftp.ensembl.org/pub/release-95/fasta/[YOUR_SPECIES_HERE]/dna/
	# 2. ftp://ftp.ensembl.org/pub/release-95/gtf/[YOUR_SPECIES_HERE]/
	#
	# The first file of interest is under the fasta URL:
	# [YOUR_SPECIES_HERE].[ASSEMBLY].dna.primary_assembly.fa.gz
	# or, if that doesn't exist,

wflynny / jupyter-server

Created June 6, 2019 15:28

Running jupyter on a cluster

	#!/usr/bin/env bash
	#### PBS preamble

	#PBS -N jupyter-server
	#PBS -o /path/to/software/logs/jupyter-server.${PBS_JOBID%%.*}.out

	#PBS -j oe
	#PBS -m n

	#PBS -l mem=128GB

wflynny / jupyter-launch.bash

Last active June 6, 2019 15:31

Bash alias/functions to launch jupyter-server

	_grab_ip() {
	jobid=$1
	port=$2
	hostname=$(qstat -f ${jobid} | grep -oP "exec_host = (\K[a-z0-9]+)")
	echo "http://${hostname}:${port}"
	}

	_submit_job() {
	queue=$1
	port=$2

wflynny / hto_demux.py

Last active December 18, 2019 19:54

HTO demuxing in python

	from sklearn.cluster import KMeans
	import numpy as np
	import pandas as pd
	import scanpy as sc

	def load_hto_matrix(mtx_dir):
	    raw_htos = sc.read_mtx(mtx_dir + "/matrix.mtx.gz").T
	    raw_htos.var = pd.read_csv(mtx_dir + "/features.tsv.gz", header=None, index_col=0)
	    raw_htos.obs = pd.read_csv(mtx_dir + "/barcodes.tsv.gz", header=None, index_col=0)
	    raw_htos = raw_htos[:, ~raw_htos.var_names.isin(["unmapped"])]

wflynny / susage

Last active July 29, 2020 02:00

Small utility to run top or nvidia-smi on a compute node from the login node

	#!/usr/bin/env bash

	TEMP=$(getopt -o hsg --long help,snapshot,gpu -n 'susage' -- "$@")

	if [ $? != 0 ] ; then echo "Terminating..." >&2 ; exit 1 ; fi

	# Note the quotes around `$TEMP': they are essential!
	eval set -- "$TEMP"

	SNAPSHOT=false

wflynny / add_gene_name_to_gtf.py

Created July 16, 2020 16:51

	import re
	import argparse

	parser = argparse.ArgumentParser()
	parser.add_argument("-i", "--infile", required=True)
	parser.add_argument("-o", "--outfile", required=True)
	args = parser.parse_args()

	gene_matcher = re.compile('\tgene\t.*gene_id (".*?");.*Name (".*?");')
	parent_matcher = re.compile('gene_id (".*?");.*Parent (".*?");')
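
To illustrate what a pattern of this kind captures, here is a small sketch run against a made-up annotation line. It assumes the wildcards in the pattern are `.*` and `.*?`; the helper `extract_gene_id_and_name` and the sample line are hypothetical and are not taken from the gist.

```python
import re

# Assumed form of the gene pattern: a "gene" feature row whose attribute
# column contains both a quoted gene_id and a quoted Name.
gene_matcher = re.compile('\tgene\t.*gene_id (".*?");.*Name (".*?");')


def extract_gene_id_and_name(line):
    """Return (gene_id, name) with their surrounding quotes, or None if no match."""
    match = gene_matcher.search(line)
    return match.groups() if match else None


# Made-up example row: tab-separated columns followed by attributes.
example = 'chr1\tsrc\tgene\t100\t200\t.\t+\t.\tgene_id "g1"; Name "ABC";'
```

Because the capture groups include the double quotes, `extract_gene_id_and_name(example)` returns the quoted strings, which can be written straight back into a `gene_name "..."` attribute.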