Daniel J. B. Clarke u8sand

I'm a data scientist currently working in the Ma'ayan Lab at the Icahn School of Medicine at Mount Sinai.

u8sand / pd_merge.py

Created April 9, 2020 16:47

Pandas merges simplified

	import pandas as pd

	def merge(left, *rights, **kwargs):
	    ''' Helper function for many trivial (index based) joins
	    Usage:
	      merge(frame1, frame2, frame3, how='inner')
	    '''
	    merged = left
	    for right in rights:
	        # Join on the index of both frames; kwargs (e.g. how='inner') pass through to pd.merge
	        merged = pd.merge(left=merged, left_index=True, right=right, right_index=True, **kwargs)
	    return merged
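The same index-based join can also be written as a fold over the frames. This is a minimal sketch, not the gist's own code: `merge_on_index` and the three toy frames are hypothetical names and data chosen for illustration.

```python
from functools import reduce

import pandas as pd


def merge_on_index(left, *rights, **kwargs):
    """Join any number of frames on their index (hypothetical helper)."""
    return reduce(
        lambda acc, right: pd.merge(
            acc, right, left_index=True, right_index=True, **kwargs
        ),
        rights,
        left,
    )


# Toy frames with partially overlapping indexes
a = pd.DataFrame({'a': [1, 2, 3]}, index=['x', 'y', 'z'])
b = pd.DataFrame({'b': [10, 20]}, index=['x', 'y'])
c = pd.DataFrame({'c': [100, 300]}, index=['x', 'z'])

inner = merge_on_index(a, b, c, how='inner')  # only index labels present in all frames
outer = merge_on_index(a, b, c, how='outer')  # every index label, NaN where missing
```

With `how='inner'` only the label shared by all three frames survives; `how='outer'` keeps every label and fills the gaps with NaN.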

u8sand / quantile_normalize.py

Last active April 9, 2020 16:43

Quantile normalization in Python

	import pandas as pd
	import numpy as np

	def quantileNormalize_np(mat):
	    # sort vector in np (reuse in np)
	    sorted_vec = np.sort(mat, axis=0)
	    # rank vector in np (no dict necessary)
	    rank = sorted_vec.mean(axis=1)
	    # construct quantile normalized matrix
	    return np.array([
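The preview above is cut off before the return value is built. As a rough illustration of the technique, and not a reconstruction of the gist's missing lines, here is a minimal NumPy sketch. It assumes rows are features and columns are samples, and it breaks ties by position rather than averaging them; `quantile_normalize` is a hypothetical name.

```python
import numpy as np


def quantile_normalize(mat):
    """Quantile-normalize the columns of a 2-D array (hypothetical sketch)."""
    mat = np.asarray(mat, dtype=float)
    # Sort each column independently
    sorted_mat = np.sort(mat, axis=0)
    # Mean across columns at each rank: the shared reference distribution
    rank_means = sorted_mat.mean(axis=1)
    # 0-based rank of every entry within its column (stable sort breaks ties by position)
    order = np.argsort(mat, axis=0, kind='stable')
    ranks = np.argsort(order, axis=0, kind='stable')
    # Replace each entry with the reference value at its rank
    return rank_means[ranks]


example = np.array([
    [5, 4, 3],
    [2, 1, 4],
    [3, 4, 6],
    [4, 2, 8],
])
normalized = quantile_normalize(example)
```

After normalization every column contains the same set of values, only in a different order.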

u8sand / priority_cache.ts

Last active March 25, 2020 16:16

TypeScript priority cache I made for a project but never used

	export class PriorityCache<V = any> {
	  store: {
	    [key: string]: {
	      value: V,
	      start: number,
	      cost: number,
	      demand: number,
	      size: number,
	    }
	  } = {}

u8sand / dezoomify.py

Last active November 5, 2022 20:52

Convert zoomify pages into giant images

	#!/bin/python
	# pip install click Pillow tqdm
	# dezoomify.py will infer the shape while downloading all the tiles and then re-assemble a complete image.
	# Usage:
	# python dezoomify.py -v -Z 5 -t tmp -o output.jpg https://baseurl/zoomify/id/TileGroup{0,1,2,3}
	# -Z n: zoom level (if not specified, will identify the best zoom level and download that one)
	# -v: verbosity
	# -t [dir]: temporary directory (can be omitted, but recommended in case it needs to be re-run)
	# -o [file.jpg]: output file
	# [url...]: the urls to download the tiles
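In the usage line above, the `TileGroup{0,1,2,3}` suffix is expanded by the shell before the script runs, so the script receives four separate URL arguments. The following is a simplified sketch of that expansion for non-nested groups only; `expand_braces` is a hypothetical helper and is not part of dezoomify.py.

```python
import re


def expand_braces(pattern):
    """Expand non-nested {a,b,c} groups the way a shell would (simplified)."""
    match = re.search(r'\{([^{}]*)\}', pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    results = []
    for option in match.group(1).split(','):
        # Recurse so that later groups in the tail are expanded too
        results.extend(expand_braces(head + option + tail))
    return results


urls = expand_braces('https://baseurl/zoomify/id/TileGroup{0,1,2,3}')
```

This can be handy when calling the script from an environment that does not perform brace expansion itself.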

u8sand / ncbi_lookup.py

Last active January 7, 2020 21:56

A convenient snippet for NCBI-driven mapping of gene synonyms to symbols

	import pandas as pd

	ncbi = pd.read_csv('ftp://ftp.ncbi.nih.gov/gene/DATA/GENE_INFO/Mammalia/Homo_sapiens.gene_info.gz', sep='\t')
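	# Ensure nulls are treated as such (a lone '-' marks a missing value);
	# note that pandas >= 2.1 deprecates applymap in favor of DataFrame.map
	ncbi = ncbi.applymap(lambda v: float('nan') if isinstance(v, str) and v == '-' else v)
	# Break up pipe-delimited lists
	split_list = lambda v: v.split('|') if isinstance(v, str) else []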
	ncbi['dbXrefs'] = ncbi['dbXrefs'].apply(split_list)
	ncbi['Synonyms'] = ncbi['Synonyms'].apply(split_list)
	ncbi['LocusTag'] = ncbi['LocusTag'].apply(split_list)
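The preview stops before any lookup is built. One way the split `Synonyms` column could be turned into a synonym-to-symbol mapping is sketched below. This is an assumption about the intended use, not the gist's own continuation; the two-row `genes` frame holds made-up placeholder values, and `lookup` is a hypothetical name.

```python
import pandas as pd

# Made-up placeholder rows standing in for the real gene_info table
genes = pd.DataFrame({
    'Symbol': ['GENEA', 'GENEB'],
    'Synonyms': ['GA|ALPHA', '-'],
})

# Split pipe-delimited synonyms; '-' marks a missing value
genes['Synonyms'] = genes['Synonyms'].apply(
    lambda v: v.split('|') if isinstance(v, str) and v != '-' else []
)

# One row per (symbol, synonym) pair; empty lists explode to NaN and are dropped
exploded = genes.explode('Synonyms').dropna(subset=['Synonyms'])

# Map every synonym, and every official symbol, to the official symbol
lookup = dict(zip(exploded['Synonyms'], exploded['Symbol']))
lookup.update({symbol: symbol for symbol in genes['Symbol']})
```

If one synonym belongs to several genes, this simple dictionary keeps only the last one seen, so ambiguous synonyms would need separate handling.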

u8sand / keybindings.json

Last active March 26, 2021 17:04

Visual Studio Code keybindings that make shift+enter run the selection in the terminal whenever a terminal panel is open, without interfering with the Python interactive terminal

	// Place your key bindings in this file to overwrite the defaults
	[
	  {
	    "key": "shift+enter",
	    "command": "workbench.action.terminal.runSelectedText",
	    "when": "editorTextFocus && terminalIsOpen"
	  },
	  {
	    "key": "shift+enter",
	    "command": "markdown-preview-enhanced.runCodeChunk",

u8sand / parsed_url.py

Created November 19, 2019 19:27

A convenient mutable Python urlparse for parsing and manipulating URLs

	import urllib.parse as urllib_parse

	class ParsedUrl:
	    ''' A convenient object for manipulating urls -- works
	    like urlparse + parse_qs but mutable

	    Example:
	      url = ParsedUrl('http://google.com/?q=hello+world&a=this#anchor')
	      del url.fragment
	      del url.query['a']
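The class body is not shown in the preview. The idea in the docstring can be sketched with the standard library alone. `MutableUrl` below is a hypothetical stand-in and makes no claim to match the gist's `ParsedUrl` implementation.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


class MutableUrl:
    """Split a URL into plain attributes that can be edited or deleted."""

    def __init__(self, url):
        parts = urlsplit(url)
        self.scheme = parts.scheme
        self.netloc = parts.netloc
        self.path = parts.path
        # parse_qs returns an ordinary dict of lists, so it is mutable
        self.query = parse_qs(parts.query)
        self.fragment = parts.fragment

    def __str__(self):
        # A deleted attribute falls back to an empty component
        return urlunsplit((
            getattr(self, 'scheme', ''),
            getattr(self, 'netloc', ''),
            getattr(self, 'path', ''),
            urlencode(getattr(self, 'query', {}), doseq=True),
            getattr(self, 'fragment', ''),
        ))


url = MutableUrl('http://google.com/?q=hello+world&a=this#anchor')
del url.fragment
del url.query['a']
result = str(url)
```

Storing the components as ordinary instance attributes is what lets `del url.fragment` work without any special methods.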

u8sand / cannonical_uuid.py

Created October 4, 2019 16:07

A simple reproducible way of generating consistent UUIDs based on an object

	import uuid
	U = uuid.UUID('00000000-0000-0000-0000-000000000000')
	def canonical_uuid(obj):
	    return str(uuid.uuid5(U, str(obj)))
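Because the function above hashes `str(obj)`, two dictionaries with the same items inserted in a different order produce different UUIDs. Where that matters, a possible variant is to serialize with sorted keys first. This is a sketch under the assumption that the object is JSON-serializable; `canonical_uuid_json` is a hypothetical name, not part of the gist.

```python
import json
import uuid

# Same all-zero namespace as in the gist
NAMESPACE = uuid.UUID('00000000-0000-0000-0000-000000000000')


def canonical_uuid_json(obj):
    """UUIDv5 of a key-sorted JSON dump, so dict insertion order is ignored."""
    serialized = json.dumps(obj, sort_keys=True, separators=(',', ':'))
    return str(uuid.uuid5(NAMESPACE, serialized))


first = canonical_uuid_json({'a': 1, 'b': 2})
second = canonical_uuid_json({'b': 2, 'a': 1})
third = canonical_uuid_json({'a': 1, 'b': 3})
```

Here `first` and `second` are identical, while `third` differs because a value changed.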

u8sand / deep_typeof.py

Created September 29, 2019 20:31

A convenient type function that reports the set of types in dictionaries/other iterables up to a desired depth

	''' Usage:
	val = {1: [{'hello': 'world', 2: 'okay'}], 2: []}
	typeof(val) == dict[int: list,list[dict[int,str: str]]]
	typeof(val, 2) == dict[int: list]

	i.e. it will look (as deep as you want) into your object and let you know what it is
	'''
	import functools

	def isIterable(v):
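The implementation is cut off in the preview. A minimal sketch that reproduces the two results from the usage docstring is given below. It is an illustration and not the gist's code: it returns a string rather than a type object, handles only dicts and the built-in sequence and set types, and sorts the type names so the output is deterministic.

```python
def typeof(value, depth=None):
    """Describe the type of value, descending at most `depth` levels (None = no limit)."""
    name = type(value).__name__
    if depth is not None and depth <= 1:
        return name
    next_depth = None if depth is None else depth - 1
    if isinstance(value, dict):
        if not value:
            return name
        # Collect the distinct key types and value types one level down
        keys = sorted({typeof(k, next_depth) for k in value})
        vals = sorted({typeof(v, next_depth) for v in value.values()})
        return f"{name}[{','.join(keys)}: {','.join(vals)}]"
    if isinstance(value, (list, tuple, set, frozenset)):
        if not value:
            return name
        items = sorted({typeof(v, next_depth) for v in value})
        return f"{name}[{','.join(items)}]"
    return name


val = {1: [{'hello': 'world', 2: 'okay'}], 2: []}
full = typeof(val)
shallow = typeof(val, 2)
```

`full` describes the whole structure, while `shallow` stops after the outer dictionary and the bare types of its keys and values.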

u8sand / defaultdict.py

Created September 29, 2019 20:30

A simple implementation of defaultdict which works better for nested defaultdicts

	''' Motivation: the standard Python collections.defaultdict doesn't work too well with
	deep defaultdicts because it only actually creates the object on __setitem__; this
	means the following occurs:
	```
	d = collections.defaultdict(lambda: collections.defaultdict(lambda: []))
	d[1][2].append(3)
	3 in d[1][2] == False # you would have expected it to be True
	```

	The above situation works for the below implementation. The trade-off is potential
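The behaviour described in the motivation can be checked directly against the standard library. This is a minimal sketch, assuming CPython 3, where `defaultdict.__missing__` inserts the default value into the dictionary on first lookup. Note also that the unparenthesized expression `3 in d[1][2] == False` is a chained comparison, equivalent to `(3 in d[1][2]) and (d[1][2] == False)`, so it evaluates to False regardless of membership.

```python
import collections

d = collections.defaultdict(lambda: collections.defaultdict(list))
d[1][2].append(3)

# Membership test on its own, with explicit parentheses
is_member = (3 in d[1][2])

# The expression as written in the docstring: a chained comparison
chained = (3 in d[1][2] == False)

# The inner list was stored on first lookup, so both levels now have keys
stored_keys = (list(d.keys()), list(d[1].keys()))
```

Comparing `is_member` with `chained` shows whether the surprise comes from the container or from the way the test expression is parsed.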
