@eric-czech
eric-czech / baseball-stats.csv
Created August 29, 2022 17:48
baseball-stats.csv
playerID,yearID,stint,teamID,lgID,G,AB,R,H,2B,3B,HR,RBI,SB,CS,BB,SO,IBB,HBP,SH,SF,GIDP
abercda01,1871,1,TRO,NA,1,4,0,0,0,0,0,0,0,0,0,0,,,,,0
addybo01,1871,1,RC1,NA,25,118,30,32,6,0,0,13,8,1,4,0,,,,,0
allisar01,1871,1,CL1,NA,29,137,28,40,4,5,0,19,3,1,2,5,,,,,1
allisdo01,1871,1,WS3,NA,27,133,28,44,10,2,2,27,1,1,0,2,,,,,0
ansonca01,1871,1,RC1,NA,25,120,29,39,11,3,0,16,6,2,2,1,,,,,0
armstbo01,1871,1,FW1,NA,12,49,9,11,2,1,0,5,0,1,0,1,,,,,0
barkeal01,1871,1,RC1,NA,1,4,0,1,0,0,0,2,0,0,1,0,,,,,0
barnero01,1871,1,BS1,NA,31,157,66,63,10,9,0,34,11,6,13,1,,,,,1
barrebi01,1871,1,FW1,NA,1,5,1,1,1,0,0,1,0,0,0,0,,,,,0
@eric-czech
eric-czech / Top MLB hitters.ipynb
Last active June 13, 2022 00:03
Top MLB hitters
@eric-czech
eric-czech / clustered_heatmap_example.py
Last active December 2, 2024 12:59
Clustered heatmap example
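from typing import Any
import numpy as np
import pandas as pd

def get_clustered_order(df: pd.DataFrame, metric: str='euclidean', method: str='average') -> np.ndarray:
    # Leaf order (integer row positions) from hierarchical clustering of the rows of df
    from scipy.cluster.hierarchy import linkage, leaves_list
    return leaves_list(linkage(df, metric=metric, method=method))

def get_clustered_dataframe(df: pd.DataFrame, fill_value: Any=None, **kwargs) -> pd.DataFrame:
    # Cluster on the NaN-filled copy, but return the original values in the clustered order
    dfs = df if fill_value is None else df.fillna(fill_value)
    return df.iloc[get_clustered_order(dfs, **kwargs), get_clustered_order(dfs.T, **kwargs)]

import plotly.express as px
px.imshow(get_clustered_dataframe(df))  # df: a numeric DataFrame supplied by the caller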
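A minimal sketch of the same reordering idea on a toy matrix, without the plotting step. The helper name `clustered_order` and the toy data are hypothetical and only for illustration; the SciPy calls (`linkage`, `leaves_list`) are the ones used in the gist above.

```python
import pandas as pd
from scipy.cluster.hierarchy import linkage, leaves_list


def clustered_order(df, metric="euclidean", method="average"):
    # Integer positions of the rows of df in dendrogram leaf order
    return leaves_list(linkage(df.to_numpy(), metric=metric, method=method))


# Toy data (made up): rows a/c are near-identical, as are rows b/d;
# columns w/x are near-identical, as are columns y/z.
toy = pd.DataFrame(
    [[0.0, 0.1, 5.0, 5.1],
     [5.0, 5.2, 0.0, 0.1],
     [0.1, 0.0, 5.1, 5.0],
     [5.1, 5.0, 0.1, 0.0]],
    index=["a", "b", "c", "d"],
    columns=["w", "x", "y", "z"],
)

row_order = clustered_order(toy)    # cluster rows
col_order = clustered_order(toy.T)  # cluster columns by transposing
clustered = toy.iloc[row_order, col_order]
print(clustered)
```

After reordering, similar rows and similar columns should sit next to each other, which is what makes the block structure visible in a heatmap.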
@eric-czech
eric-czech / ncbi_human_genes.csv
Created March 9, 2022 21:31
NCBI Human Gene List
taxon_id,gene_id,gene_symbol
9606,109951028,A-GAMMA3'E
9606,1,A1BG
9606,503538,A1BG-AS1
9606,29974,A1CF
9606,2,A2M
9606,144571,A2M-AS1
9606,144568,A2ML1
9606,100874108,A2ML1-AS1
9606,106478979,A2ML1-AS2
@eric-czech
eric-czech / hgnc_gene_lookup.ipynb
Last active September 23, 2021 13:06
Google patents HGNC normalizations
@eric-czech
eric-czech / zooma.py
Created August 9, 2021 19:12
Zooma API Query for EFO disease
import requests
from urllib.parse import quote_plus

def get_info(disease):
    url_fmt = 'http://www.ebi.ac.uk/spot/zooma/v2/api/services/annotate?propertyValue={disease}&propertyType=disease&filter=ontologies:[efo]'
    url = url_fmt.format(disease=quote_plus(disease))
    print(url)
    res = requests.get(url).json()
    if len(res) == 0:
        return None
@eric-czech
eric-czech / search_efo_ols.py
Created May 11, 2021 11:06
Search EFO OLS using disease query
def search_efo(disease):
    import requests
    from urllib.parse import quote_plus
    res = requests.get(f"https://www.ebi.ac.uk/ols/api/select?q={quote_plus(disease)}&ontology=efo")
    res = res.json()
    docs = res['response']['docs']
    if len(docs) == 0:
        return None
    return docs[0]
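An offline sketch of the same pattern, splitting URL construction from response parsing so each piece can be checked without a network call. The helper names `build_efo_query_url` and `first_doc` and the sample payload are hypothetical; the payload only mirrors the `response` / `docs` keys that the function above reads, not actual OLS output.

```python
from urllib.parse import quote_plus


def build_efo_query_url(disease):
    # quote_plus encodes spaces as '+' and escapes reserved characters
    return f"https://www.ebi.ac.uk/ols/api/select?q={quote_plus(disease)}&ontology=efo"


def first_doc(payload):
    # Return the top-ranked document, or None when the search found nothing
    docs = payload.get("response", {}).get("docs", [])
    return docs[0] if docs else None


url = build_efo_query_url("type 2 diabetes")

# Made-up payloads for illustration only
sample_hit = {"response": {"docs": [{"label": "example term"}]}}
sample_miss = {"response": {"docs": []}}

print(url)
print(first_doc(sample_hit))
print(first_doc(sample_miss))
```

Using `.get` with defaults means a payload missing either key is treated the same as an empty result, whereas the original function would raise a `KeyError` in that case.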
@eric-czech
eric-czech / script.R
Last active February 22, 2021 18:46
Code for figure 2 from Bellégo and Pape 2019
library(ggplot2)
# For figure 2 of paper https://www.semanticscholar.org/paper/Dealing-with-Logs-and-Zeros-in-Regression-Models-Bellégo-Pape/546a6f45433b413721f9a60f0be8e3e2b69fe103
set.seed(1)
beta <- 1
x_max <- 1
x <- x_max * runif(10000)
y <- sapply(x, function(x) rpois(1, lambda=exp(beta * x + 1)))
df <- do.call(rbind, lapply(seq(.01, 1, by=.01), function(delta){
  beta_hat <- unname(lm(log(y + delta) ~ x)$coefficients['x'])
@eric-czech
eric-czech / ols.py
Created January 20, 2021 20:31
OLS term search function
def get_ols(term):
    import requests
    url = 'http://www.ebi.ac.uk/ols/api/ontologies/efo/terms?obo_id=' + term
    res = requests.get(url)
    terms = res.json()['_embedded']['terms']
    assert len(terms) == 1
    return terms[0]