Skip to content

Instantly share code, notes, and snippets.

View johnlees's full-sized avatar

John Lees johnlees

View GitHub Profile
jb_pubdistance <- function(
v_pubs=c("The Cambridge Brew House","The Cambridge Blue",
"King Street Run P.H.","The Elm Tree","The Regal"),
v_location="Cambridge, UK",
v_zoom=13,
listpubs=FALSE,
cam_pubs=TRUE,
crow_distances=FALSE,
units="minutes"){
@johnlees
johnlees / firth_regression.py
Last active May 9, 2023 19:11
Firth regression in python
#!/usr/bin/env python
'''Python implementation of Firth regression by John Lees
See https://www.ncbi.nlm.nih.gov/pubmed/12758140'''
def firth_likelihood(beta, logit):
return -(logit.loglike(beta) + 0.5*np.log(np.linalg.det(-logit.hessian(beta))))
# Do firth regression
# Note information = -hessian, for some reason available but not implemented in statsmodels
def fit_firth(y, X, start_vec=None, step_limit=1000, convergence_limit=0.0001):
@johnlees
johnlees / pneumo_wgcna.R
Last active April 24, 2018 15:16
Gene expression modules for S. pneumoniae from PneumoExpress co-expression matrix
require(WGCNA)
# See:
# https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/FemaleLiver-02-networkConstr-man.pdf
SPV_2_name = read.delim("SPV_2_name.txt", header=FALSE)
colnames(SPV_2_name) = c("SPV", "gene")
# co-expression matrix from Veening lab
# https://www.biorxiv.org/content/early/2018/03/22/283739
@johnlees
johnlees / WGCNA_modules.tsv
Last active April 24, 2018 14:53
WGCNA modules for S. pneumoniae expression
locus_id module_number gene_name
SPV_0086 NA SPV_0086
SPV_0480 NA SPV_0480
SPV_0481 NA SPV_0481
SPV_0982 NA SPV_0982
SPV_1013 NA SPV_1013
SPV_1364 NA SPV_1364
SPV_1793 NA SPV_1793
SPV_2310 NA SPV_2310
SPV_1772 NA acyP
@johnlees
johnlees / all_distances.csv
Created May 16, 2018 17:16
Distances of core gene trees from core genome tree
CLS Gene KC distance (lambda = 0) KC distance (lambda = 1)
CLS00002 adcB 7960.19308811036 7.36117345073605
CLS00003 adcC 5691.7166127628 2.62408261905232
CLS00004 adcR 68944.9760968847 1.01831498716277
CLS00005 dltD 6031.18935202668 8.28319801733685
CLS00006 dltC 85402.9472149527 1.40643165294022
CLS00007 dltB 6084.44787963542 7.20462600697612
CLS00013 glpF 4540.60722811388 5.18730142556587
CLS00014 glpO 2720.14006992287 10.7387600382768
CLS00022 phoB 22900.4934226317 41.118579712978
@johnlees
johnlees / core_gene_stats.csv
Last active April 17, 2019 19:47
Statistics for core genes in S. pneumoniae
We can make this file beautiful and searchable if this error is corrected: It looks like row 10 should actually have 11 columns, instead of 5 in line 9.
COG,Gene name,Average_aa_length,Aligned_sites,dS,dN,omega,Sites,pi_aa,dN_rate,pi_gene
CLS00002,adcB,268,268,56.5,10.5,0.0656766113998045,NA,0.000751449436378113,0.0391791044776119,0.201388448949334
CLS00003,adcC,234,234,50.5,14.5,0.134629916080151,NA,0.00391277647376286,0.061965811965812,0.915589694860509
CLS00004,adcR,146.941558441558,166,11,4,0.188155217627637,NA,0.00141473858168382,0.0240963855421687,0.207883891980019
CLS00005,dltD,420.982142857143,425,65,80,0.45168151934188,NA,0.00272578563815855,0.188235294117647,1.14750707892121
CLS00006,dltC,79,79,5,4,0.561513140654798,NA,0.0012429013818335,0.0506329113924051,0.0981892091648464
CLS00007,dltB,413.631493506493,422,84,84,0.429553902002596,NA,0.00524836591836688,0.199052132701422,2.17088943328267
CLS00013,glpF,233.165584415584,234,102,32,0.143771320439321,NA,0.0118706308630385,0.136752136752137,2.76782258256203
CLS00014,glpO,607.709415584416,608,413.75,128,0.164681667531972,NA,0.00827269833662607,0.210526315789474,5.0273966714572
CLS00022,phoB,217,217,86.6
@johnlees
johnlees / matrix_pow.R
Created January 6, 2021 11:03
Matrix power in base R using exponentiation by squaring
matrix_pow <- function(x, n) {
if (abs(n - round(n)) < .Machine$double.eps^0.5) {
stop(sprintf("Power %.1f is not an integer", n))
}
if (n < 0) {
return(matrix_exp(solve(x), -n))
} else if (n == 0) {
return(1)
} else if (n == 1) {
return(x)
@johnlees
johnlees / pseudowords.py
Last active February 12, 2021 12:43
Pseudoword generator
import json
import random
import string
# Download from https://github.com/dwyl/english-words/raw/master/words_dictionary.json
with open('words_dictionary.json', 'r') as word_list:
real_words = json.load(word_list)
vowels = ["a", "e", "i", "o", "u"]
trouble = ["q", "x", "y"]
@johnlees
johnlees / stderr.txt
Created July 19, 2021 14:14
hhblits stderr
I0719 14:43:06.996543 139929070653184 run_docker.py:180] E0719 13:43:06.994233 139663816972096 hhblits.py:141] - 11:45:50.511 INFO: Searching 65983866 column state sequences.
I0719 14:43:06.996851 139929070653184 run_docker.py:180] E0719 13:43:06.994395 139663816972096 hhblits.py:141] - 11:45:51.265 INFO: Searching 15161831 column state sequences.
I0719 14:43:06.997076 139929070653184 run_docker.py:180] E0719 13:43:06.994539 139663816972096 hhblits.py:141] - 11:45:51.326 INFO: /mnt/fasta_path_0/dltA.fa is in A2M, A3M or FASTA format
I0719 14:43:06.997292 139929070653184 run_docker.py:180] E0719 13:43:06.994683 139663816972096 hhblits.py:141] - 11:45:51.327 INFO: Iteration 1
I0719 14:43:06.997498 139929070653184 run_docker.py:180] E0719 13:43:06.994824 139663816972096 hhblits.py:141] - 11:45:51.559 INFO: Prefiltering database
I0719 14:43:06.997703 139929070653184 run_docker.py:180] E0719 13:43:06.994960 139663816972096 hhblits.py:141] - 11:55:13.366 INFO: HMMs passed 1st prefilter (gapless profile-profile alig
@johnlees
johnlees / 0001-Update-CUDA-version.patch
Created July 20, 2021 14:45
Git patch for CUDA 11.1
From 95ea1abb942e8da86faa1f1856fed316dcd95985 Mon Sep 17 00:00:00 2001
From: John Lees <[email protected]>
Date: Tue, 20 Jul 2021 15:44:19 +0100
Subject: [PATCH] Update CUDA version
---
docker/Dockerfile | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/docker/Dockerfile b/docker/Dockerfile