Skip to content

Instantly share code, notes, and snippets.

View tansey's full-sized avatar

Wesley Tansey tansey

View GitHub Profile
@tansey
tansey / median_deviations.py
Created July 13, 2022 11:02
code to cap values at median deviations
import numpy as np
import numpy.ma as ma
def cap_outliers(points, thresh=3.5, data=None, median=None, med_abs_deviation=None):
'''
Cap outliers to be within a certain number of median deviations.
'''
if type(points) is np.float64:
points = np.array([points])
@tansey
tansey / pav.py
Created April 5, 2022 16:26
1d and 2d pool adjacent violators (PAV)
import numpy as np
def pav(y):
"""
PAV uses the pair adjacent violators method to produce a monotonic
smoothing of y
translated from matlab by Sean Collins (2006) as part of the EMAP toolbox
Author : Alexandre Gramfort
license : BSD
@tansey
tansey / aicc_select.py
Created July 6, 2021 01:10
Post-selection inference example for AICc-based model screening
import matplotlib.pyplot as plt
import numpy as np
import statsmodels.api as sm
def generalized_liang_sim_xy(N=500, P=500, S=100):
'''Generates data from a simple linear model'''
X = (np.random.normal(size=(N,1)) + np.random.normal(size=(N,P))) / 2.
w0 = np.random.normal(1, size=S//4)
w1 = np.random.normal(2, size=S//4)
@tansey
tansey / binary_matrix_factorization.py
Last active June 1, 2020 21:55
Quick and dirty binary matrix factorization via alternating logistic regression
import numpy as np
from functools import partial
from scipy.optimize import fmin_l_bfgs_b
from sklearn.linear_model import LogisticRegression
def binary_mf(Y, nembeds, lam=None, lams=30, cv=5, max_steps=30, tol=1e-4, verbose=False):
# Convert to a log-space grid
if lam is None and isinstance(lams, int):
lams = np.exp(np.linspace(np.log(1e-2), np.log(1), lams))
@tansey
tansey / knockoffs.py
Last active March 28, 2020 15:20
A fast (nlogn time) implementation of the knockoff filter
'''
A O(nlogn) time implementation of the knockoff filter.
Author: Wesley Tansey
Date: 3/27/2020
'''
import numpy as np
def knockoff_filter(knockoff_stats, alpha, offset=1.0, is_sorted=False):
'''Perform the knockoffs selection procedure at the target FDR threshold.
@tansey
tansey / multifactor.py
Created August 9, 2019 18:14
Heterogeneous (AKA multi-view) factor modeling in pytorch.
'''
Heterogeneous factor modeling.
This model fits a heterogeneous factor model where columns may be:
1) Binary
2) Categorical
3) Gaussian
Everything is fit via alternating minimization and stochastic gradient descent.
The code relies on pytorch for SGD and a demo is included.
@tansey
tansey / factor_pav.py
Last active May 11, 2019 18:38
Pool adjacent violators algorithm for monotone matrix factorization
'''Pool adjacent violators algorithm for (column-)monotone matrix factorization.
Applies the PAV algorithm to column factors of a matrix factorization:
Given: M = W.V'
Returns: V_proj, a projected version of V such that M[i] is monotone decreasing
for all i.
Author: Wesley Tansey
Date: May 2019
'''
@tansey
tansey / fast_mvn.py
Last active April 29, 2022 03:32
Fast multivariate normal sampling for some common cases
'''Fast sampling from a multivariate normal with covariance or precision
parameterization. Supports sparse arrays. Params:
- mu: If provided, assumes the model is N(mu, Q)
- mu_part: If provided, assumes the model is N(Q mu_part, Q).
This is common in many conjugate Gibbs steps.
- sparse: If true, assumes we are working with a sparse Q
- precision: If true, assumes Q is a precision matrix (inverse covariance)
- chol_factor: If true, assumes Q is a (lower triangular) Cholesky
decomposition of the covariance matrix
(or of the precision matrix if precision=True).
@tansey
tansey / nurse_schedules.py
Last active April 19, 2021 15:18
Solver for a nurse scheduling problem
'''
Program to generate valid time allocations of a mental ward staff.
Given:
Two staff lists. Each list applies for a specific window of time. Lists may
contain non-empty intersections of employees.
Each employee has a designation as RMN or HCA.
@tansey
tansey / fitWeightedNegativeBinomial.R
Created July 2, 2017 16:18
Fit negative binomial with weighted observations in R
# Fit using a simple EM algorithm
# observations are x
# weights are w (must be same length as x)
# returns (r, p)
# r - dispersion parameter
# p - probability of success
weightedNegBinomFit <- function(x, w, maxsteps=30)
{
sum.wx = sum(x*w)
sum.w = sum(w)