Wesley Tansey tansey

tansey / median_deviations.py

Created July 13, 2022 11:02

code to cap values at median deviations

	import numpy as np
	import numpy.ma as ma


	def cap_outliers(points, thresh=3.5, data=None, median=None, med_abs_deviation=None):
	'''
	Cap outliers to be within a certain number of median deviations.
	'''
	if type(points) is np.float64:
	points = np.array([points])

tansey / pav.py

Created April 5, 2022 16:26

1d and 2d pool adjacent violators (PAV)

	import numpy as np

	def pav(y):
	"""
	PAV uses the pair adjacent violators method to produce a monotonic
	smoothing of y

	translated from matlab by Sean Collins (2006) as part of the EMAP toolbox
	Author : Alexandre Gramfort
	license : BSD

tansey / aicc_select.py

Created July 6, 2021 01:10

Post-selection inference example for AICc-based model screening

	import matplotlib.pyplot as plt
	import numpy as np
	import statsmodels.api as sm


	def generalized_liang_sim_xy(N=500, P=500, S=100):
	'''Generates data from a simple linear model'''
	X = (np.random.normal(size=(N,1)) + np.random.normal(size=(N,P))) / 2.
	w0 = np.random.normal(1, size=S//4)
	w1 = np.random.normal(2, size=S//4)

tansey / binary_matrix_factorization.py

Last active June 1, 2020 21:55

Quick and dirty binary matrix factorization via alternating logistic regression

	import numpy as np
	from functools import partial
	from scipy.optimize import fmin_l_bfgs_b
	from sklearn.linear_model import LogisticRegression


	def binary_mf(Y, nembeds, lam=None, lams=30, cv=5, max_steps=30, tol=1e-4, verbose=False):
	# Convert to a log-space grid
	if lam is None and isinstance(lams, int):
	lams = np.exp(np.linspace(np.log(1e-2), np.log(1), lams))

tansey / knockoffs.py

Last active March 28, 2020 15:20

A fast (nlogn time) implementation of the knockoff filter

	'''
	A O(nlogn) time implementation of the knockoff filter.

	Author: Wesley Tansey
	Date: 3/27/2020
	'''
	import numpy as np

	def knockoff_filter(knockoff_stats, alpha, offset=1.0, is_sorted=False):
	'''Perform the knockoffs selection procedure at the target FDR threshold.

tansey / multifactor.py

Created August 9, 2019 18:14

Heterogeneous (AKA multi-view) factor modeling in pytorch.

	'''
	Heterogeneous factor modeling.

	This model fits a heterogeneous factor model where columns may be:
	1) Binary
	2) Categorical
	3) Gaussian

	Everything is fit via alternating minimization and stochastic gradient descent.
	The code relies on pytorch for SGD and a demo is included.

tansey / factor_pav.py

Last active May 11, 2019 18:38

Pool adjacent violators algorithm for monotone matrix factorization

	'''Pool adjacent violators algorithm for (column-)monotone matrix factorization.

	Applies the PAV algorithm to column factors of a matrix factorization:
	Given: M = W.V'
	Returns: V_proj, a projected version of V such that M[i] is monotone decreasing
	for all i.

	Author: Wesley Tansey
	Date: May 2019
	'''

tansey / fast_mvn.py

Last active April 29, 2022 03:32

Fast multivariate normal sampling for some common cases

	'''Fast sampling from a multivariate normal with covariance or precision
	parameterization. Supports sparse arrays. Params:
	- mu: If provided, assumes the model is N(mu, Q)
	- mu_part: If provided, assumes the model is N(Q mu_part, Q).
	This is common in many conjugate Gibbs steps.
	- sparse: If true, assumes we are working with a sparse Q
	- precision: If true, assumes Q is a precision matrix (inverse covariance)
	- chol_factor: If true, assumes Q is a (lower triangular) Cholesky
	decomposition of the covariance matrix
	(or of the precision matrix if precision=True).

tansey / nurse_schedules.py

Last active April 19, 2021 15:18

Solver for a nurse scheduling problem

	'''
	Program to generate valid time allocations of a mental ward staff.


	Given:

	Two staff lists. Each list applies for a specific window of time. Lists may
	contain non-empty intersections of employees.

	Each employee has a designation as RMN or HCA.

tansey / fitWeightedNegativeBinomial.R

Created July 2, 2017 16:18

Fit negative binomial with weighted observations in R

	# Fit using a simple EM algorithm
	# observations are x
	# weights are w (must be same length as x)
	# returns (r, p)
	# r - dispersion parameter
	# p - probability of success
	weightedNegBinomFit <- function(x, w, maxsteps=30)
	{
	sum.wx = sum(x*w)
	sum.w = sum(w)