David Marx dmarx

dmarx / find_best_cutoff.r
Created August 9, 2017 12:11
Demonstration of how to construct a bespoke regression to determine the optimal cutoff value for constructing a categorical variable for a logistic regression
# Finding the best cutoff for constructing a categorical variable
# for a logistic regression
data(iris)
x0 = iris[iris$Species != 'setosa',]
plot(x0, col=x0$Species)
# Keep things simple for this demo
form = "is_virginica ~ Petal.Length + Petal.Width"
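One simple way to search for a cutoff is a grid search: dichotomize Petal.Length at each candidate value, fit a logistic regression on the resulting indicator, and keep the cutoff with the lowest residual deviance. This is a hedged sketch, not the gist's own "bespoke regression". The `is_virginica` column is built here only because the preview does not show where it is defined. Using midpoints between observed values as candidates, and the helper name `cutoff_deviance`, are illustrative assumptions.

```r
data(iris)
x0 <- iris[iris$Species != "setosa", ]
# Assumed definition of the response named in the formula above
x0$is_virginica <- as.integer(x0$Species == "virginica")

# Candidate cutoffs: midpoints between consecutive distinct Petal.Length values
vals <- sort(unique(x0$Petal.Length))
cutoffs <- (head(vals, -1) + tail(vals, -1)) / 2

# Hypothetical helper: residual deviance of a logistic fit on the dichotomized predictor
cutoff_deviance <- function(cutoff, data) {
  data$above <- as.integer(data$Petal.Length > cutoff)
  # Cutoffs that leave one side all one class cause quasi-separation, so
  # glm() may warn about fitted probabilities of 0 or 1; silence that here.
  fit <- suppressWarnings(glm(is_virginica ~ above, data = data, family = binomial))
  deviance(fit)
}

devs <- vapply(cutoffs, cutoff_deviance, numeric(1), data = x0)
best_cutoff <- cutoffs[which.min(devs)]
best_cutoff
```

Deviance is only one possible criterion. AIC gives the same ranking here because every candidate model has the same number of parameters.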
dmarx / mcglm.r
Last active July 3, 2017 19:27
Playing with `mcglm` for a multivariate Poisson model. Not sure how to extract the inter-DV covariance.
# From the docs for mcglm::ahs
require(mcglm)
data(ahs, package="mcglm")
form1 <- Ndoc ~ income + age
form2 <- Nndoc ~ income + age
Z0 <- mc_id(ahs)
fit.ahs <- mcglm(linear_pred = c(form1, form2),
matrix_pred = list(Z0, Z0), link = c("log","log"),
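The inter-DV covariance is easiest to see in a simple bivariate Poisson model built by trivariate reduction. Take Y1 = X1 + X0 and Y2 = X2 + X0 with independent Poisson components, so Cov(Y1, Y2) = Var(X0) = lambda0. The sketch below simulates that model as a sanity check against any covariance estimate. It is an illustrative assumption, not the covariance structure `mcglm` actually fits, and the rates are made up.

```r
set.seed(1)
n <- 1e5
lambda1 <- 2; lambda2 <- 3; lambda0 <- 1.5   # made-up rates
shared <- rpois(n, lambda0)                  # common component X0
y1 <- rpois(n, lambda1) + shared             # E[Y1] = lambda1 + lambda0
y2 <- rpois(n, lambda2) + shared             # E[Y2] = lambda2 + lambda0
# The sample covariance should land near lambda0
c(mean_y1 = mean(y1), mean_y2 = mean(y2), cov_y1_y2 = cov(y1, y2))
```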
dmarx / demo.r
Last active June 28, 2017 23:37
Demonstration of a method for evaluating the performance of a Poisson regression by calculating the bootstrapped accuracy across a range of error thresholds
##########################################################
# Get data from the poisson demo at: #
# https://stats.idre.ucla.edu/r/dae/poisson-regression/ #
##########################################################
p <- read.csv("https://stats.idre.ucla.edu/stat/data/poisson_sim.csv")
p <- within(p, {
prog <- factor(prog, levels=1:3, labels=c("General", "Academic",
"Vocational"))
id <- factor(id)
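Here is a minimal sketch of the bootstrapped-accuracy idea. It assumes that "accuracy at threshold t" means the share of out-of-bag observations whose absolute prediction error is at most t. It uses simulated data instead of poisson_sim.csv so that it runs offline. The thresholds, the number of resamples `B`, and the out-of-bag evaluation scheme are all illustrative choices.

```r
set.seed(42)
n <- 200
x <- rnorm(n)
dat <- data.frame(x = x, y = rpois(n, exp(0.5 + 0.8 * x)))  # simulated counts

thresholds <- c(0.5, 1, 2, 3)
B <- 200

# Each replicate: fit on a bootstrap resample, score on the rows it left out
boot_acc <- replicate(B, {
  idx <- sample.int(n, n, replace = TRUE)
  oob <- setdiff(seq_len(n), idx)
  fit <- glm(y ~ x, family = poisson, data = dat[idx, ])
  pred <- predict(fit, newdata = dat[oob, ], type = "response")
  err <- abs(dat$y[oob] - pred)
  vapply(thresholds, function(t) mean(err <= t), numeric(1))
})
# boot_acc is a length(thresholds) x B matrix of accuracies
summary_tab <- data.frame(
  threshold = thresholds,
  mean_acc  = rowMeans(boot_acc),
  lower     = apply(boot_acc, 1, quantile, 0.025),
  upper     = apply(boot_acc, 1, quantile, 0.975)
)
summary_tab
```

Accuracy can only grow as the threshold loosens, so the curve of `mean_acc` against `threshold` is non-decreasing by construction.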
dmarx / Makefile
Created June 21, 2017 20:31
Minimal working example for a Stack Overflow question
DIRS := $(filter dir%, $(shell ls))
foo_sources := $(wildcard */source/foo.a)
foo_targets_prt := $(patsubst %.a, %.b, $(foo_sources))
foo_targets := $(subst source,target, $(foo_targets_prt))
bar_sources := $(wildcard */source/bar.a)
bar_x := $(patsubst %/bar.a, %/Y.a, $(bar_sources))
bar_y := $(patsubst %/bar.a, %/Z.a, $(bar_sources))
bar_targets := $(bar_x) $(bar_y)
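To see what these variables expand to, the same string transformations can be mirrored in R on hypothetical paths (`dir1`, `dir2`). This is only an illustration of the expansions, not a replacement for Make. Note one difference between the two functions. `$(subst source,target,...)` replaces every occurrence of the substring, so a directory named `resource` would also be rewritten. `$(patsubst ...)`, by contrast, only rewrites whole words that match the pattern.

```r
# Hypothetical stand-ins for the output of $(wildcard */source/foo.a)
foo_sources <- c("dir1/source/foo.a", "dir2/source/foo.a")
# $(patsubst %.a, %.b, ...): swap a trailing .a for .b in each word
foo_targets_prt <- sub("\\.a$", ".b", foo_sources)
# $(subst source,target, ...): replace EVERY occurrence of "source"
foo_targets <- gsub("source", "target", foo_targets_prt, fixed = TRUE)

# Hypothetical stand-in for $(wildcard */source/bar.a)
bar_sources <- "dir1/source/bar.a"
bar_x <- sub("/bar\\.a$", "/Y.a", bar_sources)  # $(patsubst %/bar.a, %/Y.a, ...)
bar_y <- sub("/bar\\.a$", "/Z.a", bar_sources)  # $(patsubst %/bar.a, %/Z.a, ...)
bar_targets <- c(bar_x, bar_y)

foo_targets
bar_targets
```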
dmarx / simple regression to measure effect of a regime change.r
Last active June 6, 2017 01:12
Simple regression with interaction terms to measure effect of a regime change on the predictors. Implementation of https://stats.stackexchange.com/a/99432/8451
#' ---
#' title: "Regression for quantifying a regime change"
#' author: "David Marx"
#' date: "June 5, 2017"
#' output: html_document
#' ---
#' There are two time points of interest. We want to test the hypothesis that the regression
#' coefficients changed after these time points, respectively. We will accomplish this by introducing
#' dummy variables to denote whether we are before or after a particular change point. This approach
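Here is a minimal sketch of the dummy-variable approach on simulated data. It uses two hypothetical change points `t1` and `t2`, with indicators `d1` and `d2` marking observations at or after each one. The interaction terms `x:d1` and `x:d2` then estimate how much the slope on `x` shifted at each change point, and their t-tests test the hypothesis of no change. The data-generating coefficients are made up for illustration.

```r
set.seed(7)
n <- 300
time_idx <- seq_len(n)
t1 <- 100; t2 <- 200                    # hypothetical change points
d1 <- as.integer(time_idx >= t1)        # 1 at or after the first change
d2 <- as.integer(time_idx >= t2)        # 1 at or after the second change
x <- rnorm(n)
# Simulated truth: slope on x is 0.5, rises by 1 after t1 and by 2 more after t2
y <- 1 + 0.5 * x + 1 * d1 * x + 2 * d2 * x + rnorm(n, sd = 0.5)

# x * (d1 + d2) expands to x + d1 + d2 + x:d1 + x:d2
fit <- lm(y ~ x * (d1 + d2))
summary(fit)$coefficients[c("x:d1", "x:d2"), ]
```

The main effects `d1` and `d2` allow the intercept to shift as well. Dropping them would force every regime to share the same intercept.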
dmarx / Arxiv Archive.md
Last active April 18, 2019 23:03
Machine learning articles I want to read or have read, mostly arxiv.org articles discussing recent advancements in deep learning.

To Read:

Publication Date Article Notes
2016 End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures Cited in the multi-task SciERC paper (2018, below)
2018-10-11 BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction Probably a lot of useful citations in here; not sure we need the coreference stuff.
* SciERC datasets: http://nlp.cs.washington.edu/sciIE/
* Code: https://bitbucket.org/luanyi/scierc/src/master/
* Pretrained (best) models: NER, Coref, Relation
2017-08-08 [Structural
dmarx / chinese restuarant process.R
Created April 3, 2017 22:56
Demonstration of a Chinese restaurant process, with an optional parameter that pushes table sizes toward a uniform distribution rather than the Dirichlet-process behavior (i.e. preferential attachment)
# Chinese restaurant process
chinese_restaurant = function(n, uniform=FALSE){
tables = c(1) # running counts of people at tables. Start by seating first person at their own table
U = runif(n)
for (i in 2:n){
if(U[i]<1/i){
tables = c(tables, 1)
} else {
p = tables/(i) # sum(tables) = i-1
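Here is a self-contained sketch of the full process with a general concentration parameter `alpha`. Setting `alpha = 1` reproduces the `1/i` new-table probability used in the preview above. The behavior of the `uniform` switch is an assumption, since the preview is cut off before it is used: here it chooses among existing tables uniformly rather than in proportion to their occupancy. The function name `crp_sketch` is hypothetical.

```r
crp_sketch <- function(n, alpha = 1, uniform = FALSE) {
  stopifnot(n >= 1)
  tables <- 1L  # first customer opens table 1
  for (i in seq_len(n)[-1]) {
    # Customer i sees i - 1 seated customers: new table w.p. alpha / (i - 1 + alpha)
    if (runif(1) < alpha / (i - 1 + alpha)) {
      tables <- c(tables, 1L)
    } else {
      # Assumed uniform variant: ignore occupancy when picking an existing table
      w <- if (uniform) rep(1, length(tables)) else tables
      k <- sample.int(length(tables), 1, prob = w)
      tables[k] <- tables[k] + 1L
    }
  }
  tables
}

set.seed(3)
tabs <- crp_sketch(1000)
tabs_u <- crp_sketch(1000, uniform = TRUE)
c(n_tables = length(tabs), largest = max(tabs), largest_uniform = max(tabs_u))
```

Because the new-table probability does not depend on `uniform`, both variants open roughly the same number of tables. Only the distribution of customers across those tables changes.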
dmarx / edge_weight_null_distribution.r
Created January 30, 2017 01:37
Simulate the null-hypothesis distribution for Serrano's disparity filter
generate_distances = function(k){
u_k = c(0,sort(runif(k-1)),1)
u_k[-1] - u_k[-(k+1)]
}
iters=1e4
d = c(replicate(iters, generate_distances(2)))
plot(density(d), ylim=c(0,5))
#abline(v=mean(d), lty=2)
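The simulated spacings can be checked against the closed form behind the disparity filter. Under the null, a node of degree k has normalized weights that behave like the k spacings of k - 1 uniform points, so each one satisfies P(p > x) = (1 - x)^(k - 1). This is the same expression as the edge p-value alpha_ij = (1 - p_ij)^(k - 1) in Serrano et al. (2009). The sketch below repeats `generate_distances()` so that it is self-contained. The choice k = 5 and the grid of x values are arbitrary.

```r
generate_distances <- function(k) {
  u_k <- c(0, sort(runif(k - 1)), 1)  # k - 1 uniform cut points plus endpoints
  u_k[-1] - u_k[-(k + 1)]             # the k spacings, which sum to 1
}

set.seed(11)
k <- 5
iters <- 1e4
d <- c(replicate(iters, generate_distances(k)))

# Compare the empirical survival function with (1 - x)^(k - 1)
x_grid <- c(0.1, 0.25, 0.5)
empirical <- vapply(x_grid, function(x) mean(d > x), numeric(1))
analytic <- (1 - x_grid)^(k - 1)
round(cbind(x_grid, empirical, analytic), 3)

# Disparity-filter p-value for an observed normalized weight p at a degree-k node
disparity_pvalue <- function(p, k) (1 - p)^(k - 1)
```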
dmarx / disparity_filter_dt.r
Last active December 19, 2017 12:00
Modified Alessandro Bessi's R implementation of Serrano's disparity filter to use the data.table package, yielding orders-of-magnitude performance gains in calculation time (1.3 seconds for 543k nodes). Need to turn into a pull request or package fork. Original code: https://github.com/alessandrobessi/disparityfilter
#' Extract the backbone of a weighted network using the disparity filter
#'
#' Given a weighted graph, \code{backbone} identifies the 'backbone structure'
#' of the graph, using the disparity filter algorithm by Serrano et al. (2009).
#' @param graph The input graph.
#' @param weights A numeric vector of edge weights, which defaults to
#' \code{E(graph)$weight}.
#' @param directed The directedness of the graph, which defaults to the result
#' of \code{\link[igraph]{is_directed}}.
#' @param alpha The significance level under which to preserve the edges, which
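The speedup comes from computing each node's degree and strength in one grouped pass over an edge list rather than looping over nodes. This is a minimal sketch of that per-node computation on a made-up undirected edge list. It is written in base R with `ave()` so it runs without extra packages; in data.table the same step would be a grouped assignment `by = node`. Two choices here are assumptions, not taken from the gist: degree-1 nodes receive a p-value of 1, and an edge is kept when it is significant from at least one endpoint.

```r
edges <- data.frame(from = c("a", "a", "a", "b", "c"),
                    to   = c("b", "c", "d", "c", "d"),
                    weight = c(10, 1, 1, 5, 2),
                    stringsAsFactors = FALSE)

# Evaluate every undirected edge from both of its endpoints
both <- rbind(
  data.frame(node = edges$from, nbr = edges$to, weight = edges$weight, stringsAsFactors = FALSE),
  data.frame(node = edges$to, nbr = edges$from, weight = edges$weight, stringsAsFactors = FALSE)
)
both$k <- ave(both$weight, both$node, FUN = length)             # node degree
both$p <- both$weight / ave(both$weight, both$node, FUN = sum)  # normalized weight
# Serrano et al. p-value; degree-1 nodes get 1 (assumption)
both$alpha_ij <- ifelse(both$k > 1, (1 - both$p)^(both$k - 1), 1)

# Canonical (a, b) key per undirected edge; keep the smaller p-value of the two ends
both$a <- pmin(both$node, both$nbr)
both$b <- pmax(both$node, both$nbr)
edge_alpha <- aggregate(alpha_ij ~ a + b, data = both, FUN = min)

alpha_level <- 0.05
backbone <- edge_alpha[edge_alpha$alpha_ij < alpha_level, ]
backbone
```

In this toy graph only the a-b edge survives. It carries 10 of node a's total strength of 12, giving a p-value of (1 - 10/12)^2 = 1/36.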
dmarx / venn_intersection_text.R
Last active January 12, 2017 22:15
Rough method for drawing labels in the intersections of a Venn diagram drawn using R's `venneuler` package
#install.packages('venneuler')
library(venneuler)
venn_intersection_text = function(venn, classes, label, adjustment=0.5, xadj=0, yadj=0 ){
  # draws label text at a point on the line between the centers of two classes:
  # adjustment=0.5 gives the midpoint, larger values move toward classes[1],
  # and xadj/yadj shift the label by a fixed offset
xv = adjustment*venn$centers[classes[1],1] + (1-adjustment)*venn$centers[classes[2],1] + xadj
yv = adjustment*venn$centers[classes[1],2] + (1-adjustment)*venn$centers[classes[2],2] + yadj
text(x=xv, y=yv, labels=label)
}
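The placement rule is a linear interpolation between two circle centers. The sketch below isolates it so it can be checked without installing `venneuler` or opening a plot device. `intersection_point` is a hypothetical helper that returns the coordinates `venn_intersection_text` would pass to `text()`. The `centers` matrix is made up, with set names as row names, which is what the original function already assumes when it indexes `venn$centers[classes[1], ]`.

```r
# Hypothetical helper: same interpolation as venn_intersection_text, returned as (x, y)
intersection_point <- function(centers, classes, adjustment = 0.5, xadj = 0, yadj = 0) {
  p1 <- centers[classes[1], ]
  p2 <- centers[classes[2], ]
  adjustment * p1 + (1 - adjustment) * p2 + c(xadj, yadj)
}

centers <- rbind(A = c(0.3, 0.5), B = c(0.7, 0.5))  # made-up circle centers
intersection_point(centers, c("A", "B"))             # midpoint of the two centers
intersection_point(centers, c("A", "B"), 0.75)       # three quarters of the way toward A
```

Because the point is taken on the segment between centers, the rule places labels well for two overlapping circles. For three-way overlaps, the `xadj`/`yadj` offsets are what move a label into the shared region.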