### Keybase proof

I hereby claim:

* I am gdbassett on github.
* I am gdbassett (https://keybase.io/gdbassett) on keybase.
* I have a public key whose fingerprint is 8F47 6E59 65B3 9C92 428C 5A8C C609 81ED D4FA 1957

To claim this, I am signing this object:
from scipy import stats as scistats
import numpy as np

# Implementation of Tau from http://amstat.tandfonline.com/doi/abs/10.1198/004017002188618509#.VDgKhdR4rEh
# blatantly transposed from the R robustbase library at http://r-forge.r-project.org/scm/?group_id=59, OGK.R
def scaleTau2(x, c1 = 4.5, c2 = 3.0, consistency = True, mu_too = False, *xargs, **kargs):
    ## NOTA BENE: This is *NOT* consistency corrected
    x = np.asarray(x)
    n = len(x)
    medx = np.median(x)
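
The preview above stops after the median is computed. As a rough illustration of where such a function usually goes, here is a minimal standard-library sketch of the uncorrected tau scale from the cited paper: a weighted location estimate around the median, followed by a truncated sum of squares scaled by the MAD. The name `tau_scale` and the zero-MAD guard are assumptions made for this sketch; it is not the rest of the gist, and it applies no consistency correction.

```python
import math
import statistics


def tau_scale(x, c1=4.5, c2=3.0):
    """Return (mu, sigma): tau location and uncorrected tau scale of x.

    Hypothetical helper for illustration; assumes c1 > 0.
    """
    x = [float(v) for v in x]
    n = len(x)
    med = statistics.median(x)
    abs_dev = [abs(v - med) for v in x]
    sigma0 = statistics.median(abs_dev)  # unscaled MAD
    if sigma0 == 0:
        raise ValueError("MAD is zero; this sketch does not handle that case")

    # Weights (1 - u^2)^2 for |u| <= 1, zero otherwise, with u = |x - med| / (c1 * MAD).
    weights = []
    for d in abs_dev:
        u = d / (sigma0 * c1)
        weights.append(max(1.0 - u * u, 0.0) ** 2)
    mu = sum(w * v for w, v in zip(weights, x)) / sum(weights)

    # Truncated squared residuals: rho(z) = min(z^2, c2^2).
    rho = [min(((v - mu) / sigma0) ** 2, c2 ** 2) for v in x]
    sigma = sigma0 * math.sqrt(sum(rho) / n)
    return mu, sigma
```

A single extreme value receives zero weight in the location step and a capped contribution in the scale step, so `tau_scale([1, 2, 3, 4, 5, 1000])` stays close to the spread of the first five values.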
#!/usr/bin/env python
# -*- encoding: utf-8 -*-
"""
AUTHOR: Gabriel Bassett
DATE: 11-19-2014
DEPENDENCIES: py2neo
Copyright 2014 Gabriel Bassett
from sklearn.metrics.pairwise import pairwise_distances
import numpy as np

# X should be a numpy matrix, very likely a sparse matrix: http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.csr_matrix.html#scipy.sparse.csr_matrix
# T1 = Distance to centroid point to not include in other clusters
# T2 = Distance to centroid point to include in cluster
# T1 > T2 for overlapping clusters
# T1 < T2 will have points which reside in no clusters
# T1 == T2 will cause all points to reside in mutually exclusive clusters
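
The function body is not shown in this preview. The following is only a minimal sketch of a canopy-style pass that takes the two threshold definitions above literally: `t2` is the distance within which a point joins the current canopy, and `t1` is the distance within which a point is dropped from the candidate pool. It uses plain tuples and `math.dist` rather than `pairwise_distances`, always picks the lowest remaining index as the next center, and draws members only from the remaining pool; all of these are assumptions made for the sketch.

```python
import math


def canopy(points, t1, t2):
    """Group points (tuples of floats) into canopies.

    t1: points closer than this to a center leave the candidate pool
    t2: points closer than this to a center join that center's canopy
    Hypothetical helper for illustration only.
    """
    remaining = list(range(len(points)))  # indices still in the pool
    canopies = []
    while remaining:
        center = remaining[0]
        dist = {i: math.dist(points[center], points[i]) for i in remaining}
        members = [i for i in remaining if dist[i] < t2]
        canopies.append({"center": center, "points": members})
        # The center itself is always removed so the loop terminates.
        remaining = [i for i in remaining if i != center and dist[i] >= t1]
    return canopies
```

With equal thresholds and two well-separated groups, for example `canopy([(0.0,), (0.1,), (0.2,), (10.0,), (10.1,)], 1.0, 1.0)`, each point lands in exactly one canopy.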
#!/usr/bin/env python
# -*- encoding: utf-8 -*-
# based on http://scikit-learn.org/stable/auto_examples/document_clustering.html
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.metrics.pairwise import pairwise_distances
import numpy as np
from time import time
from collections import defaultdict
> df <- df[!names(df) %in% c('root.victim.region',
+                            'root.victim.country',
+                            'root.summary',
+                            'root.summary=Source_Category',
+                            'root.victim.industry',
+                            'root.timeline.incident.year',
+                            'root.plus.dbir_year',
+                            'root.action.social.notes',
+                            'root.victim.secondary.notes',
+                            'root.action.hacking.notes',
#' @param df Dataframe with x and y columns. (Hopefully in the future this can be x)
#' @param nlines The number of clusters.
#' @param ab a dataframe with a 'slopes' and 'intercepts' column and one row per initial line. Dimensions must match nlines.
#' @param maxiter The maximum number of iterations to do
#' @export
#' @examples
linearKMeans <- function(df, ab=NULL, nlines=0, maxiter=1000) {
  # default number of lines
  nlines_default <- 5
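
The body of `linearKMeans` is cut off here, so the following Python sketch only illustrates one common way a "k-means over lines" loop can be arranged given the documented parameters: assign each point to the line with the smallest vertical residual, refit each line by least squares on its points, and stop when assignments no longer change or `maxiter` is reached. The names `fit_line` and `linear_k_means`, the use of vertical residuals, and the rule of leaving a line unchanged when it has fewer than two points are all assumptions, not the gist's behavior.

```python
def fit_line(points):
    """Ordinary least-squares (slope, intercept) for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return 0.0, mean_y  # all x equal: fall back to a flat line
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def linear_k_means(points, lines, maxiter=1000):
    """Cluster (x, y) points around lines given as (slope, intercept) pairs."""
    lines = list(lines)  # work on a copy of the initial lines
    labels = None
    for _ in range(maxiter):
        # Assignment step: nearest line by absolute vertical residual.
        new_labels = [
            min(range(len(lines)),
                key=lambda j: abs(y - (lines[j][0] * x + lines[j][1])))
            for x, y in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: refit every line that has at least two points.
        for j in range(len(lines)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if len(members) >= 2:
                lines[j] = fit_line(members)
    return lines, labels
```

Here `lines` plays the role of the `ab` argument (one slope and intercept per initial line) and its length plays the role of `nlines`.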
---
title: "Test"
author: "Gabe"
date: "November 03, 2016"
output: html_document
params:
  df: data.frame()
  a: ""
  b: ""
  c: "FALSE"
speedrun <- XML::xmlParse("/livesplit.lss")
speedrun <- XML::xmlToList(speedrun)
chunk <- do.call(rbind, lapply(speedrun[['Segments']], function(segments) {
  segments.df <- do.call(rbind, lapply(segments[['SegmentHistory']], function(segment) {
    if ('RealTime' %in% names(segment))
      data.frame(`attemptID` = segment$.attrs['id'], RealTime = segment$RealTime)
  }))
  segments.df$name <- rep(segments$Name, nrow(segments.df))
# pick an enumeration
enum <- "action.*.variety"
# establish filter criteria (easier than a complex standard-eval filter_ line)
df <- vcdb %>%
  dplyr::filter(plus.dbir_year == 2016, subset.2017dbir) %>%
  dplyr::filter(attribute.confidentiality.data_disclosure.Yes) %>%
  dplyr::filter(victim.industry2.92)
# establish priors from previous year
priors <- df %>%