RJ Nowling rnowling

WARN [2015-03-19 16:31:26,277] ({main} ZeppelinConfiguration.java[create]:76) - Failed to load configuration, proceeding with a default
INFO [2015-03-19 16:31:26,493] ({main} NotebookServer.java[creatingwebSocketServerLog]:41) - Create zeppelin websocket on port 8081
INFO [2015-03-19 16:31:26,907] ({main} ZeppelinServer.java[main]:84) - Start zeppelin server
INFO [2015-03-19 16:31:26,910] ({main} Server.java[doStart]:272) - jetty-8.1.14.v20131031
INFO [2015-03-19 16:31:27,610] ({main} InterpreterFactory.java[init]:86) - Reading /home/vagrant/zeppelin/interpreter/spark
INFO [2015-03-19 16:31:28,326] ({main} InterpreterFactory.java[init]:103) - Interpreter spark found. class=com.nflabs.zeppelin.spark.SparkInterpreter
INFO [2015-03-19 16:31:28,333] ({main} InterpreterFactory.java[init]:103) - Interpreter pyspark found. class=com.nflabs.zeppelin.spark.PySparkInterpreter
INFO [2015-03-19 16:31:28,335] ({main} InterpreterFactory.java[init]:103) - Interpreter sql found. class=com.nflabs.zeppelin.spark.SparkSq
@rnowling
rnowling / gluster-tasks.yml
Last active August 29, 2015 14:19
Ansible Playbooks for Gluster
- hosts: storage_nodes
  name: Gluster configuration
  sudo: true
  vars:
    - gluster_brick_dirs:
        - /srv/gluster/brick1
        - /srv/gluster/brick2
        - /srv/gluster/brick3
        - /srv/gluster/brick4
        - /srv/gluster/brick5
rnowling / rf_bias.py
Last active June 10, 2019 10:33
Simulate RF Categorical Variable Encoding Bias
import random
import sys
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
N_SAMPLES = 1000
N_TREES = 100
MAX_CATEGORIES = 32
rnowling / rf_correlation_bias.py
Created August 12, 2015 02:17
RF Feature Correlation Bias
import random
import sys
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
N_SAMPLES = 10
N_TREES = 100
MAX_CATEGORIES = 32
rnowling / optics.py
Created October 13, 2015 14:36
Customer Segmentation Pipeline Prototype
"""
Copyright 2015 Ronald J. Nowling
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
rnowling / rf_missing_data.py
Created December 16, 2015 01:45
Imputing Missing Data and Random Forest Variable Importance Scores
from collections import defaultdict
import random
import sys
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from scipy.stats import mstats
N_SAMPLES = 100
rnowling / imbalanced_dataset_lr_comparison.py
Created August 27, 2016 06:25
Imbalanced Dataset Logistic Regression Model Comparison
"""
Script for comparing Logistic Regression and associated evaluation metrics on the imbalanced Media 6 Degrees dataset from the Doing Data Science book. You'll need to download a copy of the dataset from the GitHub repo: https://github.com/oreillymedia/doing_data_science .
Copyright 2016 Ronald J. Nowling
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
"""
Script for comparing Logistic Regression with L1, L2, and elastic net regularization and the liblinear, sag, and sgd optimizers. You'll need to download a copy of the dataset from http://plg.uwaterloo.ca/~gvcormac/treccorpus07/about.html .
Copyright 2016 Ronald J. Nowling
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
"""
Script for comparing spam classification with a bag-of-words model constructed with and without hashing. You'll need to download a copy of the dataset from http://plg.uwaterloo.ca/~gvcormac/treccorpus07/about.html .
Copyright 2016 Ronald J. Nowling
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
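
The preview above is cut off before any code, so the following is only a minimal sketch of the idea the description names: a bag-of-words built from an explicit vocabulary versus one built with feature hashing. It is not the gist's implementation. The function names, the whitespace tokenizer, the use of `zlib.crc32` as the hash, and the `n_buckets` parameter are all assumptions made for illustration.

```python
import zlib
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def bow_with_vocabulary(docs):
    """Build sparse count rows using an explicit token -> column vocabulary."""
    vocab = {}
    rows = []
    for doc in docs:
        row = {}
        for token, count in Counter(tokenize(doc)).items():
            # Assign the next free column index the first time a token is seen.
            column = vocab.setdefault(token, len(vocab))
            row[column] = count
        rows.append(row)
    return vocab, rows


def bow_with_hashing(docs, n_buckets=16):
    """Build sparse count rows by hashing tokens into a fixed number of columns."""
    rows = []
    for doc in docs:
        row = {}
        for token, count in Counter(tokenize(doc)).items():
            # crc32 is deterministic across runs (unlike Python's built-in hash()).
            column = zlib.crc32(token.encode("utf-8")) % n_buckets
            # Distinct tokens may collide in the same column, so counts are summed.
            row[column] = row.get(column, 0) + count
        rows.append(row)
    return rows
```

The vocabulary version needs a pass over the data to learn the mapping and grows with the number of distinct tokens. The hashed version needs no stored mapping and has a fixed width, at the cost of collisions between tokens that share a bucket.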
rnowling / test_import.bats
Last active February 5, 2017 03:52
Test example for Bats
#!/usr/bin/env bats
count_snps() {
    local counts=$(python -c "import cPickle; data=cPickle.load(open('${1}/snp_feature_indices')); print len(data)")
    echo "$counts"
}
setup() {
    N_INDIVIDUALS=20
    N_SNPS=10000