Ryan Swanstrom ryanswanstrom

ryanswanstrom / datasciencejobs_results.txt
Created March 29, 2012 03:21
The results of the words counted from the data scientist job postings.
'visibility' occured 1 times and in 1 job descriptions
'external' occured 1 times and in 1 job descriptions
'particular' occured 1 times and in 1 job descriptions
'party' occured 1 times and in 1 job descriptions
'prototyping' occured 1 times and in 1 job descriptions
'semantics' occured 1 times and in 1 job descriptions
'tens' occured 1 times and in 1 job descriptions
'salary' occured 1 times and in 1 job descriptions
'else' occured 1 times and in 1 job descriptions
'essential' occured 1 times and in 1 job descriptions
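The script that produced these results is not shown here. A minimal sketch of how counts in this shape could be produced, assuming each job description is available as a plain string (the `count_words` function and the sample postings are hypothetical, not taken from the original analysis):

```python
from collections import Counter


def count_words(descriptions):
    """Return (total occurrences, number of descriptions containing) per word."""
    totals = Counter()      # how many times each word occurs overall
    doc_counts = Counter()  # how many descriptions contain each word
    for text in descriptions:
        words = text.lower().split()
        totals.update(words)
        doc_counts.update(set(words))  # count each word once per description
    return totals, doc_counts


# Hypothetical sample data, not the real job postings.
postings = [
    "Prototyping skills are essential",
    "Salary depends on prototyping experience and prototyping speed",
]
totals, doc_counts = count_words(postings)
for word in sorted(totals):
    print("'%s' occurred %d times and in %d job descriptions"
          % (word, totals[word], doc_counts[word]))
```

Keeping two counters separates raw frequency from the number of postings a word appears in, which is the distinction each result line reports.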
ryanswanstrom / CountWords.py
Created April 4, 2012 14:33
This code will open a file and count the words. Note: this is mostly just pseudo-code. It is not complete and has not been tested.
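# Count how often each whitespace-separated word occurs in a file.
words = {}
with open('/tmp/file.dat') as f:
    for line in f:
        print(line)
        line_words = line.split()  # list of words separated by whitespace
        for word in line_words:
            # get the current count (0 if the word has not been seen yet)
            val = words.get(word, 0)
            val += 1
            words[word] = val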
ryanswanstrom / log_regression.py
Created May 23, 2012 03:15
A simple logistic regression solution to the Kaggle Biological Response Competition
#!/usr/bin/env python
from sklearn.linear_model import LogisticRegression
import csv_io
import math
import scipy

def main():
    # read in the training file
ryanswanstrom / random_forest.py
Created May 31, 2012 03:38
Python code to run a random forest against the biological response Kaggle competition
#!/usr/bin/env python
from sklearn.ensemble import RandomForestClassifier
import csv_io
import scipy

def main():
    # read in the training file
    train = csv_io.read_data("train.csv")
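The `csv_io` module is a local helper that is not shown in this preview. A minimal sketch of what a `read_data` function like the one called above might look like, assuming the file has a single header row and only numeric columns (the implementation, the temporary file, and its column names are hypothetical):

```python
import csv
import os
import tempfile


def read_data(file_name):
    """Read a numeric CSV file into a list of float rows, skipping the header."""
    rows = []
    with open(file_name, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # assumption: the first line is a header row
        for row in reader:
            rows.append([float(value) for value in row])
    return rows


# Demo with a small temporary file standing in for train.csv.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("Activity,D1,D2\n1,0.5,0.25\n0,0.1,0.2\n")
data = read_data(tmp.name)
os.remove(tmp.name)
print(data)
```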
ryanswanstrom / bioresponse.py
Created June 11, 2012 03:22
Used for the Kaggle Bioresponse competition, mostly taken from the Kaggle Wiki
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn import cross_validation
import csv_io as csv
import llfun as logloss
import numpy as np

def main():
    # read in data, parse into training and target sets
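The `llfun` module is not shown in this preview; its alias suggests it supplies a log loss metric. A minimal sketch of binary log loss in plain Python, assuming labels are 0 or 1 and predictions are probabilities of class 1 (the `log_loss` name and the `eps` clipping value are hypothetical, not the author's code):

```python
import math


def log_loss(actual, predicted, eps=1e-15):
    """Mean negative log-likelihood of binary labels given probabilities."""
    total = 0.0
    for y, p in zip(actual, predicted):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(actual)


# Hypothetical labels and predicted probabilities.
print(log_loss([1, 0, 1], [0.9, 0.2, 0.6]))
```

Lower values are better; confident wrong predictions are penalized heavily, which is why the probabilities are clipped away from exactly 0 and 1.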
ryanswanstrom / first_ipython.ipynb
Created July 22, 2013 20:43
My first attempt at ipython
ryanswanstrom / homework4.ipynb
Created November 1, 2013 18:04
A file plotting a 3d plane with ipython
ryanswanstrom / defect_data.ipynb
Created December 29, 2013 15:12
generates a data file with defect data
ryanswanstrom / supercorp_defect_1.csv
Created December 29, 2013 15:14
An example of a defect file with complete data for 5 applications, each released every month for 14 straight months
AppID,AppName,ReleaseDate,DevHours,TestHours,SITDefects,UATDefects,ProdDefects,ReportDate
app1,Application 1,2012-10-02,57,0,29,11,3,2012-10-24
app2,Application 2,2012-10-02,198,0,88,17,13,2012-10-24
app3,Application 3,2012-10-02,354,0,134,25,17,2012-10-24
app4,Application 4,2012-10-02,358,0,117,36,20,2012-10-24
app5,Application 5,2012-10-02,695,0,227,55,32,2012-10-24
app1,Application 1,2012-11-02,109,0,25,16,1,2012-11-24
app2,Application 2,2012-11-02,213,0,53,17,13,2012-11-24
app3,Application 3,2012-11-02,412,0,158,43,15,2012-11-24
app4,Application 4,2012-11-02,357,0,123,35,18,2012-11-24
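A file like this can be summarized with the standard library. A minimal sketch that totals SIT, UAT, and production defects per application, assuming the file is comma-separated with the header shown above (only four of the sample rows are embedded here, and the aggregation itself is illustrative rather than part of the original gist):

```python
import csv
import io
from collections import defaultdict

# Four rows from the sample, assumed to be comma-separated.
SAMPLE = """AppID,AppName,ReleaseDate,DevHours,TestHours,SITDefects,UATDefects,ProdDefects,ReportDate
app1,Application 1,2012-10-02,57,0,29,11,3,2012-10-24
app2,Application 2,2012-10-02,198,0,88,17,13,2012-10-24
app1,Application 1,2012-11-02,109,0,25,16,1,2012-11-24
app2,Application 2,2012-11-02,213,0,53,17,13,2012-11-24
"""

# Sum all three defect columns per application across releases.
totals = defaultdict(int)
for row in csv.DictReader(io.StringIO(SAMPLE)):
    totals[row["AppID"]] += (int(row["SITDefects"])
                             + int(row["UATDefects"])
                             + int(row["ProdDefects"]))

for app_id in sorted(totals):
    print(app_id, totals[app_id])
```

To run this against the real file, replace `io.StringIO(SAMPLE)` with an opened file handle.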