Trevor Grant rawkintrevo
@rawkintrevo
rawkintrevo / CCO For Music Based Dating App
Last active April 6, 2017 04:04
Holy shit this took a lot of recon by fire.
/**
* Created by rawkintrevo on 4/5/17.
*/
// Only need these so IntelliJ doesn't whine
import org.apache.mahout.drivers.ItemSimilarityDriver.parser
import org.apache.mahout.math._
import org.apache.mahout.math.scalabindings._
/**
* Created by rawkintrevo on 4/5/17.
*/
// Only need these so IntelliJ doesn't whine
import org.apache.mahout.math._
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.drm._
import org.apache.mahout.math.scalabindings.RLikeOps._
/**
* Created by rawkintrevo on 4/5/17.
*/
// Only need these so IntelliJ doesn't complain
import org.apache.mahout.math._
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.drm._
import org.apache.mahout.math.scalabindings.RLikeOps._
import org.apache.mahout.math.drm.RLikeDrmOps._
rawkintrevo / zeppelin-flink-spark.sh
Created July 15, 2016 20:58
Set up Zeppelin+Flink+Spark
rawkintrevo / SparkAverageTemps.Scala
Created November 7, 2015 14:20
Download NOAA weather data, then compute the average high temperature for each station
// For parsing Stations
// ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt
// For parsing Observations
// ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/by_year/readme.txt
case class Observation(station_id: String, date: String, observation_type: String, observation_value: Float, observation_time: String)
val obsText = sc.textFile("ftp://anonymous:[email protected]/pub/data/ghcn/daily/by_year/1768.csv.gz")
val observations = obsText.map(_.split(",")).map(s => Observation(s(0), s(1), s(2), s(3).toFloat, s(4)))
// Reduce (sum, count) pairs and divide at the end: a running-mean update is not associative, so it is unsafe inside reduceByKey.
val avgTemps = observations.filter(_.observation_type == "TMAX").map(o => (o.station_id, (o.observation_value, 1))).reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2)).mapValues { case (sum, n) => sum / n }
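A pairwise reduce such as Spark's reduceByKey may combine values in any grouping, so the merge function has to be associative and commutative. The sketch below is plain Python, independent of Spark, and uses made-up readings for a single station (the values and function names are illustrative assumptions, not part of the gist). It compares merging (sum, count) pairs with a running-mean update that is only valid when the right-hand operand holds one observation.

```python
from functools import reduce


def merge_sum_count(a, b):
    """Merge two (sum, count) partial aggregates; safe in any grouping."""
    return (a[0] + b[0], a[1] + b[1])


def merge_running_mean(a, b):
    """Running-mean update over (mean, count); assumes b holds one observation."""
    return (a[0] + (b[0] - a[0]) / (a[1] + 1), a[1] + b[1])


# Hypothetical TMAX readings for one station, each paired with a count of 1.
readings = [(10.0, 1), (20.0, 1), (30.0, 1), (60.0, 1)]

# Left-to-right fold: every right-hand operand has count 1.
total, n = reduce(merge_sum_count, readings)
sequential_mean = total / n
running_sequential = reduce(merge_running_mean, readings)[0]

# Partition-style merge: reduce two halves separately, then combine them,
# which is one way a distributed reduce may group the values.
left = reduce(merge_sum_count, readings[:2])
right = reduce(merge_sum_count, readings[2:])
total_p, n_p = merge_sum_count(left, right)
partitioned_mean = total_p / n_p

left_rm = reduce(merge_running_mean, readings[:2])
right_rm = reduce(merge_running_mean, readings[2:])
running_partitioned = merge_running_mean(left_rm, right_rm)[0]

print(sequential_mean, partitioned_mean)        # sum/count agrees in both groupings
print(running_sequential, running_partitioned)  # running mean depends on grouping
```

With these four readings, both approaches give 30.0 when folded left to right, but merging two pre-reduced halves with the running-mean update gives 25.0. The (sum, count) merge still gives 30.0.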
#!/bin/bash
sudo apt-get install git openssh-server openjdk-7-jdk openjdk-7-doc openjdk-7-jre-lib
sudo apt-get purge maven maven2
wget "http://www.us.apache.org/dist/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz"
tar -zxvf apache-maven-3.3.3-bin.tar.gz
sudo mv ./apache-maven-3.3.3 /usr/local
sudo ln -s /usr/local/apache-maven-3.3.3/bin/mvn /usr/bin/mvn
git clone https://github.com/tillrohrmann/incubator-zeppelin.git
rawkintrevo / SparkZeppelinWordCount
Last active April 11, 2019 04:06
A Spark Word Count Example for Zeppelin
%spark // let Zeppelin know what interpreter to use.
/*
Written by Trevor Grant 10/22/2015
Inspired by word count example at: http://spark.apache.org/examples.html
*/
val text = sc.parallelize(List("In the time of chimpanzees, I was a monkey", // some lines of text to analyze
"Butane in my veins and I'm out to cut the junkie",
"With the plastic eyeballs, spray paint the vegetables",
rawkintrevo / FlinkZeppelinWordCount
Last active May 9, 2016 09:27
A Flink Word Count Example for Zeppelin
%flink // let Zeppelin know what interpreter to use.
val text = env.fromElements("In the time of chimpanzees, I was a monkey", // some lines of text to analyze
"Butane in my veins and I'm out to cut the junkie",
"With the plastic eyeballs, spray paint the vegetables",
"Dog food stalls with the beefcake pantyhose",
"Kill the headlights and put it in neutral",
"Stock car flamin' with a loser in the cruise control",
"Baby's in Reno with the Vitamin D",
"Got a couple of couches, sleep on the love seat",
rawkintrevo / gist:77338f3c25e6bb973d6e
Created March 11, 2015 01:55
Multivariate Linear Hierarchical - PyMC
import pymc as pm
import numpy as np
import matplotlib.pyplot as plt
from pprint import pprint
import pandas as pd
def linear_setup(df, ind_cols, dep_col, gb_cols, intercept=True):
    '''
    N: Number of observations
    G: Number of groups
rawkintrevo / gist:31cb13fc017a723ccf33
Created March 11, 2015 01:54
Single Variate Hierarchical- PyMC
"""
Bayesian Statistics and Marketing: Rossi, Allenby, McCulloch
Section 3.7
This is a remarkably poorly defined model. Almost better off going
straight for the deep wizardry on his code than dicking with the
text. (At least for model definition, because the code is crap too.)
D: