Gavin Whyte gavinwhyte

object ProducerExample extends App {
import java.util.Properties
import org.apache.kafka.clients.producer._
val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
import java.util
import org.apache.kafka.clients.consumer.KafkaConsumer
import scala.collection.JavaConverters._
object ConsumerExample extends App {
import java.util.Properties
Principal component analysis (PCA) is a dimensionality reduction technique that is widely used in data analysis.
Reducing the dimensionality of a dataset is useful in several ways. For example, we can only visualize data in 2 or 3 dimensions.
A lower dimension can also significantly reduce the running time of some numerical algorithms.
In addition, many statistical models suffer from high correlation between covariates; PCA can be used to produce linear combinations of the covariates that are mutually uncorrelated.
Computing PCA
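The gist preview is cut off at this heading, so the following is only a minimal sketch of one common way to compute PCA (eigendecomposition of the sample covariance matrix), not necessarily the approach the gist itself takes. The function name `pca` and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def pca(X, n_components):
    """Hypothetical helper: project the rows of X onto the top principal components."""
    X = np.asarray(X, dtype=float)
    # Center each column so the covariance is computed around the mean.
    X_centered = X - X.mean(axis=0)
    # Sample covariance matrix of the columns (the covariates).
    cov = np.cov(X_centered, rowvar=False)
    # eigh is appropriate for symmetric matrices; eigenvalues come back in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the directions with the largest variance.
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    explained = eigvals[order] / eigvals.sum()
    return X_centered @ components, components, explained


# Synthetic data (an assumption): two strongly correlated columns plus one independent column.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([base, 2 * base + 0.1 * rng.normal(size=(200, 1)), rng.normal(size=(200, 1))])

scores, components, explained = pca(X, 2)
```

The columns of `scores` are linear combinations of the original covariates, and their sample covariance is diagonal up to rounding, which is the "uncorrelated" property described above.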
@gavinwhyte
gavinwhyte / multipart.py
Created January 18, 2017 03:43
To run: python multipart.py bucketname extremely_large_file.txt
#!/usr/bin/env python
import os, sys
import math
import boto
AWS_ACCESS_KEY_ID = ''
AWS_SECRET_ACCESS_KEY = ''
def upload_file(s3, bucketname, file_path):
gavinwhyte / trips.py
Last active August 23, 2016 09:26
TripCount
from scipy.stats import poisson
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots(1, 1)
x = np.fromfile('tripcountssample.txt',
# After determining which attributes are categorical and which
# are numeric, you'll want descriptive statistics for the numeric
# variables and a count of the unique categories in each
# categorical attribute
import urllib2
import sys
import numpy as np
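The comment above describes summarizing numeric and categorical attributes, but the preview stops at the imports. A minimal sketch of that idea follows, assuming each column arrives as a list of strings; the helper name `summarize_column` and the sample values are hypothetical and not taken from the gist.

```python
from collections import Counter

import numpy as np


def summarize_column(values):
    """Hypothetical helper: describe one column given as a list of strings."""
    try:
        # A column counts as numeric only if every entry parses as a float.
        numbers = np.array([float(v) for v in values])
    except ValueError:
        # Otherwise treat it as categorical and count each unique category.
        return {"type": "categorical", "counts": dict(Counter(values))}
    return {
        "type": "numeric",
        "mean": float(numbers.mean()),
        "std": float(numbers.std()),
        "quartiles": np.percentile(numbers, [25, 50, 75]).tolist(),
    }


numeric_summary = summarize_column(["1", "2", "3", "4"])
categorical_summary = summarize_column(["a", "b", "a"])
```

Treating a column as categorical whenever a single value fails to parse is a simplification; real data often needs explicit handling of missing values first.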
gavinwhyte / knn.py
Created September 13, 2015 10:17
Knn
__author__ = 'gavinwhyte'
from numpy import *
import operator
import matplotlib
import matplotlib.pyplot as plt
def createDataSet():
group = array([[1.0,1.1],[1.0,1.0],[0,0],[0,0.1]])
labels = ['A', 'A', 'B', 'B']
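The preview ends before any classification code appears. As a rough sketch of how a k-nearest-neighbours vote over this toy data could look, assuming Euclidean distance and a simple majority vote: the function name `classify` and the query points are illustrative assumptions, not the gist's own code.

```python
from collections import Counter

import numpy as np


def classify(query, data, labels, k):
    """Hypothetical helper: label `query` by majority vote among its k nearest rows of `data`."""
    data = np.asarray(data, dtype=float)
    query = np.asarray(query, dtype=float)
    # Euclidean distance from the query point to every training point.
    distances = np.sqrt(((data - query) ** 2).sum(axis=1))
    # Indices of the k closest training points.
    nearest = np.argsort(distances)[:k]
    # Majority vote over the labels of those neighbours.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Same toy data as in the snippet above.
group = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
labels = ['A', 'A', 'B', 'B']

near_origin = classify([0.1, 0.2], group, labels, k=3)
near_ones = classify([1.0, 1.2], group, labels, k=3)
```

With k=3, a query near the origin picks up two 'B' neighbours and one 'A', so the vote goes to 'B'; a query near (1, 1) goes to 'A' by the same reasoning.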
gavinwhyte / rubyinstall.sh
Created August 31, 2015 10:25
ruby install
Installing Homebrew
First, we need to install Homebrew. Homebrew allows us to install and compile software packages easily from source.
Homebrew comes with a very simple install script.
When it asks you to install the Xcode Command Line Tools, say yes.
Open Terminal and run the following command:
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
Installing Ruby
gavinwhyte / requirements.txt
Created August 31, 2015 10:19
python requirements
beautifulsoup4==4.4.0
blosc==1.2.7
Bottleneck==1.0.0
funcsigs==0.4
google-api-python-client==1.4.1
html5lib==0.999999
httplib2==0.9.1
lxml==3.4.4
matplotlib==1.4.3
mock==1.1.3