Gavin Whyte gavinwhyte

@gavinwhyte
gavinwhyte / upgradegit.txt
Created August 31, 2015 07:03
Upgrading Git on Ubuntu
Add the git-core PPA to the list of APT sources:
>sudo add-apt-repository ppa:git-core/ppa
Update the local package index:
>sudo apt-get update
Finally, install (or upgrade) the git package:
>sudo apt-get install git
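After the upgrade, `git --version` reports the installed release. Below is a minimal Python sketch, assuming the usual `git version X.Y.Z` output format, that runs the command and parses the version so a script can check it against a minimum. The helper names `parse_git_version` and `installed_git_version` are hypothetical.

```python
import re
import subprocess


def parse_git_version(output):
    """Extract (major, minor, patch) from text like 'git version 2.39.2'."""
    # Assumes the 'git version X.Y[.Z]...' format; extra suffixes are ignored
    match = re.search(r"git version (\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError("unrecognised git --version output: %r" % output)
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def installed_git_version():
    """Run `git --version` and return the parsed version tuple."""
    out = subprocess.run(
        ["git", "--version"], capture_output=True, text=True, check=True
    ).stdout
    return parse_git_version(out)
```

For example, `installed_git_version() >= (2, 0, 0)` confirms that the PPA build replaced an older distribution package.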
gavinwhyte / requirements.txt
Created August 31, 2015 10:19
Python requirements
beautifulsoup4==4.4.0
blosc==1.2.7
Bottleneck==1.0.0
funcsigs==0.4
google-api-python-client==1.4.1
html5lib==0.999999
httplib2==0.9.1
lxml==3.4.4
matplotlib==1.4.3
mock==1.1.3
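The file pins exact versions with `name==version`. Below is a minimal sketch, assuming only `==` pins and `#` comments appear (as in the lines above), that parses such a file and reports packages whose installed version differs from the pin. It uses the standard-library `importlib.metadata` module (Python 3.8+). `parse_pins` and `check_pins` are hypothetical helpers.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(text):
    """Map package name -> pinned version for each 'name==version' line."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, sep, pinned = line.partition("==")
        if not sep:
            raise ValueError("expected an exact '==' pin: %r" % line)
        pins[name.strip()] = pinned.strip()
    return pins


def check_pins(pins):
    """Return {name: (pinned, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

A typical call is `check_pins(parse_pins(open("requirements.txt").read()))`. An empty result means the environment matches the pins.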
Installing Homebrew
First, we need to install Homebrew. Homebrew lets us install software packages easily, building them from source when no precompiled package is available.
Homebrew comes with a very simple install script. When it asks you to install the Xcode Command Line Tools, say yes.
Open Terminal and run the following command:
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
Installing Ruby
gavinwhyte / rubyinstall.sh
Created August 31, 2015 10:25
ruby install
gavinwhyte / knn.py
Created September 13, 2015 10:17
Knn
__author__ = 'gavinwhyte'
from numpy import *
import operator
import matplotlib
import matplotlib.pyplot as plt
def createDataSet():
    # Four 2-D training points and their class labels
    group = array([[1.0,1.1],[1.0,1.0],[0,0],[0,0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels
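The preview stops after the training data. Below is a minimal sketch of the classification step, assuming plain Euclidean distance and a majority vote among the `k` nearest training points. `classify` is a hypothetical name. The data are the same four points and labels that `createDataSet` builds.

```python
from collections import Counter

import numpy as np


def classify(query, group, labels, k):
    """Label `query` by majority vote among its k nearest rows of `group`."""
    # Euclidean distance from the query to every training point
    diffs = np.asarray(group, dtype=float) - np.asarray(query, dtype=float)
    distances = np.sqrt((diffs ** 2).sum(axis=1))
    # Indices of the k closest points, nearest first
    nearest = np.argsort(distances)[:k]
    votes = Counter(labels[i] for i in nearest)
    # most_common(1) returns [(label, count)] for the top-voted label
    return votes.most_common(1)[0][0]


group = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(classify([0, 0], group, labels, 3))  # → B
```

If two labels receive equal votes, `Counter.most_common` keeps first-encountered order. The label of the nearer point therefore wins.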
# After determining which attributes are categorical and which
# are numeric, you'll want descriptive statistics for the numeric
# variables and a count of the unique categories in each
# categorical attribute.
import urllib2
import sys
import numpy as np
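The comment above describes a first pass over a dataset. Below is a minimal sketch of that pass, assuming the data arrive as rows of strings (for example from `csv.reader`). A column counts as numeric when every value parses as a float. `summarize_columns` and `is_number` are hypothetical helpers, and only the standard library is used.

```python
import statistics
from collections import Counter


def is_number(value):
    """True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def summarize_columns(rows):
    """Describe each column: summary stats if numeric, category counts otherwise."""
    summary = []
    for column in zip(*rows):  # transpose rows into columns
        if all(is_number(v) for v in column):
            values = [float(v) for v in column]
            summary.append({
                "type": "numeric",
                "mean": statistics.mean(values),
                "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
                "min": min(values),
                "max": max(values),
            })
        else:
            summary.append({"type": "categorical", "counts": Counter(column)})
    return summary


rows = [["1.5", "M"], ["2.5", "F"], ["3.5", "M"]]
for col in summarize_columns(rows):
    print(col)
```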
gavinwhyte / trips.py
Last active August 23, 2016 09:26
TripCount
from scipy.stats import poisson
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots(1, 1)
x = np.fromfile('tripcountssample.txt',
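The imports suggest fitting a Poisson distribution to trip counts. For a Poisson model, the maximum-likelihood estimate of the rate λ is the sample mean. The fitted probability of k trips is λ^k e^(-λ) / k!. Below is a minimal sketch that compares observed and expected frequencies. To keep it self-contained, it computes the pmf with the standard library; `scipy.stats.poisson.pmf(k, lam)` gives the same values. `fit_poisson` and the sample counts are hypothetical, not the contents of `tripcountssample.txt`.

```python
import math
from collections import Counter


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson(lam) random variable."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def fit_poisson(counts):
    """Return the MLE rate and a table of (k, observed, expected) frequencies."""
    lam = sum(counts) / len(counts)  # MLE of the Poisson rate is the sample mean
    observed = Counter(counts)
    n = len(counts)
    table = [(k, observed.get(k, 0), n * poisson_pmf(k, lam))
             for k in range(max(counts) + 1)]
    return lam, table


counts = [0, 1, 1, 2, 2, 2, 3, 1, 0, 2]  # hypothetical daily trip counts
lam, table = fit_poisson(counts)
for k, obs, expected in table:
    print(k, obs, round(expected, 2))
```

Large gaps between the observed and expected columns suggest the counts are over- or under-dispersed relative to a Poisson model.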
gavinwhyte / multipart.py
Created January 18, 2017 03:43
To run: python multipart.py bucketname extremely_large_file.txt
#!/usr/bin/env python
import os, sys
import math
import boto
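# Read credentials from the environment rather than hard-coding them
AWS_ACCESS_KEY_ID = os.environ.get('AWS_ACCESS_KEY_ID', '')
AWS_SECRET_ACCESS_KEY = os.environ.get('AWS_SECRET_ACCESS_KEY', '')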
def upload_file(s3, bucketname, file_path):
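The `math` import hints at splitting the file into parts. In an S3 multipart upload, every part except the last must be at least 5 MiB, and an upload may have at most 10,000 parts. Below is a minimal sketch of that planning step. It computes the number of parts with `math.ceil` and the byte offset and length of each part, assuming a 50 MiB default chunk size. `plan_parts` is hypothetical and does not call boto. In boto 2, each (part number, offset, length) triple would then drive one `upload_part_from_file` call on the multipart upload object.

```python
import math

MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum for every part except the last
MAX_PARTS = 10000                # S3 limit on parts per upload


def plan_parts(file_size, chunk_size=50 * 1024 * 1024):
    """Return (part_number, offset, length) for each part, numbered from 1."""
    if chunk_size < MIN_PART_SIZE:
        raise ValueError("chunk_size must be at least 5 MiB")
    chunk_count = max(1, int(math.ceil(file_size / float(chunk_size))))
    if chunk_count > MAX_PARTS:
        raise ValueError("file needs more than 10,000 parts; raise chunk_size")
    parts = []
    for i in range(chunk_count):
        offset = i * chunk_size
        length = min(chunk_size, file_size - offset)  # last part may be short
        parts.append((i + 1, offset, length))
    return parts


MiB = 1024 * 1024
print(plan_parts(12 * MiB, chunk_size=5 * MiB))
```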
Principal component analysis (PCA) is a dimensionality reduction technique that is widely used in data analysis.
Reducing the dimensionality of a dataset can be useful in different ways. For example, our ability to visualize data is limited to 2 or 3 dimensions.
A lower dimension can also significantly reduce the running time of some numerical algorithms.
In addition, many statistical models suffer when covariates are highly correlated; PCA can produce linear combinations of the covariates that are uncorrelated with each other.
Computing PCA
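One common way to compute PCA is through the singular value decomposition of the mean-centred data matrix. The right singular vectors are the principal directions. Projecting onto them gives component scores that are uncorrelated, which is the property described above. Below is a minimal NumPy sketch under that approach; the original gist's own method is not shown. `pca` is a hypothetical helper, and the data are randomly generated for illustration.

```python
import numpy as np


def pca(X, n_components):
    """Project rows of X onto their top principal components via SVD."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each column
    # Rows of Vt are the principal directions, ordered by singular value
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T
    # Variance captured by each component (sample variance, n - 1 denominator)
    explained_variance = S[:n_components] ** 2 / (X.shape[0] - 1)
    return scores, components, explained_variance


rng = np.random.default_rng(0)
a = rng.normal(size=200)
# Two strongly correlated covariates plus an independent one
X = np.column_stack([a, 2 * a + 0.1 * rng.normal(size=200), rng.normal(size=200)])
scores, components, var = pca(X, 2)
print(np.round(np.corrcoef(scores, rowvar=False), 6))
```

The off-diagonal correlation of the scores is zero up to rounding, even though the first two input columns are almost perfectly correlated.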
import java.util
import org.apache.kafka.clients.consumer.KafkaConsumer
import scala.collection.JavaConverters._
object ConsumerExample extends App {
import java.util.Properties