Michael Galarnyk mGalarnyk

@mGalarnyk
mGalarnyk / pollutantmean.R
Last active January 2, 2017 16:11
pollutantmean.R This file is used for the Johns Hopkins Data Science Specialization (R Programming). It accompanies the blog post reviewing the specialization: https://medium.com/@GalarnykMichael/in-progress-review-course-2-r-programming-jhu-coursera-ad27086d8438#.ui3hb8n46
pollutantmean <- function(directory, pollutant, id = 1:332) {
  # Zero-pad each monitor ID to three digits, then append ".csv"
  fileNames <- paste0(directory, '/', formatC(id, width = 3, flag = "0"), ".csv")
  # Read every file and stack the results into one large data.table
  lst <- lapply(fileNames, data.table::fread)
  # Namespace-qualified so the call works even when data.table is not attached
  dt <- data.table::rbindlist(lst)
  if (pollutant %in% names(dt)) {
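For readers following along in Python rather than R, the file-name step above (pad each monitor ID to three digits, then append `.csv`) can be sketched as follows. The helper name `monitor_file_names` and the directory name `specdata` are illustrative assumptions, not part of the gist.

```python
def monitor_file_names(directory, ids):
    """Build paths such as 'specdata/001.csv' from integer monitor IDs."""
    # {:03d} zero-pads to width 3, mirroring formatC(id, width=3, flag="0") in the R code.
    return [f"{directory}/{monitor_id:03d}.csv" for monitor_id in ids]


# 'specdata' is a placeholder directory name (assumption).
names = monitor_file_names("specdata", range(1, 333))
print(names[0], names[-1], len(names))
```

As in the R version, IDs wider than three digits are left unpadded rather than truncated.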
mGalarnyk / complete.R
Last active January 2, 2017 16:25
complete.R This file is used for the Johns Hopkins Data Science Specialization (R Programming). It accompanies the blog post reviewing the specialization: https://medium.com/@GalarnykMichael/in-progress-review-course-2-r-programming-jhu-coursera-ad27086d8438#.ui3hb8n46
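complete <- function(directory, id = 1:332) {
  # Zero-pad each monitor ID to three digits, then append ".csv"
  fileNames <- paste0(directory, '/', formatC(id, width = 3, flag = "0"), ".csv")
  # Read every file and stack the results into one large data.table
  lst <- lapply(fileNames, data.table::fread)
  # Namespace-qualified so the call works even when data.table is not attached
  dt <- data.table::rbindlist(lst)
  # Count the fully observed rows (no NA in any column) for each monitor ID
  return(dt[complete.cases(dt), .(nobs = .N), by = ID])
}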
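The grouped count that `complete` returns (the number of fully observed rows per `ID`) can be illustrated with a small standard-library Python sketch. The sample rows, the column names other than `ID`, and the helper name `count_complete` are made up for illustration; the real data files are not shown in the gist.

```python
import csv
import io
from collections import Counter

# Tiny in-memory stand-in for the monitor CSVs; columns and values are invented.
SAMPLE = """Date,sulfate,nitrate,ID
2003-01-01,NA,NA,1
2003-01-02,3.1,0.4,1
2003-01-03,2.9,NA,1
2003-01-01,1.2,0.8,2
2003-01-02,1.5,0.9,2
"""


def count_complete(rows, id_column="ID", missing=("", "NA")):
    """Return {id: number of rows in which no field is missing}."""
    counts = Counter()
    for row in rows:
        # A row is "complete" only if every field holds a non-missing value.
        if all(value not in missing for value in row.values()):
            counts[row[id_column]] += 1
    return dict(counts)


nobs = count_complete(csv.DictReader(io.StringIO(SAMPLE)))
print(nobs)
```

An ID with no complete rows does not appear in the result at all, which matches filtering first and grouping afterwards in the R code.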
mGalarnyk / corr.R
Last active January 2, 2017 21:26
corr.R This file is used for the Johns Hopkins Data Science Specialization (R Programming). It accompanies the blog post reviewing the specialization: https://medium.com/@GalarnykMichael/in-progress-review-course-2-r-programming-jhu-coursera-ad27086d8438#.38n89lga5
corr <- function(directory, threshold = 0) {
  # Read every CSV file in the directory and stack the results into one large data.table.
  # pattern is a regular expression, not a glob, so match a literal ".csv" suffix.
  lst <- lapply(list.files(path = directory, pattern = "\\.csv$", full.names = TRUE), data.table::fread)
  dt <- data.table::rbindlist(lst)
  # Only keep completely observed cases
  dt <- dt[complete.cases(dt), ]
  # Apply threshold
mGalarnyk / .bashrc
Last active February 1, 2019 16:36
Function to append to the end of the .bashrc file on Linux to run PySpark in a Jupyter notebook, for the blog post https://medium.com/@GalarnykMichael/install-spark-on-ubuntu-pyspark-231c45677de0#.qxguj5czj
function snotebook ()
{
# Spark path (adjust to match where Spark is installed on your machine)
SPARK_PATH=~/spark-2.0.0-bin-hadoop2.7
export PYSPARK_DRIVER_PYTHON="jupyter"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
# Python 3 users: uncomment the line below, or you will get an error
#export PYSPARK_PYTHON=python3
export SPARK_PATH=~/spark-1.6.0-bin-hadoop2.6
export PYSPARK_DRIVER_PYTHON="jupyter"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
# Python 3 users: uncomment the line below, or you will get an error
# export PYSPARK_PYTHON=python3
alias snotebook='$SPARK_PATH/bin/pyspark --master local[2]'
mGalarnyk / Python2.7_setup_aws_tensorflow.bash
Last active May 22, 2017 04:40
Installing TensorFlow on AWS GPU (TensorFlow 0.10, Python 2.7). This script also installs CUDA toolkit 7.5 and cuDNN v5.
#!/bin/bash
# stop on error
set -e
# install the required packages
sudo apt-get update && sudo apt-get -y upgrade
sudo apt-get -y install "linux-headers-$(uname -r)" "linux-image-extra-$(uname -r)"
# install cuda 7.5
mGalarnyk / Python3.5_setup_aws_tensorflow.bash
Last active November 11, 2018 03:00
Installing TensorFlow on AWS GPU (TensorFlow 0.10, Python 3.5). Requires CUDA toolkit 7.5 and cuDNN v5.
#!/bin/bash
# stop on error
set -e
# install the required packages
sudo apt-get update && sudo apt-get -y upgrade
sudo apt-get -y install "linux-headers-$(uname -r)" "linux-image-extra-$(uname -r)"
# install cuda
# http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/
wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/cuda-repo-ubuntu1404_7.5-18_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1404_7.5-18_amd64.deb
rm cuda-repo-ubuntu1404_7.5-18_amd64.deb
echo 'export CUDA_HOME=/usr/local/cuda
export CUDA_ROOT=/usr/local/cuda
export PATH=$PATH:$CUDA_ROOT/bin:$HOME/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_ROOT/lib64
' >> ~/.bashrc
import numpy as np
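# Solve the following system of linear equations for the unknowns a and b:
# 1a + 1b = 35
# 2a + 4b = 94
# Note: the variables below hold the coefficient matrix and the right-hand side,
# not the unknowns named in the equations.
a = np.array([[1, 1], [2, 4]])  # coefficient matrix
b = np.array([35, 94])          # right-hand side
print(np.linalg.solve(a, b))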
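As a cross-check on the NumPy call above, the same 2x2 system can be solved by hand with Cramer's rule. This is a minimal standard-library sketch, not the method NumPy uses internally; the function name `solve_2x2` and the variable names are illustrative assumptions.

```python
from fractions import Fraction


def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve [[a11, a12], [a21, a22]] @ [x, y] = [b1, b2] with Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("coefficient matrix is singular")
    # Replace one column at a time with the right-hand side and divide by det.
    x = Fraction(b1 * a22 - a12 * b2, det)
    y = Fraction(a11 * b2 - b1 * a21, det)
    return x, y


# 1a + 1b = 35
# 2a + 4b = 94
a_value, b_value = solve_2x2(1, 1, 2, 4, 35, 94)
print(a_value, b_value)  # prints: 23 12
```

Substituting back confirms the result: 23 + 12 = 35 and 2(23) + 4(12) = 94.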
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
# Read in csv file
# File: https://github.com/mGalarnyk/Python_Tutorials/blob/master/Python_Basics/Linear_Regression/linear.csv
raw_data = pd.read_csv("linear.csv")
# Removes rows with NaN in them