mGalarnyk’s gists

mGalarnyk / pollutantmean.R

Last active January 2, 2017 16:11

pollutantmean.R This file is used for the Johns Hopkins Data Science Specialization (R Programming) and is posted for the blog post reviewing the specialization: https://medium.com/@GalarnykMichael/in-progress-review-course-2-r-programming-jhu-coursera-ad27086d8438#.ui3hb8n46

	pollutantmean <- function(directory, pollutant, id = 1:332) {

	# Format number with fixed width and then append .csv to number
	fileNames <- paste0(directory, '/', formatC(id, width=3, flag="0"), ".csv" )

	# Read all files and combine them into one large data.table
	# (rbindlist is namespaced like fread, so data.table need not be attached)
	lst <- lapply(fileNames, data.table::fread)
	dt <- data.table::rbindlist(lst)

	if (pollutant %in% names(dt)) {

mGalarnyk / complete.R

Last active January 2, 2017 16:25

complete.R This file is used for the Johns Hopkins Data Science Specialization (R Programming) and is posted for the blog post reviewing the specialization: https://medium.com/@GalarnykMichael/in-progress-review-course-2-r-programming-jhu-coursera-ad27086d8438#.ui3hb8n46

	complete <- function(directory, id = 1:332) {

	# Format number with fixed width and then append .csv to number
	fileNames <- paste0(directory, '/', formatC(id, width=3, flag="0"), ".csv" )

	# Read all files and combine them into one large data.table
	# (rbindlist is namespaced like fread, so data.table need not be attached)
	lst <- lapply(fileNames, data.table::fread)
	dt <- data.table::rbindlist(lst)

	return(dt[complete.cases(dt), .(nobs = .N), by = ID])
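For comparison, the zero-padded file naming and the per-monitor count of complete cases used in the snippet above can be sketched in Python. This is a minimal sketch, not the author's code: the helper names `monitor_paths` and `count_complete` are hypothetical, and it assumes each CSV has a header row with an `ID` column and marks missing values as `NA` (or leaves them empty).

```python
import csv
from collections import Counter
from pathlib import Path


def monitor_paths(directory, ids=range(1, 333)):
    """Build zero-padded paths such as <directory>/001.csv (hypothetical helper)."""
    return [Path(directory) / f"{i:03d}.csv" for i in ids]


def count_complete(directory, ids=range(1, 333)):
    """Count rows with no missing values, grouped by monitor ID.

    Assumes a header row with an "ID" column and "NA" or an empty
    field for missing values.
    """
    nobs = Counter()
    for path in monitor_paths(directory, ids):
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                # Keep only rows where every field is observed
                if all(value not in ("", "NA") for value in row.values()):
                    nobs[int(row["ID"])] += 1
    return dict(nobs)
```

As in the R version, monitors with no complete rows do not appear in the result.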

mGalarnyk / corr.R

Last active January 2, 2017 21:26

corr.R This file is used for the Johns Hopkins Data Science Specialization (R Programming) and is posted for the blog post reviewing the specialization: https://medium.com/@GalarnykMichael/in-progress-review-course-2-r-programming-jhu-coursera-ad27086d8438#.38n89lga5

	corr <- function(directory, threshold = 0) {

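	# Read all files and combine them into one large data.table
	# (pattern is a regular expression, not a glob, so match a literal ".csv" suffix)
	lst <- lapply(list.files(path = directory, pattern = "\\.csv$", full.names = TRUE), data.table::fread)
	dt <- data.table::rbindlist(lst)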

	# Only keep completely observed cases
	dt <- dt[complete.cases(dt),]

	# Apply threshold

mGalarnyk / .bashrc

Last active February 1, 2019 16:36

Function to append to the end of the .bashrc file on Linux to run PySpark in a Jupyter notebook, for the blog post https://medium.com/@GalarnykMichael/install-spark-on-ubuntu-pyspark-231c45677de0#.qxguj5czj

	function snotebook ()
	{
	#Spark path (based on your computer)
	SPARK_PATH=~/spark-2.0.0-bin-hadoop2.7

	export PYSPARK_DRIVER_PYTHON="jupyter"
	export PYSPARK_DRIVER_PYTHON_OPTS="notebook"

	# For Python 3 users: add the line below or you will get an error
	#export PYSPARK_PYTHON=python3

mGalarnyk / .bash_profile

Last active January 3, 2017 06:21

Sets the Spark path for the blog tutorial https://medium.com/@GalarnykMichael/install-spark-on-mac-pyspark-453f395f240b#.6md1dipy3

	export SPARK_PATH=~/spark-1.6.0-bin-hadoop2.6
	export PYSPARK_DRIVER_PYTHON="jupyter"
	export PYSPARK_DRIVER_PYTHON_OPTS="notebook"

	# For Python 3, you have to add the line below or you will get an error
	# export PYSPARK_PYTHON=python3
	alias snotebook='$SPARK_PATH/bin/pyspark --master local[2]'

mGalarnyk / Python2.7_setup_aws_tensorflow.bash

Last active May 22, 2017 04:40

Installing TensorFlow on AWS GPU (TensorFlow 0.10, Python 2.7). This script also installs CUDA toolkit 7.5 and cuDNN v5.

	#!/bin/bash

	# stop on error
	set -e

	# install the required packages
	sudo apt-get update && sudo apt-get -y upgrade
	sudo apt-get -y install "linux-headers-$(uname -r)" "linux-image-extra-$(uname -r)"

	# install cuda 7.5

mGalarnyk / Python3.5_setup_aws_tensorflow.bash

Last active November 11, 2018 03:00

Installing TensorFlow on AWS GPU (TensorFlow 0.10, Python 3.5). Requires CUDA toolkit 7.5 and cuDNN v5.

	#!/bin/bash

	# stop on error
	set -e

	# install the required packages
	sudo apt-get update && sudo apt-get -y upgrade
	sudo apt-get -y install "linux-headers-$(uname -r)" "linux-image-extra-$(uname -r)"

	# install cuda

mGalarnyk / Install_Cuda_7.5_Ubuntu_14_04

Last active February 23, 2017 02:55

	# http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/
	wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/cuda-repo-ubuntu1404_7.5-18_amd64.deb
	sudo dpkg -i cuda-repo-ubuntu1404_7.5-18_amd64.deb
	rm cuda-repo-ubuntu1404_7.5-18_amd64.deb
	echo 'export CUDA_HOME=/usr/local/cuda
	export CUDA_ROOT=/usr/local/cuda
	export PATH=$PATH:$CUDA_ROOT/bin:$HOME/bin
	export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_ROOT/lib64
	' >> ~/.bashrc

mGalarnyk / linear_equation.py

Last active January 10, 2017 05:10

Solve a system of linear equations using Python, for the blog post https://medium.com/@GalarnykMichael/solving-system-of-linear-equations-using-python-645ad1904cec#.5y0emh8w6

	import numpy as np

	# Solve the following system of linear equations
	# 1a + 1b = 35
	# 2a + 4b = 94

	a = np.array([[1, 1], [2, 4]])
	b = np.array([35, 94])

	print(np.linalg.solve(a,b))
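To make the result easier to check, the solution can be substituted back into the system. The following is a minimal sketch that reuses the same `a` and `b` as above; the name `solution` is introduced here for illustration only.

```python
import numpy as np

# Coefficient matrix and right-hand side of the system
# 1a + 1b = 35
# 2a + 4b = 94
a = np.array([[1, 1], [2, 4]])
b = np.array([35, 94])

solution = np.linalg.solve(a, b)

# Substituting the solution back into the system should reproduce b
assert np.allclose(a @ solution, b)
print(solution)
```

Solving by hand gives the same answer: subtracting twice the first equation from the second leaves 2b = 24, so b = 12 and a = 23.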

mGalarnyk / LinearRegression.py

Last active February 23, 2017 02:54

Linear Regression using Python for the blog post https://medium.com/@GalarnykMichael/linear-regression-using-python-b29174c3797a#.mxd9tjl4z

	import numpy as np
	import pandas as pd
	from sklearn.linear_model import LinearRegression
	import matplotlib.pyplot as plt

	# Read in csv file
	# File: https://github.com/mGalarnyk/Python_Tutorials/blob/master/Python_Basics/Linear_Regression/linear.csv
	raw_data = pd.read_csv("linear.csv")

	# Removes rows with NaN in them
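The preview above stops before the model is fit. As a rough illustration of the overall workflow (drop rows with missing values, fit a line, inspect the slope and intercept), here is a sketch on synthetic data. It is not the gist's code: it uses `np.polyfit` rather than scikit-learn's `LinearRegression` so that it runs with NumPy alone, and the data values are made up for illustration.

```python
import numpy as np

# Synthetic data (made up for illustration): y = 2x + 1, with one missing value
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 3.0, np.nan, 7.0, 9.0, 11.0])

# Drop pairs where either value is NaN, mirroring the "remove rows with NaN" step
mask = ~(np.isnan(x) | np.isnan(y))
x_clean, y_clean = x[mask], y[mask]

# Fit a degree-1 polynomial, i.e. an ordinary least-squares line;
# np.polyfit returns coefficients from highest degree to lowest
slope, intercept = np.polyfit(x_clean, y_clean, 1)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

With real data the fit would not be exact, so it is worth plotting the fitted line against the points before trusting the coefficients.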

Michael Galarnyk mGalarnyk