Michael Galarnyk mGalarnyk

library(datasets)
library(data.table)
iris_dt <- as.data.table(iris)
# 1 There will be an object called 'iris' in your workspace.
# In this dataset, what is the mean of 'Sepal.Length' for the species virginica? Please round your answer to the nearest whole number.
# Basic data.table syntax: dt[i, j, by]
# i filters rows (like a SQL WHERE clause), j computes on columns (like SELECT), by groups (like GROUP BY).
iris_dt[Species == "virginica", round(mean(Sepal.Length))]
@mGalarnyk
mGalarnyk / Fibonacci_Sequence.py
Last active November 14, 2020 13:16
Fibonacci sequence algorithm in Python. 5 different ways for a later blog post at https://medium.com/@GalarnykMichael
# To incorporate and learn from later: http://stackoverflow.com/questions/494594/how-to-write-the-fibonacci-sequence-in-python
##########################################
# Method 1: Simple For Loops
# If you like, you can specify which Python version you are using
# Python 2 Version
# (xrange doesn't exist in Python 3; use range there)
a, b = 0, 1
for i in xrange(0, 10):
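The loop body is cut off in this preview. As a minimal sketch of the simple for-loop approach in Python 3 (where `range` replaces `xrange`), assuming the goal is the first ten Fibonacci numbers starting from `a, b = 0, 1`. The function name `first_fibonacci` is hypothetical and not taken from the gist:

```python
def first_fibonacci(count):
    """Return the first `count` Fibonacci numbers, starting from 0."""
    numbers = []
    a, b = 0, 1
    for _ in range(count):
        numbers.append(a)
        # Advance the pair: the next value is the sum of the previous two.
        a, b = b, a + b
    return numbers


print(first_fibonacci(10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Collecting the values in a list rather than printing inside the loop keeps the function easy to reuse and test.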
# Question 1
# Plot the 30-day mortality rates for heart attack
# Read the outcome data into R via the read.csv function and look at the first few rows.
# outcome <- read.csv("outcome-of-care-measures.csv", colClasses = "character")
# head(outcome)
# There are many columns in this dataset. You can see how many by typing ncol(outcome) (you can see the number of rows with the nrow function). In addition, you can see the names of each column by typing names(outcome) (the names are also in the PDF document).
# To make a simple histogram of the 30-day death rates from heart attack (column 11 in the outcome dataset), run
library(data.table)
mGalarnyk / R_Github_Api.R
Last active September 7, 2021 14:55
Reading data from the GitHub API using R. This code was originally written for the Johns Hopkins Data Science Specialization. Blog post: https://medium.com/@GalarnykMichael/accessing-data-from-github-api-using-r-3633fb62cb08#.toufbbjgd
#install.packages("jsonlite")
library(jsonlite)
#install.packages("httpuv")
library(httpuv)
#install.packages("httr")
library(httr)
# Can be github, linkedin etc depending on application
oauth_endpoints("github")
mGalarnyk / DataTwitterAPIusingR.R
Last active December 7, 2021 11:08
Accessing data from the Twitter API using R (part 1), for the blog post at https://medium.com/@GalarnykMichael
#install.packages("twitteR")
library(twitteR)
# Replace the next four values with your own consumer_key, consumer_secret, access_token, and access_secret.
# Never commit real credentials to a public gist.
consumer_key <- "YOUR_CONSUMER_KEY"
consumer_secret <- "YOUR_CONSUMER_SECRET"
access_token <- "YOUR_ACCESS_TOKEN"
access_secret <- "YOUR_ACCESS_SECRET"
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)
mGalarnyk / LinearRegression.R
Last active December 5, 2020 13:40
Univariate linear regression (the relationship between the dependent variable y and the independent variable x is linear) using the R programming language, for the blog post https://medium.com/@GalarnykMichael/univariate-linear-regression-using-r-programming-3db499bdd1e3#.kcm3t9rl3
# Linear regression models a linear relationship between two variables
# Set path to Desktop
setwd("~/Desktop")
download.file(
  url = "https://raw.githubusercontent.com/mGalarnyk/Python_Tutorials/master/Python_Basics/Linear_Regression/linear.csv",
  destfile = "linear.csv"
)
rawData <- read.csv("linear.csv", header = TRUE)
# Show first n entries of data.frame, notice NA values
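The rest of this gist is cut off in the preview. To illustrate what a univariate least-squares fit computes, here is a minimal Python sketch using only the standard library. It is not the gist's R code. The function name `fit_line` is hypothetical, `None` stands in for R's `NA`, and dropping incomplete pairs before fitting is an assumption about how the missing values are handled:

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares.

    Pairs in which either value is None are dropped before fitting
    (an assumption that mirrors removing NA rows).
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two complete (x, y) pairs")

    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n

    # Sum of squared deviations of x, and sum of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

The slope is the ratio of the x-y cross deviations to the squared x deviations, and the intercept follows from requiring the fitted line to pass through the point of means.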
# Quiz data.table code Week 1
# 1.
# fread url requires curl package on mac
# install.packages("curl")
# Reading in data
housing <- data.table::fread("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv")
# VAL encodes the property value bracket (code 24 is the $1,000,000+ bracket in the code book); .N is the number of matching rows
housing[VAL == 24, .N]
# 1. Register an application with the GitHub API here https://github.com/settings/applications.
# Access the API to get information on your instructor's repositories (hint: this is the url you want "https://api.github.com/users/jtleek/repos").
# Use this data to find the time that the datasharing repo was created. What time was it created?
# see https://medium.com/@GalarnykMichael/accessing-data-from-github-api-using-r-3633fb62cb08#.z0z07ph5h for more details.
#install.packages("jsonlite")
library(jsonlite)
#install.packages("httpuv")
library(httpuv)
#install.packages("httr")
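The task described in the comments above (look up when the `datasharing` repo was created) comes down to finding one entry in the JSON list the API returns and reading its `created_at` field. A minimal Python sketch using only the standard library follows. It is not the gist's R code. The function names `fetch_repos` and `repo_created_at` are hypothetical, and the sketch assumes an unauthenticated request is enough for this public endpoint and that the repo appears on the first page of results:

```python
import json
import urllib.request


def fetch_repos(url):
    """Download and decode the JSON list of repositories at `url`.

    Assumption: no OAuth is used, so GitHub's unauthenticated rate limits
    apply, and only the first page of results is read.
    """
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "gist-example",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def repo_created_at(repos, name):
    """Return the `created_at` timestamp of the repo called `name`, or None."""
    for repo in repos:
        if repo.get("name") == name:
            return repo.get("created_at")
    return None


# Usage (needs network access):
# repos = fetch_repos("https://api.github.com/users/jtleek/repos")
# print(repo_created_at(repos, "datasharing"))
```

Keeping the lookup separate from the download makes it possible to check the search logic against a small hand-written list without calling the API.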
# Getting and Cleaning Data, Quiz 3 JHU Coursera
# 1. Register an application with the GitHub API here https://github.com/settings/applications.
# The American Community Survey distributes downloadable data about United States communities. Download the 2006 microdata survey about housing for the state of Idaho using download.file() from here:
# https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv
# and load the data into R. The code book, describing the variable names, is here:
# https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FPUMSDataDict06.pdf
# Getting and Cleaning Data, JHU Coursera
# 1.
# The American Community Survey distributes downloadable data about United States communities. Download the 2006 microdata survey about housing for the state of Idaho using download.file() from here:
# https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv
# and load the data into R. The code book, describing the variable names, is here:
# https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FPUMSDataDict06.pdf