


Michael Galarnyk mGalarnyk

mGalarnyk / project1.md
Last active March 9, 2023 02:16
Exploratory Data Analysis Project 1 (Week 1), Johns Hopkins Data Science Specialization, for the GitHub repo https://github.com/mGalarnyk/datasciencecoursera/tree/master/4_Exploratory_Data_Analysis

Exploratory Data Analysis Project 1

This assignment uses data from the UC Irvine Machine Learning Repository, a popular repository for machine learning datasets. In particular, we will be using the “Individual household electric power consumption Data Set” which I have made available on the course web site:

Dataset: Electric power consumption [20 MB]
Description: Measurements of electric power consumption in one household with a one-minute sampling rate over a period of almost 4 years. Different electrical quantities and some sub-metering values are available.

library("data.table")
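Because the data are sampled once per minute, a common first step is to combine the separate date and time fields into one timestamp. The sketch below uses base R so it runs without data.table; the column names `Date` and `Time` and the day/month/year format are assumptions to check against the actual file.

```r
# Hypothetical toy rows standing in for the power consumption file;
# the column names Date and Time and the d/m/Y format are assumptions.
power <- data.frame(
  Date = c("16/12/2006", "16/12/2006"),
  Time = c("17:24:00", "17:25:00"),
  stringsAsFactors = FALSE
)

# Paste date and time together and parse them into a single POSIXct column.
power$DateTime <- as.POSIXct(
  paste(power$Date, power$Time),
  format = "%d/%m/%Y %H:%M:%S",
  tz = "UTC"
)

# Consecutive readings should be one minute apart.
difftime(power$DateTime[2], power$DateTime[1], units = "mins")
```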
mGalarnyk / quiz2.md
Created March 5, 2017 23:10
Exploratory Data Analysis Quiz 2 (Week 2) for the Johns Hopkins Data Science Specialization GitHub repo https://github.com/mGalarnyk/datasciencecoursera/tree/master/4_Exploratory_Data_Analysis

Exploratory Data Analysis Quiz 2 (JHU) Coursera

Question 1

Under the lattice graphics system, what do the primary plotting functions like xyplot() and bwplot() return?

  • nothing; only a plot is made

  • an object of class "lattice"
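One way to settle this question is to inspect the return value in R. The sketch assumes the lattice package (shipped with R as a recommended package) and uses the built-in `mtcars` data purely for illustration: lattice functions return an object of class "trellis", and the plot is only drawn when that object is printed.

```r
library(lattice)

# Build a scatterplot object; assigning it does not draw anything.
p <- xyplot(mpg ~ wt, data = mtcars)

# The returned value is a "trellis" object, not a "lattice" object.
class(p)

# Printing the object (explicitly, or by auto-printing at the console) draws it.
# print(p)
```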

mGalarnyk / quiz1.md
Created March 5, 2017 22:55
Exploratory Data Analysis Quiz 1 (Week 1), JHU Coursera, for the GitHub repo https://github.com/mGalarnyk/datasciencecoursera

Exploratory Data Analysis Quiz 1 (JHU) Coursera

Question 1

Which of the following is a principle of analytic graphics?

  • Make judicious use of color in your scatterplots (NO)

  • Don't plot more than two variables at a time (NO)

mGalarnyk / helloWorld.html
Created February 25, 2017 19:28
Hello World HTML for my blog post https://medium.com/@GalarnykMichael
<html>
<head><title>This is title</title></head>
<body>
Hello world
</body>
</html>
# Getting and Cleaning Data Project, Johns Hopkins Coursera
# Author: Michael Galarnyk
# 1. Merges the training and the test sets to create one data set.
# 2. Extracts only the measurements on the mean and standard deviation for each measurement.
# 3. Uses descriptive activity names to name the activities in the data set
# 4. Appropriately labels the data set with descriptive variable names.
# 5. From the data set in step 4, creates a second, independent tidy data set with the average of each variable for each activity and each subject.
# Load Packages and get the Data
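The five steps listed above can be sketched in base R on a tiny made-up dataset. The column names, activity codes, and activity labels below are hypothetical stand-ins for the real train/test files, and the original project script may implement these steps differently (for example with data.table).

```r
# Hypothetical toy train/test sets; feature names only mimic the real style.
train <- data.frame(subject = c(1, 1, 1, 1), activity = c(1, 1, 2, 2),
                    `tBodyAcc-mean()-X` = c(0.1, 0.3, 0.5, 0.7),
                    `tBodyAcc-std()-X` = c(1, 3, 5, 7),
                    `tBodyAcc-mad()-X` = 9, check.names = FALSE)
test <- data.frame(subject = c(2, 2, 2, 2), activity = c(1, 1, 2, 2),
                   `tBodyAcc-mean()-X` = c(0.2, 0.4, 0.6, 0.8),
                   `tBodyAcc-std()-X` = c(2, 4, 6, 8),
                   `tBodyAcc-mad()-X` = 9, check.names = FALSE)

# 1. Merge the training and test sets.
merged <- rbind(train, test)

# 2. Keep only mean() and std() measurements, plus the id columns.
is_mean_std <- grepl("mean\\(\\)|std\\(\\)", names(merged))
selected <- merged[, c("subject", "activity", names(merged)[is_mean_std])]

# 3. Replace activity codes with descriptive names (labels assumed here).
selected$activity <- factor(selected$activity, levels = c(1, 2),
                            labels = c("WALKING", "WALKING_UPSTAIRS"))

# 4. Rename "tBodyAcc-mean()-X" to "tBodyAccMeanX", and likewise for std().
names(selected) <- gsub("-mean\\(\\)-?", "Mean", names(selected))
names(selected) <- gsub("-std\\(\\)-?", "Std", names(selected))

# 5. Average every measurement for each subject and activity pair.
tidy <- aggregate(. ~ subject + activity, data = selected, FUN = mean)
tidy
```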
# Getting and Cleaning Data, JHU Coursera
#1.
#The American Community Survey distributes downloadable data about United States communities. Download the 2006 microdata survey about housing for the state of Idaho using download.file() from here:
# https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv
# and load the data into R. The code book, describing the variable names is here:
# https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FPUMSDataDict06.pdf
# Getting and Cleaning Data, Quiz 3 JHU Coursera
# The American Community Survey distributes downloadable data about United States communities. Download the 2006 microdata survey about housing for the state of Idaho using download.file() from here:
# https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv
# and load the data into R. The code book, describing the variable names is here:
# https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FPUMSDataDict06.pdf
# 1. Register an application with the Github API here https://github.com/settings/applications.
# Access the API to get information on your instructor's repositories (hint: this is the URL you want: "https://api.github.com/users/jtleek/repos").
# Use this data to find the time that the datasharing repo was created. What time was it created?
# see https://medium.com/@GalarnykMichael/accessing-data-from-github-api-using-r-3633fb62cb08#.z0z07ph5h for more details.
#install.packages("jsonlite")
library(jsonlite)
#install.packages("httpuv")
library(httpuv)
#install.packages("httr")
# Quiz data.table code Week 1
# 1.
# Reading a URL with fread() may require the curl package on macOS
# install.packages("curl")
# Reading in data
housing <- data.table::fread("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv")
# VAL encodes property value as a category code (per the code book, 24 means $1,000,000 or more); .N counts the matching rows
housing[VAL == 24, .N]
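Without data.table, the same count can be written in base R. The sketch below uses a small made-up `VAL` column instead of the downloaded file; the key detail is handling `NA` values, which data.table treats as non-matching in its `i` expression.

```r
# Hypothetical toy VAL column; NA stands for units with no recorded value.
housing_toy <- data.frame(VAL = c(24, 10, NA, 24, 17))

# Count rows with VAL == 24; na.rm = TRUE drops the NA comparisons,
# mirroring how data.table treats NA in i as no match.
n_million <- sum(housing_toy$VAL == 24, na.rm = TRUE)
n_million
```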
mGalarnyk / LinearRegression.R
Last active December 5, 2020 13:40
Univariate Linear Regression (the relationship between the dependent variable y and the independent variable x is linear) using the R programming language for the blog post https://medium.com/@GalarnykMichael/univariate-linear-regression-using-r-programming-3db499bdd1e3#.kcm3t9rl3
# Linear regression models a linear relationship between two variables
# Set the working directory to the Desktop (adjust the path for your machine)
setwd("~/Desktop")
download.file(url = 'https://raw.githubusercontent.com/mGalarnyk/Python_Tutorials/master/Python_Basics/Linear_Regression/linear.csv',
              destfile = 'linear.csv')
# Use TRUE rather than T, since T can be reassigned
rawData <- read.csv("linear.csv", header = TRUE)
# Show first n entries of data.frame, notice NA values
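Once the data are loaded, the regression itself is a single `lm()` call. This sketch uses a made-up data frame in place of `linear.csv`, so the column names `x` and `y` are assumptions; it only illustrates dropping the NA rows and reading off the fitted intercept and slope.

```r
# Hypothetical stand-in for linear.csv: columns x and y are assumptions,
# and one y value is missing to mimic the NA values in the real file.
toyData <- data.frame(x = 1:6, y = c(3, 5, NA, 9, 11, 13))

# Drop rows with missing values explicitly
# (lm() would also do this by default via na.action = na.omit).
cleanData <- na.omit(toyData)

# Fit y = b0 + b1 * x by ordinary least squares.
fit <- lm(y ~ x, data = cleanData)
coef(fit)
```

Because the toy points lie exactly on y = 2x + 1, the fitted intercept and slope come out as 1 and 2.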