library(jsonlite)
cp = fromJSON(txt = "Cell Phone Data.txt", simplifyDataFrame = TRUE)
num.atts = c(4,9,11,12,13,14,15,16,18,22)
cp[,num.atts] = sapply(cp[,num.atts], function (x) as.numeric(x))
cp$aspect.ratio = cp$att_pixels_y / cp$att_pixels_x
cp$isSmartPhone = ifelse(grepl("smart|iphone|blackberry", cp$name, ignore.case=TRUE) | cp$att_screen_size >= 4, "Yes", "No")
library(plyr)
library(ggplot2)
library(ggmap)
libraries = read.csv("ontario_library_stats_2010.csv")
libraries$isFN = ifelse(libraries$Library.Service.Type == "First Nations Library", 1, 0)
# Here we create the 'proportionate' versions of all the variables
libraries[,143:265] = sapply(libraries[,20:142], function (x) x/libraries[,13])
names(libraries)[143:265] = paste(names(libraries)[20:142], "P", sep=".")
import os
rfiles = os.listdir('.')
rc = []
for f in rfiles:
    if '.txt' in f:
        # The recipes come in 3 txt files consisting of 1 recipe per line, the
        # cuisine of the recipe as the first entry in the line, and all subsequent ingredient
        # entries separated by a tab
        with open(f, 'r') as infile:
            rc.append(infile.read())
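
The file layout described in the comment above (one recipe per line, cuisine first, ingredients separated by tabs) could be parsed along these lines. This is only a minimal sketch, not the gist's own code: `parse_recipe_line` and the sample line are hypothetical, and it assumes the cuisine is separated from the first ingredient by a tab as well.

```python
def parse_recipe_line(line):
    """Split one tab-separated recipe line into (cuisine, ingredients)."""
    # Drop the trailing newline, then split on tabs
    fields = line.rstrip("\n").split("\t")
    # Assumption: the first field is the cuisine, the rest are ingredients
    cuisine, ingredients = fields[0], fields[1:]
    return cuisine, ingredients


# Made-up sample line, for illustration only
sample = "Italian\ttomato\tbasil\tolive_oil\n"
cuisine, ingredients = parse_recipe_line(sample)
print(cuisine, ingredients)
```

Applied to each line of the file contents collected in `rc`, a helper like this would give one (cuisine, ingredients) pair per recipe.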
library(tm)
recipes = readLines('recipes combined.tsv')
# Once I read it into R, I have to get rid of the \t
# characters so that it's more acceptable to the tm package
# (gsub is vectorized, so it can be applied to the whole character vector at once)
recipes.new = gsub('\t', ' ', recipes)
recipes.corpus = Corpus(VectorSource(recipes.new))
recipes.dtm = DocumentTermMatrix(recipes.corpus)
# ****Introduction****
# Data analysis is like an interview. In any interview, the interviewer uses a series of
# questions to discover a story. The questions the interviewer asks are, of course,
# subjectively chosen. As such, the story that one interviewer gets out of an interviewee might
# be fairly different from the story that another interviewer gets out of the same person. In the
# same way, the commands (and thus the analysis) below are not the only way of analyzing the data.
# Once you understand what the commands are doing, you might decide to take a different approach
# to analyzing the data. Please do so, and be sure to share what you find!
ltep = read.csv("ltep-survey-results-all.csv")
library(likert)
library(ggthemes)
# Here I flip the scoring
ltep[,13:19] = sapply(ltep[,13:19], function (x) 8 - x)
deal.w.esources = likert(ltep[,13:19])
summary(deal.w.esources)
plot(deal.w.esources, text.size=6, text.color="black") +
  theme(axis.text.x=element_text(colour="black", face="bold", size=14),
        axis.text.y=element_text(colour="black", face="bold", size=14),
        axis.title.x=element_text(colour="black", face="bold", size=14),
        plot.title=element_text(size=18, face="bold")) +
  ggtitle("What guidelines should Ontario use\n for its future mix of energy sources?")
docs = []
from os import listdir, chdir
import re
# Here's the section where I try to filter useless stuff out.
# Notice near the end all of the regex patterns where I've called
# "re.DOTALL". This is pretty key here. What it means is that the
# .+ I have referenced within the regex pattern matches any
# character at all, including newline characters
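
To illustrate the `re.DOTALL` remark above, here is a small self-contained sketch. The sample text and the pattern are made up for the demonstration and are not taken from the gist; only `re.search` and the `re.DOTALL` flag come from the standard library.

```python
import re

# Made-up multi-line text, for illustration only
text = "Subject: hello\nline one\nline two\nEND"
pattern = r"Subject:.+END"

# Without re.DOTALL, "." does not match "\n", so ".+" cannot span lines
without_dotall = re.search(pattern, text)

# With re.DOTALL, "." matches every character, including "\n"
with_dotall = re.search(pattern, text, re.DOTALL)

print(without_dotall)        # None: the match cannot cross the first newline
print(with_dotall.group(0))  # the whole text, newlines included
```

This is why the flag matters when a pattern has to remove a block that spans several lines of an email.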
library(stringr)
library(plyr)
library(tm)
library(tm.plugin.mail)
library(SnowballC)
library(topicmodels)
# At this point, the python script should have been run,
# creating about 126 thousand txt files. I was very much afraid
# to import that many txt files into the tm package in R (my computer only
docs = []
from os import listdir, chdir
import re
# Here's my attempt at coming up with regular expressions to filter out
# parts of the Enron emails that I deem useless.
email_pat = re.compile(".+@.+")
to_pat = re.compile("To:.+\n")
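
How these patterns are applied is not shown in this preview. One plausible use is `re.sub`-style removal, sketched below under that assumption: `strip_headers` and the sample message are hypothetical, and only the two compiled patterns are copied from the script.

```python
import re

# The two patterns as defined in the script
email_pat = re.compile(".+@.+")
to_pat = re.compile("To:.+\n")


def strip_headers(message):
    """Hypothetical helper: delete whatever the two patterns match."""
    # Remove the "To:" line first, together with its trailing newline
    cleaned = to_pat.sub("", message)
    # Then blank out any remaining line that contains an "@"
    cleaned = email_pat.sub("", cleaned)
    return cleaned


# Made-up message, for illustration only
sample = "To: someone@example.com\nFrom: other@example.com\nSee you at noon.\n"
print(repr(strip_headers(sample)))
```

Note that `email_pat` has no trailing `\n`, so it leaves an empty line behind where the matched line used to be.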
library(ff)
library(ffbase)
library(RgoogleMaps)
library(plyr)
addTrans <- function(color,trans)
{
# This function adds transparency to a color.
# Define transparency with an integer between 0 and 255
# 0 being fully transparent and 255 being fully visible