Jorge Cimentada (cimentadaj)

library(readr)
library(rvest)
library(animation)
library(stringr)
library(lubridate)
library(hms)
library(geofacet)
library(tidyverse)
---
title: "Scraping and visualizing How I Met Your Mother"
author: "Jorge Cimentada"
date: "7/10/2017"
output: html_document
---
How I Met Your Mother (HIMYM hereafter) is a television series very similar to the classic 'Friends' series from the 90s. Following the release of the 'tidy text' book, I was looking for a project where I could apply some of these skills. I decided I would scrape all the transcripts from HIMYM and analyze patterns between characters. This post really pushed me to my limit in terms of web scraping and pattern matching, which is exactly what I wanted to improve in the first place. Let's begin!
My first task was to check whether there was any consistency in the URLs that store the transcripts. If you've ever watched HIMYM, you know there are nine seasons, each with about 22 episodes. That makes about 200 episodes, give or take. It would be a big pain in the ass to manually write down 200 complicated URLs. Luckily, there is a way of finding the 200 links without writing them all by hand.
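To give an idea of the approach, here is an illustrative sketch (not the post's actual code): read the page that lists all the episode transcripts and pull every link out of it with rvest. The index URL and the 'season' pattern below are placeholders.
library(rvest)

index_url <- "https://example.com/himym-transcripts"  # placeholder for the real index page
index_page <- read_html(index_url)

episode_links <-
  index_page %>%
  html_nodes("a") %>%              # every anchor tag on the index page
  html_attr("href") %>%            # keep only the link targets
  grep("season", ., value = TRUE)  # keep the ones that look like episode pages

length(episode_links)  # should be roughly 200, one per episode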
# Functions
# These two functions could probably be combined into one: they are identical except that
# the first manipulates x in the "if" branch and y in the "else" branch, while the second
# does the opposite.
# First polygon
shrink_fun <- function(x, shrink, x_value = TRUE) {
  if (x_value) {
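# Following up on the comment above, a minimal sketch of how the two functions might be
# merged into one, assuming each of them pulls one coordinate of a polygon towards its
# centre by a shrink factor; the helper and the shrinking rule are assumptions, not the
# gist's actual code.
shrink_coord <- function(coord, shrink) {
  centre <- mean(coord)
  centre + (coord - centre) * shrink  # pull the coordinate towards its centre
}

shrink_both <- function(x, y, shrink, x_value = TRUE) {
  # x_value picks which coordinate gets shrunk, replacing the two near-identical functions
  if (x_value) {
    list(x = shrink_coord(x, shrink), y = y)
  } else {
    list(x = x, y = shrink_coord(y, shrink))
  }
}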
"State","StateName","Year","Sex","Age","Cause","CauseName","Gap"
"1","Aguascalientes","1990","m",15,"g1","Amenable to medical service",0.000675098052688838
"1","Aguascalientes","1990","m",16,"g1","Amenable to medical service",0.000667806966070827
"1","Aguascalientes","1990","m",17,"g1","Amenable to medical service",0.00100441688809383
"1","Aguascalientes","1990","m",18,"g1","Amenable to medical service",0.00136100033554243
"1","Aguascalientes","1990","m",19,"g1","Amenable to medical service",0.00149372003684789
"1","Aguascalientes","1990","m",20,"g1","Amenable to medical service",0.0013343531602672
"1","Aguascalientes","1990","m",21,"g1","Amenable to medical service",0.000926874268287747
"1","Aguascalientes","1990","m",22,"g1","Amenable to medical service",0.000352160113791911
"1","Aguascalientes","1990","m",23,"g1","Amenable to medical service",0
library(tidyverse)
library(animation)

# Read the Mexico mortality data directly from the gist
url <- "https://gist.githubusercontent.com/cimentadaj/a2226ca503031140caecb7add0670d81/raw/7f09b9f457e67f13acda2305b9ae391d277070a4/mexico_mortality.csv"
data <- read_csv(url)

# Recode the cause-of-death labels, grouping related causes together
other_new_data <-
  data %>%
  mutate(cause_recode = dplyr::recode(CauseName,
                                      'Road traffic' = 'Road traffic + Suicide',
# (c) Repeat this simulation, but instead fit the model using t errors (see Exercise 6.6).
# The only change here is that error1 is drawn from a t distribution instead of a normal one
coefs <- array(NA, c(3, 1000))
se <- array(NA, c(3, 1000))
for (i in 1:ncol(coefs)) {
  x1 <- 1:100
  x2 <- rbinom(100, 1, 0.5)
  error1 <- rt(100, df = 4) * sqrt(5 * (4 - 2) / 4) + 0 # t errors with 4 df, rescaled so the error sd is sqrt(5)
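# As an aside on the rescaling above: a t variable with 4 degrees of freedom has variance
# 4 / (4 - 2) = 2, so multiplying by sqrt(5 * (4 - 2) / 4) = sqrt(2.5) yields errors with
# variance 5, i.e. an error sd of sqrt(5). A quick numerical check (not part of the exercise):
sd(rt(1e6, df = 4) * sqrt(5 * (4 - 2) / 4))  # should be close to sqrt(5), about 2.236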
# (b) Put the above step in a loop and repeat 1000 times. Calculate the
# confidence coverage for the 68% intervals for each of the three
# coefficients in the model.
coefs <- array(NA, c(3, 1000))
se <- array(NA, c(3, 1000))
# Naturally, these estimates will be different for anyone who runs this code
for (i in 1:ncol(coefs)) {
  x1 <- 1:100
  x2 <- rbinom(100, 1, 0.5)
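# Assuming the (truncated) loop goes on to simulate y, fit lm(y ~ x1 + x2), and store the
# estimates and standard errors column by column in coefs and se, and that true_coefs holds
# the three coefficients used to simulate y (a placeholder name, not from the gist), the
# coverage check at the end might look like this:
coverage <- rowMeans(abs(coefs - true_coefs) < se)  # share of intervals (estimate +/- 1 se) covering the truth
coverage  # each of the three entries should be close to 0.68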
# (a) Simulate data from this model. For simplicity, suppose the values of x1 are simply the integers
# from 1 to 100, and that the values of x2 are random and equally likely to be 0 or 1. Fit a linear
# regression (with normal errors) to these data and see if the 68% confidence intervals for the
# regression coefficients (for each, the estimates ±1 standard error) cover the true values.
library(arm)
library(broom)
library(hett)
set.seed(2131)
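# A minimal sketch of what part (a) might look like; the true coefficients and error sd
# below are made-up placeholders, not necessarily the values used in the gist.
b <- c(1, 2, 3)   # hypothetical true intercept and slopes
sigma <- sqrt(5)  # hypothetical true error sd

x1 <- 1:100
x2 <- rbinom(100, 1, 0.5)
y <- b[1] + b[2] * x1 + b[3] * x2 + rnorm(100, 0, sigma)

fit <- lm(y ~ x1 + x2)

# Does each 68% interval (estimate +/- 1 standard error) cover the true value?
abs(coef(fit) - b) < se.coef(fit)  # se.coef() comes from the arm package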
# Posterior predictive checking: continuing the previous exercise, use the fitted
# model from Exercise 12.2(b) to simulate a new dataset of CD4 percentages
# (with the same sample size and ages of the original dataset) for the final time
# point of the study, and record the average CD4 percentage in this sample.
# Repeat this process 1000 times and compare the simulated distribution to the
# observed CD4 percentage at the final time point for the actual data.
# Restrict the data to the same cases used to fit mod2 (non-missing treatmnt and baseage)
finaltime_data <- subset(cd4, !is.na(treatmnt) & !is.na(baseage))
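# A minimal sketch of the check described above, assuming mod2 is an lmer fit from
# Exercise 12.2(b) (something like y ~ time + treatmnt + baseage + (1 | newpid)) and that
# finaltime_data ends up holding one row per child at the final time point (the gist is
# truncated, so that step is assumed); the outcome column 'y' is also a placeholder name.
library(lme4)

n_sims <- 1000
y_hat <- predict(mod2, newdata = finaltime_data, allow.new.levels = TRUE)  # point prediction per row

# Draw a new outcome for every row, n_sims times, keeping the average CD4 percentage each time
sim_avg <- replicate(n_sims, mean(rnorm(length(y_hat), y_hat, sigma(mod2))))

# Compare the simulated averages with the observed average at the final time point
hist(sim_avg, main = "Simulated average CD4 percentage")
abline(v = mean(finaltime_data$y, na.rm = TRUE), col = "red", lwd = 2)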