# how often are words (word-stems) used across all the docs
dtm <- DocumentTermMatrix(corpus.stemmed)
# in the first 5 text files, how frequent are the first 8 words (alphabetical order)
inspect(dtm[1:5, 1:8])
# let's make that dtm table a matrix...
dtm.mat <- as.matrix(dtm)
####### STEP 3 ----- visualizing the high-frequency words
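The preview stops at the start of step 3. As a minimal sketch of how high-frequency words can be pulled out of a matrix shaped like `dtm.mat`, the example below ranks terms by their total count across documents. The matrix `toy.mat` and its word stems are made up for illustration so the code runs without the federalist corpus; this is not necessarily how the original step 3 proceeds.

```r
# Toy stand-in for dtm.mat (an assumption, not the real data):
# rows are documents, columns are word stems, cells are counts.
toy.mat <- matrix(c(2, 0, 1,
                    1, 3, 0,
                    4, 1, 1),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("doc1", "doc2", "doc3"),
                                  c("govern", "power", "state")))

# Total frequency of each stem across all documents, most frequent first
term.freq <- sort(colSums(toy.mat), decreasing = TRUE)

# Keep the two most frequent stems
top.terms <- head(term.freq, 2)
print(top.terms)
```

A call such as `barplot(top.terms)` would then give a quick picture of the same ranking.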
@diamonaj
diamonaj / step1.R
Last active January 30, 2023 18:43
### In your R working directory, you should have a directory called "federalist" filled with .txt files
corpus.raw <- Corpus(DirSource(directory = "federalist", pattern = "fp"))
# this corpus comes with many different text files built in
# to see text, use "content()" and specify which doc (e.g., the 1st one)
content(corpus.raw[[1]])
####### GET THE DATA IN SHAPE
# make lower case
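The cleaning step is cut off here. To show the kind of transformation involved, the sketch below applies lower-casing, punctuation removal, and number removal to a single string using base R only. The original presumably uses `tm` functions on the whole corpus; the string `raw.text` and the variable names are illustrative assumptions.

```r
# A made-up sentence standing in for one document's text
raw.text <- "The People of the State of New York: 85 essays!"

lower     <- tolower(raw.text)                 # make lower case
no.punct  <- gsub("[[:punct:]]", "", lower)    # drop punctuation
no.digits <- gsub("[[:digit:]]", "", no.punct) # drop numbers
clean     <- trimws(gsub("\\s+", " ", no.digits))  # squeeze extra spaces

# Split the cleaned text into individual words
tokens <- strsplit(clean, " ")[[1]]
print(tokens)
```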
### This exercise requires installing a bunch of packages---
### Unfortunately, the precise sequence and rules for installing may vary
### depending upon your computer and configuration.
## ***Taken from Chapter 5 in Kosuke Imai's "Quantitative Social Science"
## Transcribed by Alexis Diamond, all errors my own...
##########################################################################
diamonaj / gist:7cfada702e17278c032f023734f8c336
Created January 16, 2023 01:33
Run this in R on your local computer before our tutorial (it will take 2-3 mins)
install.packages("devtools")
library(devtools)
devtools::install_github("kosukeimai/qss-package", build_vignettes = TRUE)
library(qss)
install.packages("tm")
install.packages("SnowballC")
library(tm)
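Because installation can behave differently from machine to machine, it may help to check what is already installed before running the commands above. The sketch below does only that check and installs nothing. `find_missing_packages` is a hypothetical helper, not part of the original gist.

```r
# Hypothetical helper: return the names of packages that are not installed.
# requireNamespace() returns FALSE (quietly) when a package is unavailable.
find_missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("devtools", "tm", "SnowballC")
missing.pkgs <- find_missing_packages(needed)

if (length(missing.pkgs) > 0) {
  message("Still to install: ", paste(missing.pkgs, collapse = ", "))
} else {
  message("All required packages are already installed.")
}
```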
#############
# Answer key to pre-class work for Lesson 1
#***Step 1:
#Load dataset
#Note: This step may take several seconds to complete
# click the link for the data set and see that it's a .csv file
# so use "read.csv" -- also, don't name this data object "data"
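The link to the data set is not shown in this preview, so the sketch below reads a tiny made-up CSV from a string instead. It illustrates the `read.csv` call and the naming advice: a descriptive name such as `scores` avoids confusion with R's built-in `data()` function. The column names and values are assumptions for illustration only.

```r
# Made-up CSV content standing in for the real file
csv.text <- "id,score\n1,90\n2,85\n3,77"

# read.csv() accepts a file path, a URL, or (as here) literal text
scores <- read.csv(text = csv.text)

# Quick checks on what was loaded
print(dim(scores))
print(names(scores))
```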
---
title: "Assignment 3"
output: pdf_document
date: '2022-11-30'
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
set.seed(1)
```
diamonaj / PCW_18_CS312.R
Created November 30, 2022 19:23
Pre-Class Work coding solution for session 18 (quantile effects)
## Code Cell 1 of 4
# Show median treatment effect estimation here
#install.packages("quantreg")
library(quantreg)
library(Matching)
data(lalonde)
rqfit_r50 <- rq(re78 ~ treat, tau = 0.5, data = lalonde)
summary(rqfit_r50)
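With a single binary regressor, the `tau = 0.5` coefficient on `treat` typically corresponds to the difference between the treated and control group medians (up to how ties in even-sized groups are resolved). The base-R sketch below shows that quantity on made-up numbers, next to the difference in means, to illustrate why the median effect is less sensitive to one extreme outcome. The vectors `outcome` and `treat` are illustrative, not the lalonde data.

```r
# Made-up outcomes: five control units (one extreme value) and five treated
outcome <- c(1, 3, 5, 7, 100,   2, 4, 6, 8, 9)
treat   <- c(0, 0, 0, 0, 0,     1, 1, 1, 1, 1)

# Median treatment effect: difference in group medians
median.effect <- median(outcome[treat == 1]) - median(outcome[treat == 0])

# Mean treatment effect: difference in group means
mean.effect <- mean(outcome[treat == 1]) - mean(outcome[treat == 0])

cat("median effect:", median.effect, "\n")
cat("mean effect:  ", mean.effect, "\n")
```

Here the single control outcome of 100 drags the difference in means far below zero, while the difference in medians is unaffected by it.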
# From class on Monday, Nov 14
# Switching to the OBSERVATIONAL Lalonde data set
# do genetic matching using the observational lalonde data
# use a caliper equal to 0.1 standard deviations for every covariate in X.
library(Matching)
lalonde <- read.csv("https://tinyurl.com/st9n3dl")
X <- cbind(lalonde$age, lalonde$educ, lalonde$black, lalonde$hisp,
           lalonde$married, lalonde$nodegr, lalonde$u74, lalonde$u75,
           lalonde$re75, lalonde$re74)
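The matching call itself is not shown in this preview. To make the caliper comment concrete, the sketch below shows what "within 0.1 standard deviations" means for one covariate: a candidate control is acceptable only if its value lies within 0.1 times the covariate's standard deviation of the treated unit's value. All numbers and names here are made up for illustration, and the code does no matching.

```r
# Made-up ages for the sample, used only to get a standard deviation
age <- c(20, 25, 30, 35, 40, 45)

caliper.sd <- 0.1                    # caliper in standard-deviation units
max.gap    <- caliper.sd * sd(age)   # largest allowed difference in years

# One treated unit and two candidate controls (illustrative values)
treated.age <- 30
control.age <- c(29.5, 33)

# TRUE where the candidate control falls inside the caliper
within.caliper <- abs(control.age - treated.age) <= max.gap
print(within.caliper)
```

With a caliper this tight, only very close controls survive, so some treated units may end up unmatched.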
diamonaj / to be shared with you.R
Created October 25, 2022 01:18
example for my amazing CS130 students
# This is my code I want to share with you! :)
die_rolls <- 1:6
storage <- logical(1000)  # one TRUE/FALSE result per simulated week
for (i in 1:1000) {
  # roll the die once a day for 7 days
  rolls_for_the_week <- sample(die_rolls, 7, replace = TRUE)
  storage[i] <- !any(rolls_for_the_week == 1)  # TRUE if the week has no 1s
}
cat("\nthe probability of not rolling a 1 all week is estimated =",
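As a cross-check on the simulation, the probability of never rolling a 1 in seven independent rolls has a closed form, (5/6)^7. The sketch below compares that exact value with a simulated estimate. It is a self-contained variant, not the original code: `n_sims`, `no_ones`, and the use of `replicate` and `set.seed` are choices made here for illustration.

```r
set.seed(1)  # fixed seed so the estimate is reproducible

die_rolls <- 1:6
n_sims <- 1000

# For each simulated week: TRUE if none of the 7 rolls is a 1
no_ones <- replicate(n_sims,
                     !any(sample(die_rolls, 7, replace = TRUE) == 1))

estimate <- mean(no_ones)   # share of simulated weeks with no 1s
exact    <- (5 / 6)^7       # each roll avoids a 1 with probability 5/6

cat("simulated estimate:", estimate, "\n")
cat("exact probability: ", round(exact, 4), "\n")
```

With 1000 simulated weeks the estimate should land within a few percentage points of the exact value, and increasing `n_sims` narrows the gap.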