@benmarwick
benmarwick / simple-table-markdown.r
Created December 4, 2013 23:19
How to make a simple table using R markdown that includes a caption.
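```{r table-simple, echo=FALSE, message=FALSE, warning=FALSE, results='asis'}
library(pander)
panderOptions('table.split.table', Inf)  # never split wide tables
set.caption("My great data")
my.data <- "
Tables | Are | Cool
col 3 is | right-aligned | $1600
col 2 is | centered | $12
zebra stripe | are neat | $1"
df <- read.delim(textConnection(my.data), header=FALSE, sep="|", strip.white=TRUE, stringsAsFactors=FALSE)
# The preview is cut off here; the lines below are an assumed completion:
# promote the first row to column names, then render the table.
names(df) <- unlist(df[1, ])
df <- df[-1, ]
row.names(df) <- NULL
pander(df)
```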
benmarwick / uw_archy_phd_theses.Rmd
Last active January 26, 2017 02:02
Basic analysis of data on UW Archaeology PhD theses of the last ten years
# Basic analysis of UW Archaeology PhD theses
There are no clear guidelines about the length or structure of a PhD thesis in
archaeology at UW. To find out what is typical, we made a quick study
of the norms evident in PhD theses produced in the last ten years.
## Methods
We counted the total number of pages in each thesis and the number of pages per chapter. We also noted the year each thesis was passed so that we could examine trends over time. We entered the data into a Google Sheet.
benmarwick / oxcal_formatter.r
Created December 11, 2013 19:22
Uses R to format radiocarbon dates for calibration by OxCal
# read data in, three columns Name = lab code, Date = radiocarbon age, Uncertainty = error
dates <- read.csv('F:/My Documents/My UW/Research/1308 Sulawesi/Dates/TalimbueDatesForOxcal-2.csv', stringsAsFactors = FALSE)
# construct OxCal format
oxcal_format <- paste0('R_Date(\"', gsub("^\\s+|\\s+$", "", dates$Name), '\",', dates$Date, ',', dates$Uncertainty, ');')
# inspect
cat(oxcal_format, sep = '\n') # one date per line
# write formatted dates to text file
write.table(oxcal_format, file = 'oxcal_format.txt', row.names = FALSE, col.names = FALSE, quote = FALSE)
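A minimal sketch of what the formatter produces, using two made-up dates instead of the CSV file above. The lab codes, ages, and errors are invented for illustration, and `trimws()` stands in for the `gsub()` call that strips surrounding whitespace.

```r
# Toy input (assumed values, not real dates): same three columns as the CSV
dates <- data.frame(
  Name = c(" OZ-1 ", "OZ-2"),
  Date = c(3200, 4510),
  Uncertainty = c(40, 35),
  stringsAsFactors = FALSE
)

# build one R_Date() statement per row, trimming stray spaces from the lab code
oxcal_format <- paste0('R_Date("', trimws(dates$Name), '",',
                       dates$Date, ',', dates$Uncertainty, ');')

# inspect, one statement per line
cat(oxcal_format, sep = "\n")
```

Each element of `oxcal_format` is a single statement, so the vector can be written out line by line and pasted into OxCal.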
```{r}
# bring data into R
require(gdata) # must have Perl installed first: http://strawberryperl.com/
data <- read.xls("F:/My Documents/My UW/Teaching/Graduate Students/Amy Jordan/shell vs grit charts.xls", sheet = 'main', stringsAsFactors = FALSE)
```
```{r}
# check variables
unique(data$Level)
```
#Title: An example of the correlation of x and y for various distributions of (x,y) pairs
#Tags: Mathematics; Statistics; Correlation
#Author: Denis Boigelot
#Packages needed: mvtnorm (rmvnorm), RSVGTipsDevice (devSVGTips)
#How to use: output()
#
#This is a translated version in R of Mathematica 6 code by Imagecreator.
# from http://en.wikipedia.org/wiki/File:Correlation_examples2.svg
library(mvtnorm)
benmarwick / shakespeare_plays_genres.Rmd
Last active October 30, 2024 22:14
Quick and basic cluster analysis of Shakespeare's plays using R and full text from http://shakespeare.mit.edu/
Quick and dirty look at Shakespeare's plays
====
Introduction
----
I was inspired by recent posts by Andrew Collier ([1](http://www.exegetic.biz/blog/2013/09/text-mining-the-complete-works-of-william-shakespeare/) and [2](http://www.exegetic.biz/blog/2013/09/clustering-the-words-of-william-shakespeare/)) and an earlier post by [Matt Jockers](http://www.matthewjockers.net/2009/02/13/machine-classifying-novels-and-plays-by-genre/) to take a recreational look at the plays of Shakespeare.
Motivated by Jockers, the specific topic I was interested in is the genres of the plays. For example, are the genres discrete or is there lots of overlap? Are the genres equal in variation, or is one genre very focused and others very diverse? What are the key attributes that define the genres? And can I reproduce Jockers' use of high frequency words to identify genres? Related to Jockers' work on high frequency words is an earlier study by [Brainerd (1979)](http://www.jstor.org/stable/30207229) who used pronouns
benmarwick / docs-per-topic.rmd
Last active August 29, 2015 13:56
How to find the topic with the highest proportion in a set of documents (after a topic model has been generated with the R package mallet)
Which documents belong to each topic?
Documents don't belong to a single topic; there is a distribution of topics
over each document.
But we can find the topic with the highest proportion for each document.
That top-ranking topic might be called the 'topic' for the document, but note
that all docs have all topics in varying proportions.
Assume that we start with `topic_docs` from the output of the mallet package
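A minimal sketch of the idea, assuming `topic_docs` is a numeric matrix with one row per topic and one column per document. That orientation, the dimension names, and the proportions are all assumptions made for illustration; real mallet output may need transposing or normalizing first.

```r
# Toy stand-in for mallet output (assumed layout: topics in rows, docs in columns)
topic_docs <- matrix(
  c(0.70, 0.20, 0.10,   # doc_1
    0.10, 0.30, 0.60,   # doc_2
    0.25, 0.50, 0.25),  # doc_3
  nrow = 3,
  dimnames = list(paste0("topic_", 1:3), paste0("doc_", 1:3))
)

# for each document (column), find the row index of the largest proportion
top_index <- unname(apply(topic_docs, 2, which.max))

# label each document with its top-ranking topic
top_topic <- rownames(topic_docs)[top_index]
names(top_topic) <- colnames(topic_docs)

# group the document names by their top-ranking topic
docs_per_topic <- split(names(top_topic), top_topic)
```

`top_topic` answers "which topic ranks highest in this document?", and `docs_per_topic` turns that around into a list of documents for each topic.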
benmarwick / gist:9204077
Last active August 29, 2015 13:56
RCloud - https://github.com/att/rcloud - setup on ubuntu
## Shell:
git clone --recursive https://github.com/cscheid/rcloud.git
sudo apt-get install libxt-dev libcurl4-openssl-dev libcairo2-dev libreadline-dev git
Create a GitHub app according to the instructions here: https://github.com/att/rcloud
Edit conf/rcloud.conf according to instructions here: https://github.com/att/rcloud
benmarwick / test.R
Last active March 23, 2022 02:29
Convert a folder of text files into a single CSV file with one column for the file names and one column of the text of the file. A function in R.
# test it by creating some small text files to run the function on
txt <- c("here is", "some text", "to test", "this function with", "'including a leading quote", '"and another leading quote')
# make text files
dir.create("testdir")
for (i in seq_along(txt)) {
  writeLines(txt[i], paste0("testdir/outfile-", i, ".txt"))
}
# run the function and then look in the CSV file that is produced.
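The function itself is not shown in this preview. The following is a minimal sketch of the conversion described above, using a hypothetical helper named `txts_to_csv()` that is not the gist's own function. The column names `filename` and `text` and the default output path are assumptions.

```r
# Hypothetical helper (not the gist's function): read every .txt file in `dir`
# and write a two-column CSV with the file name and the file's text.
txts_to_csv <- function(dir, out = file.path(dir, "texts.csv")) {
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  # collapse multi-line files into one string per file
  texts <- vapply(
    files,
    function(f) paste(readLines(f, warn = FALSE), collapse = "\n"),
    character(1)
  )
  out_df <- data.frame(filename = basename(files), text = unname(texts),
                       stringsAsFactors = FALSE)
  write.csv(out_df, out, row.names = FALSE)
  invisible(out_df)
}

# try it on two throwaway files in a temporary folder
demo_dir <- tempfile("txtdemo")
dir.create(demo_dir)
writeLines("here is", file.path(demo_dir, "outfile-1.txt"))
writeLines('"and another leading quote', file.path(demo_dir, "outfile-2.txt"))
txts_to_csv(demo_dir)

# read the CSV back to confirm the quoting survived the round trip
check <- read.csv(file.path(demo_dir, "texts.csv"), stringsAsFactors = FALSE)
```

`write.csv()` doubles embedded quote characters, which is why text that starts with a quote mark reads back unchanged.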
benmarwick / csv2txts.R
Last active February 9, 2021 15:06
Convert a single CSV file (one text per row) into separate text files. A function in R.
#' Making several text files from a single CSV file
#'
#' Convert a single CSV file (one text per row) into
#' separate text files. A function in R.
#'
#' To use this function for the first time run:
#' install.packages("devtools")
#' thereafter you just need to load the function
#' from GitHub like so:
#' library(devtools) # Windows users need Rtools installed, Mac users need Xcode installed