agoldst

Empowerment Part II

The actual "empowerment" (modest but real) comes in getting a more detailed understanding of the way the systems we already use handle text, and in learning more ways to manipulate that text, beyond the confines of any single program. The business of plain-text-slinging, a minor craft on its own, nonetheless forms a natural starting point for thinking more deeply about analyzing digitized texts, expressing yourself in "code" of various kinds, and composing in the digital medium.

Downloads

In order to do the workshop on your own, first install Pandoc and LaTeX (links above). Komodo Edit is optional; any text editor will do, though I'll occasionally refer to details in Komodo (menu items, etc.) that may be slightly different in other editors. See below for text editor suggestions.

The handout from the workshop (PDF)

% DH@RU Workshop: Empowerment Part II
% Andrew Goldstone ([email protected])
% November 20, 2013

Markdown

Text conventions

*emphasis* or _emphasis_; **strong emphasis**

	# the metadata.R script (for read.citations()) is part of
	# this git repository:
	# http://github.com/agoldst/dfr-analysis
	# So change this path as needed
	source("~/Developer/dfr-analysis/metadata.R")

	bennett.df <- read.citations("bennett.csv")
	woolf.df <- read.citations("woolf.csv")

	# Now bind the two together, using columns to flag AB and VW hits


	import Text.Pandoc

	{-
	This script uses the Pandoc library to do two transformations
	needed on the way from my mixed markdown/LaTeX syllabus sources to a
	single HTML file:

	1. Transform the slightly garbled html produced by tex4ht from LaTeX
	source containing a biblatex bibliography by getting rid of definition

	# for this file, clone http://github.com/agoldst/dfr-analysis
	source("~/Developer/dfr-analysis/metadata.R")
	library(plyr)
	library(stringr)

	# read a word,weight CSV (header row skipped) into a named integer vector
	wordcounts_v <- function (f) {
	    frm <- scan(f, what=list(word=character(), weight=integer()),
	                sep=",", skip=1, quiet=TRUE)
	    result <- frm$weight
	    names(result) <- frm$word
	    result
	}

	library("httr")

	r_lits <- GET("http://api.nobelprize.org/v1/prize.json",query=list(category="literature"))

	laureates <- content(r_lits,"parsed")$prizes # JSON

	ids <- sapply(laureates, function (psn) {
	    psn$laureates[[1]]$id
	})
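
The `sapply` call above depends on the shape of the parsed response: a `prizes` list whose entries each carry a `laureates` list with an `id` field. Here is a minimal offline sketch of the same extraction in Ruby, assuming only that shape. The sample records and the `first_laureate_ids` helper are invented for illustration, and nothing is requested from the API.

```ruby
require 'json'

# Hypothetical sample mimicking the structure the R snippet indexes into
# (prizes -> laureates -> id). The values are invented, not real API output.
SAMPLE_BODY = <<~JSON
  {"prizes": [
    {"year": "2001", "category": "literature",
     "laureates": [{"id": "11", "surname": "Placeholder A"}]},
    {"year": "2002", "category": "literature",
     "laureates": [{"id": "22", "surname": "Placeholder B"},
                   {"id": "23", "surname": "Placeholder C"}]}
  ]}
JSON

# Return the id of the first laureate listed for each prize,
# mirroring psn$laureates[[1]]$id. Prizes with no laureates yield nil.
def first_laureate_ids(body)
  JSON.parse(body).fetch('prizes').map do |prize|
    first = (prize['laureates'] || []).first
    first && first['id']
  end
end

puts first_laureate_ids(SAMPLE_BODY).inspect
```

Like the R version, this keeps only the first laureate of each prize, so any co-winners are dropped.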

	require 'jekyll'
	require 'pandoc-ruby' # add pandoc-ruby to your Gemfile

	# Plugin for using pandoc as Jekyll markdown processor
	# http://jekyllrb.com/docs/extras/ q.v.
	# install in jekyll _plugins/ folder
	# or Octopress plugins/

	# In _config.yml, specify
	# markdown: Pandoc # capital P

	As long promised, here are some links to the data on U.S. literary translation that I showed in a table during our discussion of Casanova.
	By kind permission of Chad Post, I can make available an aggregate data file of all the literature translations catalogued by Three Percent. I've decided to put the data file, together with some scripts and information about the munging, in a [github repository](http://github.com/agoldst/threepercent). The data consists of a single CSV file with one line for each title: [all_titles.csv](https://github.com/agoldst/threepercent/blob/master/all_titles.csv) ([Wikipedia on CSV format](http://en.wikipedia.org/wiki/Comma-separated_values)).
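
Because the file has one line per title, aggregate counts come down to tallying rows by a column. The following is a rough sketch in Ruby, not the method used in the repository's scripts. The column names (`title`, `language`, `year`), the sample rows, and the `count_by` helper are all assumptions made for illustration; check the header of `all_titles.csv` for the real column names.

```ruby
require 'csv'

# Hypothetical sample in the one-row-per-title layout described above.
# Column names and rows are invented for this sketch.
SAMPLE_CSV = <<~ROWS
  title,language,year
  Placeholder Title One,French,2008
  Placeholder Title Two,Spanish,2008
  Placeholder Title Three,French,2009
ROWS

# Count rows (that is, titles) for each distinct value of the given column.
def count_by(csv_text, column)
  counts = Hash.new(0)
  CSV.parse(csv_text, headers: true).each do |row|
    counts[row[column]] += 1
  end
  counts
end

count_by(SAMPLE_CSV, 'language').each do |language, n|
  puts "#{language}: #{n}"
end
```

To run this against the real file, pass `File.read("all_titles.csv")` in place of `SAMPLE_CSV`, along with whichever column you want to tally.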

	I have produced this by exporting the first "sheet" of each of the five yearly spreadsheets available at [the Three Percent Translation Database](http://www.rochester.edu/College/translation/threepercent/index.php?s=database) and then combining the files. According to Chad Post, updated data will be available soon, at which point I can reprodu



	opts_chunk$set(echo=F,warning=F,prompt=F,comment="",
	    autodep=T,cache=T,dev="tikz",
	    fig.width=4.5,fig.height=3,size='footnotesize',
	    dev.args=list(pointsize=12))
	options(width=70)
	options(tikzDefaultEngine="xetex")
	options(tikzXelatexPackages=c(
	"\\usepackage{tikz}\n",

	# mallet-inference.R
	#
	# functions for using MALLET's topic-inference functionality: given an
	# existing topic model, estimate topic proportions for new documents
	#
	# source() this file
	#
	# Workflow
	# --------
	#