agoldst

% DH@RU Workshop: Empowerment Part II
% Andrew Goldstone ([email protected])
% November 20, 2013

Markdown

Text conventions

*emphasis* or _emphasis_; **strong emphasis**
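As a quick check on these conventions, here is a minimal shell sketch that pipes the sample line through Pandoc, if it happens to be installed, and prints the resulting HTML. The variable name `sample` and the choice of HTML as the output format are illustrative assumptions, not part of the workshop materials.

```shell
#!/bin/bash
# A one-line Markdown sample using the emphasis conventions above.
# (The variable name is illustrative.)
sample='*emphasis* or _emphasis_; **strong emphasis**'

# Convert Markdown to HTML via standard input/output, but only if pandoc
# is actually on the PATH; otherwise say so and move on.
if command -v pandoc >/dev/null 2>&1; then
    printf '%s\n' "$sample" | pandoc -f markdown -t html
else
    echo "pandoc not found; install it to try the conversion"
fi
```

With Pandoc available, both `*...*` and `_..._` should come out as `<em>` elements and `**...**` as a `<strong>` element.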

Empowerment Part II

The actual "empowerment" (modest but real) comes from understanding in more detail how the systems we already use handle text, and from learning more ways to manipulate that text beyond the confines of any single program. The business of plain-text-slinging, a minor craft in its own right, nonetheless forms a natural starting point for thinking more deeply about analyzing digitized texts, expressing yourself in "code" of various kinds, and composing in the digital medium.

Downloads

In order to do the workshop on your own, first install Pandoc and LaTeX (links above). Komodo Edit is optional; any text editor will do, though I'll occasionally refer to details in Komodo (menu items, etc.) that may be slightly different in other editors. See below for text editor suggestions.
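Before starting, it may help to confirm that the tools are reachable from the command line. The following is a small sketch, not part of the workshop handout: the function name `check_tool` is hypothetical, and `pdflatex` is only an assumed stand-in for whichever LaTeX command your distribution provides.

```shell
#!/bin/bash
# check_tool NAME: report whether NAME can be found on the PATH.
# Prints one line and returns non-zero if the tool is missing.
# (Hypothetical helper, for illustration only.)
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found"
        return 1
    fi
}

# Assumption: pdflatex stands in for "LaTeX" here; substitute the
# command your LaTeX installation actually provides.
missing=0
for tool in pandoc pdflatex; do
    check_tool "$tool" || missing=1
done

if [ "$missing" -ne 0 ]; then
    echo "Install the missing tools before continuing."
fi
```

If a tool is reported as not found even though it is installed, the likely culprit is that its directory is not on your `PATH`.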

The handout from the workshop (PDF)

	---
	title: "Nobel genre tallies"
	output:
	  html_document:
	    self_contained: false
	...

	```{r setup, include=F}
	library(tidyverse)
	library(rvest)

	#!/bin/bash
	#
	# contacts_query.sh
	# Andrew Goldstone, July 2017. All yours to use or modify, but no promises.
	#
	# The mutt e-mail client has an option to query an external address book for
	# e-mail addresses. On a Mac it is nice to be able to query the Address Book
	# (now known as Contacts). For a while I used a utility called contacts
	# (http://gnufoo.org/contacts) but this stopped working under Sierra. There is
	# an official API for querying Contacts as a unified datastore, but it is only

	# Please see the accompanying enex.md file for usage notes.

	library(xml2)
	library(stringr)

	enex_tagspace <- function (enex, d, dry=T) {
	    node_title <- . %>% xml_find_all(".//title") %>% xml_text()

	# Tagspaces delimits tags by spaces, so we have to eliminate spaces from
	# tag names.

	# mallet-inference.R
	#
	# functions for using MALLET's topic-inference functionality: given an
	# existing topic model, estimate topic proportions for new documents
	#
	# source() this file
	#
	# Workflow
	# --------
	#



	opts_chunk$set(echo=F, warning=F, prompt=F, comment="",
	               autodep=T, cache=T, dev="tikz",
	               fig.width=4.5, fig.height=3, size='footnotesize',
	               dev.args=list(pointsize=12))
	options(width=70)
	options(tikzDefaultEngine="xetex")
	options(tikzXelatexPackages=c(
	"\\usepackage{tikz}\n",

	require 'jekyll'
	require 'pandoc-ruby' # add pandoc-ruby to your Gemfile

	# Plugin for using pandoc as Jekyll markdown processor
	# http://jekyllrb.com/docs/extras/ q.v.
	# install in jekyll _plugins/ folder
	# or Octopress plugins/

	# In _config.yml, specify
	# markdown: Pandoc # capital P

	library("httr")

	r_lits <- GET("http://api.nobelprize.org/v1/prize.json",query=list(category="literature"))

	laureates <- content(r_lits,"parsed")$prizes # JSON

	ids <- sapply(laureates, function (psn) {
	    psn$laureates[[1]]$id
	})

	# for this file, clone http://github.com/agoldst/dfr-analysis
	source("~/Developer/dfr-analysis/metadata.R")
	library(plyr)
	library(stringr)

	wordcounts_v <- function (f) {
	    frm <- scan(f, what=list(word=character(), weight=integer()),
	                sep=",", skip=1, quiet=T)
	    result <- frm$weight
	    names(result) <- frm$word
	    result