An introduction to using graphs to model and analyze data in the humanities
A workshop at Berea College on Wednesday, October 30th sponsored by The Mellon Partners for Humanities Education
for $doc in fn:collection("bpp-quarterly")//FullText[.//text() contains text {"jury", "law"} any using stemming]
let $hits := ft:extract($doc[.//text() contains text {"jury", "law"} any using stemming])
let $count := fn:count($hits//mark)
let $record := fn:doc(fn:base-uri($doc))
order by $count descending
return <hit count="{$count}" url="{$record//URLDocView}" title="{$record//RecordTitle}">{$hits}</hit>
let $data := <column name="Reporters">Williams, Brian|Maceda, Jim|Keith, Brian Williams,|Thompson, Anne</column>
let $contributor := $data[@name='Reporters']/text() => fn:tokenize("\|")
return $contributor
Today, we’ll be exploring patterns in a corpus of genuine and fake news collected in 2016 by BuzzFeed and scored for veracity by professional journalists. As you might imagine, the corpus contains very partisan perspectives; individual articles may contain disturbing language and viewpoints. For the initial code example below, you will need to have downloaded the data set and created a database called articles.
We’ll begin our investigation of natural language processing by using Aylien, which bills itself as a “News Intelligence Platform,” to classify these articles, analyze their topics, identify the people, places, and things they discuss, and discern the sentiment or tone of the articles. If you would like to follow along, please sign up for a free API key.
let $appid := "###"
let $key := "###"
let $endpoint := "https://api.aylien.com/api/v1/"
let $service := "hashtags"
let $text := fn:encode-for-uri("Are you serious? Do you think anyone cares about that crazy plan? Get back to Arizona!")
let $request :=
  <http:request method="get" href="{$endpoint || $service || '?text=' || $text}">
    <http:header name="Accept" value="text/xml"/>
    <http:header name="X-AYLIEN-TextAPI-Application-Key" value="{$key}"/>
    <http:header name="X-AYLIEN-TextAPI-Application-ID" value="{$appid}"/>
  </http:request>
return http:send-request($request)
xquery version "3.1";
declare function local:seriesTitle($Network as xs:string?) as xs:string
{
  switch ($Network)
    case "ABC" return "ABC World News Tonight"
    case "CBS" return "CBS Evening News"
    case "NBC" return "NBC Nightly News"
    default return "unknown network"
};
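To see the switch expression in action, you can call the function with both a known and an unknown network abbreviation (assuming the declaration above is in scope):

```xquery
(: a known abbreviation matches a case clause; anything else falls through to the default :)
(local:seriesTitle("NBC"), local:seriesTitle("PBS"))
(: returns ("NBC Nightly News", "unknown network") :)
```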
In this session, we will be exploring the XQuery and XPath Full Text 1.0 standard. Our goal is to take the records that we created during our prior class from the Victorian Women Writers Project and persist them to another database where we will analyze their contents for textual patterns.
The following exercises assume that you have loaded the documents from the Victorian Women Writers Project into a BaseX database. It is also assumed that you have named that database vwwp_tei.
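Before working with the corpus, it helps to see the heart of the Full Text standard in isolation: the contains text operator, which returns a boolean and accepts match options such as stemming. A minimal sketch, using an invented sentence for illustration:

```xquery
(: stemming lets "juries" match the search term "jury" :)
"The juries deliberated under common law" contains text "jury" using stemming
(: returns true() :)
```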
In this session, we will extract poems from the Victorian Women Writers Project. The electronic editions of these documents are maintained in TEI P5 format on GitHub. You can also download a zip file of the entire corpus.
The following exercises assume that you have loaded the documents from the Victorian Women Writers Project into a BaseX database. It is also assumed that you have named that database vwwp_tei.
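Because TEI P5 documents live in the TEI namespace, any query against them must declare it first. A hedged sketch of pulling poem divisions out of the database (the namespace URI is standard TEI; the @type value "poem" is an assumption about how this corpus encodes its divisions):

```xquery
declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: return every division encoded as a poem across the corpus :)
for $poem in fn:collection("vwwp_tei")//tei:div[@type = "poem"]
return $poem
```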
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method "csv";
declare option output:csv "header=yes, separator=comma";
<csv>
{
  for $doc in fn:collection("bpp-quarterly")
  let $fullText := $doc/Record/FullText/text()
  let $title := $doc/Record/Publication/Title/text()
  let $articleTitle := $doc/Record/RecordTitle/text()