An introduction to using graphs to model and analyze data in the humanities
A workshop at Berea College on Wednesday, October 30th sponsored by The Mellon Partners for Humanities Education
for $doc in fn:collection("bpp-quarterly")//FullText[.//text() contains text {"jury", "law"} any using stemming]
let $hits := ft:extract($doc[.//text() contains text {"jury", "law"} any using stemming])
let $count := fn:count($hits//mark)
let $record := fn:doc(fn:base-uri($doc))
order by $count descending
return <hit count="{$count}" url="{$record//URLDocView}" title="{$record//RecordTitle}">{$hits}</hit>
let $data := <column name="Reporters">Williams, Brian|Maceda, Jim|Keith, Brian Williams,|Thompson, Anne</column>
let $contributor := $data[@name='Reporters']/text() => fn:tokenize("\|")
return $contributor
Today, we’ll be exploring patterns in a corpus of genuine and fake news collected in 2016 by BuzzFeed and scored for veracity by professional journalists. As you might imagine, the corpus contains very partisan perspectives; individual articles may contain disturbing language and viewpoints. For the initial code example below, you will need to have downloaded the data set and created a database called articles.
We’ll begin our investigation of natural language processing by using Aylien, which bills itself as a “News Intelligence Platform,” to classify these articles, analyze their topics, identify the people, places, and things they discuss, and discern the sentiment or tone of the articles. If you would like to follow along, please sign up for a free API key.
let $appid := "###"
let $key := "###"
let $endpoint := "https://api.aylien.com/api/v1/"
let $service := "hashtags"
let $text := fn:encode-for-uri("Are you serious? Do you think anyone cares about that crazy plan? Get back to Arizona!")
let $request :=
  <http:request method="get" href="{$endpoint || $service || '?text=' || $text}">
    <http:header name="Accept" value="text/xml"/>
    <http:header name="X-AYLIEN-TextAPI-Application-Key" value="{$key}"/>
    <http:header name="X-AYLIEN-TextAPI-Application-ID" value="{$appid}"/>
  </http:request>
return http:send-request($request)
xquery version "3.1";
declare function local:seriesTitle($Network as xs:string?) as xs:string
{
  switch ($Network)
    case "ABC" return "ABC World News Tonight"
    case "CBS" return "CBS Evening News"
    case "NBC" return "NBC Nightly News"
    default return "unknown network"
};
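To see the switch expression in action, you can call the function with both a known and an unknown network abbreviation (assuming the declaration above is in scope):

```xquery
(: a known abbreviation matches a case clause; anything else falls through to the default :)
(local:seriesTitle("NBC"), local:seriesTitle("PBS"))
(: returns ("NBC Nightly News", "unknown network") :)
```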
In this session, we will be exploring the XQuery and XPath Full Text 1.0 standard. Our goal is to take the records that we created during our prior class from the Victorian Women Writers Project and persist them to another database where we will analyze their contents for textual patterns.
The following exercises assume that you have loaded the documents from the Victorian Women Writers Project into a BaseX database. It is also assumed that you have named that database vwwp_tei.
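Before working with the corpus, it helps to see the heart of the Full Text standard in isolation: the contains text operator, which returns a boolean and accepts match options such as stemming. A minimal sketch, using an invented sentence for illustration:

```xquery
(: stemming lets "juries" match the search term "jury" :)
"The juries deliberated under common law" contains text "jury" using stemming
(: returns true() :)
```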
In this session, we will extract poems from the Victorian Women Writers Project. The electronic editions of these documents are maintained in TEI P5 format on GitHub. You can also download a zip file of the entire corpus.
The following exercises assume that you have loaded the documents from the Victorian Women Writers Project into a BaseX database. It is also assumed that you have named that database vwwp_tei.
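Because TEI P5 documents live in the TEI namespace, any query against them must declare it first. A hedged sketch of pulling poem divisions out of the database (the namespace URI is standard TEI; the @type value "poem" is an assumption about how this corpus encodes its divisions):

```xquery
declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: return every division encoded as a poem across the corpus :)
for $poem in fn:collection("vwwp_tei")//tei:div[@type = "poem"]
return $poem
```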
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method "csv";
declare option output:csv "header=yes, separator=comma";
<csv>
{
  for $doc in fn:collection("bpp-quarterly")
  let $fullText := $doc/Record/FullText/text()
  let $title := $doc/Record/Publication/Title/text()
  let $articleTitle := $doc/Record/RecordTitle/text()