@CliffordAnderson
Last active December 31, 2015 19:29
Produces a word frequency list for each speaker in Much Ado About Nothing.
xquery version "3.0";

(: Produces word lists with frequency per speaker :)

declare namespace tei="http://www.tei-c.org/ns/1.0";

let $doc := fn:doc("db/shakespeare/Ado.xml")
for $character in fn:distinct-values($doc//tei:sp/tei:speaker/tei:w/text())
(: Collect every word token spoken by this character :)
let $bag :=
    for $speaker in $doc//tei:sp
    let $words := $speaker/tei:ab//tei:w/text()
    where $speaker/tei:speaker/tei:w/text() = $character
    return $words
(: Count each distinct word and weight it by the character's total tokens :)
let $word-list :=
    for $word in fn:distinct-values($bag)
    let $count := fn:count($bag[. = $word])
    let $weighted := $count div fn:count($bag)
    order by $count descending
    return <word speaker="{$character}" type="{$word}" count="{$count}" weighted="{$weighted}"/>
return $word-list
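
To see how the count and weighted attributes are derived without loading Ado.xml, the inner loop can be tried on a small in-memory sequence. This is only a sketch: the four-word $bag below is a hypothetical toy input, not data from the play, and the speaker attribute is left out because no TEI document is involved.

```xquery
xquery version "3.0";

(: Hypothetical toy input standing in for one speaker's word tokens :)
let $bag := ("I", "you", "I", "of")
for $word in fn:distinct-values($bag)
(: Raw frequency: how many tokens in the bag equal this word :)
let $count := fn:count($bag[. = $word])
(: Relative frequency: raw count divided by the total number of tokens :)
let $weighted := $count div fn:count($bag)
order by $count descending
return <word type="{$word}" count="{$count}" weighted="{$weighted}"/>
```

With this input, "I" gets count 2 and weighted 0.5, while "you" and "of" each get count 1 and weighted 0.25. The relative order of words with equal counts is left to the processor.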
@CliffordAnderson

<word speaker="LEONATO" type="I" count="72" weighted="0.027077848815344114"/>
<word speaker="LEONATO" type="you" count="69" weighted="0.025949605114704776"/>
<word speaker="LEONATO" type="of" count="62" weighted="0.023317036479879654"/>
<word speaker="LEONATO" type="my" count="51" weighted="0.019180142910868748"/>
<word speaker="LEONATO" type="and" count="51" weighted="0.019180142910868748"/>
<word speaker="LEONATO" type="her" count="51" weighted="0.019180142910868748"/>
<word speaker="LEONATO" type="to" count="47" weighted="0.017675817976682964"/>
<word speaker="LEONATO" type="a" count="43" weighted="0.016171493042497179"/>
<word speaker="LEONATO" type="the" count="40" weighted="0.015043249341857841"/>
<word speaker="LEONATO" type="it" count="35" weighted="0.013162843174125611"/>
<word speaker="LEONATO" type="that" count="33" weighted="0.012410680707032719"/>