@emchateau
Created March 1, 2023 12:42
getWordsCount
(:~
: If performance is a concern, this problem is better served by a word index with frequency data, such as the one an XML database maintains. Solving it in pure XQuery, as below, works but may be considerably slower on large XML documents.
: https://stackoverflow.com/questions/15122641/count-number-of-word-occurrences-in-strings-using-xquery
:)
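let $xml :=
  <root>
    <nodeOne>
      <nodeTwo>
        <nodeThree>
          foo bar zoo
        </nodeThree>
      </nodeTwo>
    </nodeOne>
    <nodeOne>
      <nodeTwo>
        <nodeThree>
          foo bar
        </nodeThree>
      </nodeTwo>
    </nodeOne>
    <nodeOne>
      <nodeTwo>
        <nodeThree>
          zoo bar
        </nodeThree>
      </nodeTwo>
    </nodeOne>
  </root>
(: Collapse the whitespace of each text node, then split it into words :)
let $toks := $xml//text()/fn:tokenize(fn:normalize-space(.), '\s')
(: For each distinct word, count the tokens equal to it :)
for $t in distinct-values($toks)
let $count := count($toks[. = $t])
(: element { $t } requires every word to be a valid XML name;
 : a token such as "123" or "foo," would raise a dynamic error :)
return element { $t } {
  attribute count { $count }
}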
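The query above rescans `$toks` once for every distinct word. A single-pass variant is sketched below. It is an assumption, not part of the original answer: it requires an XQuery 3.0+ processor (for the `group by` clause and the `!` simple map operator) and reuses a condensed copy of the sample document. After grouping, `$word` is bound to all tokens in the group, so `count($word)` gives each word's frequency.

```xquery
xquery version "3.0";
(: Sketch, assuming XQuery 3.0+: count words in one pass with group by :)
let $xml :=
  <root>
    <nodeOne><nodeTwo><nodeThree>foo bar zoo</nodeThree></nodeTwo></nodeOne>
    <nodeOne><nodeTwo><nodeThree>foo bar</nodeThree></nodeTwo></nodeOne>
    <nodeOne><nodeTwo><nodeThree>zoo bar</nodeThree></nodeTwo></nodeOne>
  </root>
(: Tokenize every text node; whitespace-only nodes yield no tokens :)
for $word in $xml//text() ! tokenize(normalize-space(.), '\s+')
(: Group identical tokens; $word now holds the whole group :)
group by $w := $word
order by $w
return element { $w } { attribute count { count($word) } }
```

With the sample data this returns `<bar count="3"/>`, `<foo count="2"/>` and `<zoo count="2"/>`, in that order because of `order by`. The same valid-XML-name restriction on `element { $w }` applies here.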