@joyoyoyoyoyo
Forked from pgwillia/frequency_sort.sh
Created November 29, 2017 14:15
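# Count each location's occurrences, most frequent first (uniq -c pads counts with spaces, not tabs).
sort corpus_locations.csv | uniq -c | sort -nr > corpus_locations_count.tsv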
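The output file is named .tsv, but uniq -c separates the count from the value with spaces. If a strictly tab-separated file is wanted, something like the sketch below could be used instead. The function name count_to_tsv is hypothetical and not part of the original gist; it assumes one location per input line.

```shell
# Hypothetical helper: read lines on stdin, count duplicates, and
# print "count<TAB>value" rows with the most frequent value first.
count_to_tsv() {
  sort | uniq -c | sort -nr | awk '{
    n = $1                              # the count that uniq -c prepended
    sub(/^[ \t]*[0-9]+[ \t]/, "")       # drop the padding, the count and one separator
    printf "%s\t%s\n", n, $0            # the rest of the line is the original value
  }'
}
```

Possible usage: count_to_tsv < corpus_locations.csv > corpus_locations_count.tsv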
# On Debian/Ubuntu the xpath command is provided by the libxml-xpath-perl package.
apt-get install libxml-xpath-perl
# Extract the text of every <note> element from each XML file into <file>.txt.
find . -iname '*.xml' | while IFS= read -r file
do
  echo "$file"
  xpath -e '//note/text()' "$file" > "$file.txt"
done
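# Collect the first tagged location on each line of the OpenNLP output; skip lines with no tag.
find . -iname '*.ner' | while IFS= read -r file
do
  echo "$file"
  awk -F 'START:location>|<END' 'NF > 1 {print $2}' "$file" >> corpus_locations.csv
done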
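The loop above prints only the second field, so a line containing several tagged locations contributes just the first one. A minimal sketch of a variant that keeps every location follows. It assumes the name finder output contains only location tags in the form <START:location> ... <END>, and the function name extract_locations is hypothetical.

```shell
# Hypothetical helper: print every <START:location> ... <END> span, one per line.
# Reads the files given as arguments, or stdin when none are given.
extract_locations() {
  awk -F 'START:location>|<END' '{
    # With this separator, the even-numbered fields are the tagged spans.
    for (i = 2; i <= NF; i += 2) {
      loc = $i
      gsub(/^[ \t]+|[ \t]+$/, "", loc)   # trim the spaces around the span
      if (loc != "") print loc
    }
  }' "$@"
}
```

Possible usage inside the loop: extract_locations "$file" >> corpus_locations.csv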
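# Tag locations in each text file with OpenNLP's name finder, writing <file>.ner.
# Run this after the XML extraction and before the *.ner loop, which reads these files.
find . -iname '*.txt' | while IFS= read -r file
do
  echo "$file"
  /root/apache-opennlp-1.5.3/bin/opennlp TokenNameFinder /root/apache-opennlp-1.5.3/bin/en-ner-location.bin < "$file" > "$file.ner"
done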