@dardo82
Last active December 13, 2017 00:41
Make a word list
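#!/bin/zsh
VB="https://dropbox.com/s/mkcyo53m15ktbnp/nuovovocabolariodibase.pdf"
# Headwords: extract the PDF text, keep lowercase entries without capitals or
# dots, rejoin words hyphenated across lines. -L makes curl follow redirects.
HW=$(curl -sL "$VB" | pdftotext -layout - - | gawk -v RS=', [0-9]?|\n' \
'/^[^A-Z]+$/{if($1~/^[a-z][^\.]+$/)print $1}' | sed -n '/-$/N; s/-\n//; 1!p')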
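WD="https://it.wiktionary.org/w/index.php?action=raw&title="; PT="parole.txt"
# For each headword, fetch the raw it.wiktionary page and write "word|args":
# args is what follows the first "|" in the first {...} block matching Link|Tab,
# else the word itself. ${(f)HW} splits on newlines (zsh does not split $HW).
for w in ${(f)HW}; do echo "$w|${${$(curl -s "$WD$w" | gawk -v RS='{|}' \
'/Link|Tab/{print $1; exit}')#*|}:-$w}"; done > $PT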
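# Italian Wikipedia article titles (namespace 0); keep titles made only of
# word characters, underscores and optional punctuation. -P needs GNU grep.
WP=https://dumps.wikimedia.org/itwiki/latest/itwiki-latest-all-titles-in-ns0.gz
export GREP=$(which ggrep); WPT=$(curl -s $WP | zgrep -P '^(\pP?_?\w+_?\pP?)+$')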
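# Test: lines matching any token of re are split on non-word characters and
# underscores, then written to <first token>.txt (1.txt here).
echo "a:_1\n-b_2\nc_+3\nd_4*\ne_5" | gawk -v re="1 c d 3" '{gsub(/ /,"|",re); \
split(re,w,"|"); if ($0~re) {gsub(/(\W|_)+/,"\n",$0); print $0 > w[1]".txt"}}'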
#RE=$(gsed -En "/$A1|$A2|$A3|$A4|$A5/H;"'${x;s/ /||/g;s/\n//;s/\n/\&\&/g;p}' $PT)
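
Two steps of the script can be tried offline. The sketch below is only an illustration, not part of the gist: the function names `join_hyphenated` and `lemma_or_word` and all sample strings are hypothetical, and it is written in portable sh rather than zsh so it runs without network access. It assumes the text extracted from the PDF starts with one line to discard and that a word split across lines ends in "-".

```shell
#!/bin/sh
# Hypothetical helper: same sed program as in the script above.
# /-$/N   append the next line when the current one ends with "-"
# s/-\n// drop the hyphen and the line break, joining the two halves
# 1!p     print every line except the first
join_hyphenated() {
  sed -n '/-$/N; s/-\n//; 1!p'
}

# Hypothetical helper: mirrors the ${...#*|} and ${...:-$w} expansions.
# Strips everything up to the first "|" and falls back to the word itself
# when nothing is left.
lemma_or_word() {
  word=$1
  field=$2
  lemma=${field#*|}
  printf '%s|%s\n' "$word" "${lemma:-$word}"
}

# Made-up sample standing in for the pdftotext output.
joined=$(printf 'header\ncasa\nabban-\ndonare\nzio\n' | join_hyphenated)
printf '%s\n' "$joined"

# Made-up sample fields standing in for the gawk output.
lemma_or_word "case" "Link|casa"
lemma_or_word "zio" ""
```

With this sample the first step should yield casa, abbandonare and zio, one per line, and the second should print case|casa followed by zio|zio.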