All diacritic characters (as used in Czech, Slovak, and similar languages) are replaced with their plain Latin equivalents, e.g. š -> s; ě -> e.
So you don't have to type the accented characters exactly for an answer to match.
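
The replacement described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not taken from learn-languages.pl; the function names `strip_diacritics` and `answers_match` are hypothetical, and the use of Unicode NFD decomposition is an assumption about one reasonable way to do the mapping, not a statement of how the script works internally.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with accents removed (hypothetical helper, not from the script)."""
    # NFD splits an accented letter into its base letter plus combining marks,
    # e.g. "š" becomes "s" followed by U+030C (combining caron).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep everything else unchanged.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def answers_match(expected: str, typed: str) -> bool:
    """Compare two answers after removing diacritics from both (assumed behaviour)."""
    return strip_diacritics(expected) == strip_diacritics(typed)


if __name__ == "__main__":
    print(strip_diacritics("š"))
    print(strip_diacritics("ě"))
    print(answers_match("děkuji", "dekuji"))
```

This approach only handles letters that Unicode decomposes into a base letter plus a mark, which covers the Czech and Slovak alphabets; letters such as ł would need an explicit lookup table.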
perl learn-languages.pl some-path.txt some-other-path/file.txt
My files are written in the following format: