Created December 13, 2011 23:38
Gist: fmarani/1474483
poor man's language classifier
# Build a tiny reference corpus for each language.
echo "this is a text in english written only to demonstrate the validity of this method in selecting the right language" > corpus_en
echo "questo è un testo in italiano scritto solamente per dimostrare la validità di questo metodo nel selezionare il linguaggio voluto" > corpus_it
# Classify an Italian sample: gzip each corpus together with the sample and
# print the smallest result. Sizes are compared directly, so the corpora
# are assumed to be of similar length.
echo "questa è una prova di testo per testare la versione italiana" > test
(echo $(cat corpus_en test | gzip | wc -c) en; echo $(cat corpus_it test | gzip | wc -c) it) | sort -n | head -n 1
# Classify an English sample the same way.
echo "this is a test for the english version" > test
(echo $(cat corpus_en test | gzip | wc -c) en; echo $(cat corpus_it test | gzip | wc -c) it) | sort -n | head -n 1
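The commands above compare the raw compressed size of corpus plus sample, which can favour whichever corpus happens to be shorter. A possible variation, sketched here and not part of the original gist, is to measure how many extra bytes the sample adds on top of each corpus's own compressed size. The helper names `zsize` and `classify` are hypothetical, and the sketch assumes one file named `corpus_LABEL` per language in the current directory.

```shell
# Hypothetical helper: compressed size in bytes of the concatenated files.
zsize() { cat "$@" | gzip -c | wc -c; }

# Hypothetical helper: print the label whose corpus grows least when the
# sample is appended. Usage: classify SAMPLE LABEL...
# Assumes a file named corpus_LABEL exists for every LABEL given.
classify() {
    sample=$1; shift
    for label in "$@"; do
        base=$(zsize "corpus_$label")            # corpus alone
        both=$(zsize "corpus_$label" "$sample")  # corpus followed by sample
        echo "$((both - base)) $label"           # extra bytes caused by the sample
    done | sort -n | head -n 1 | cut -d' ' -f2
}
```

With the files created above, `classify test en it` prints the label of the closest corpus. On corpora this small the margin can be narrow, so larger reference texts are likely to give steadier results.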