get txt and csv files of articles of Byte-Level-BPE_universal_tokenizer_but.ipynb
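# Create one text file and one csv file from the Portuguese Wikipedia articles.
# Assumes path_data, lang and both helper functions are defined earlier in the notebook.
dest = path_data/'docs'  # folder holding the extracted articles
# Text file
get_one_clean_file(dest, lang)
# csv file
get_one_clean_csv_file(dest, lang)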
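
The two helpers called above are not defined in this gist. The sketch below shows one way such a merge step could look; it is not the notebook's implementation. The names `read_article`, `merge_to_text` and `merge_to_csv`, the output file names, and the assumed input layout (one UTF-8 file per article, title on the first line, optional trailing `</doc>` tag) are all hypothetical.

```python
import csv
import re
from pathlib import Path

# Hypothetical stand-ins for get_one_clean_file / get_one_clean_csv_file.
# Assumption: `dest` holds one UTF-8 text file per article, with the title
# on the first line and an optional trailing </doc> tag.
DOC_END = re.compile(r"</doc>\s*$")


def read_article(path: Path) -> str:
    """Return the article body without the title line or a trailing </doc>."""
    lines = path.read_text(encoding="utf-8").splitlines()
    body = "\n".join(lines[1:])
    return DOC_END.sub("", body).strip()


def merge_to_text(dest: Path, lang: str) -> Path:
    """Concatenate all article bodies into one text file next to `dest`."""
    out = dest.parent / f"all_texts_{lang}wiki.txt"
    with out.open("w", encoding="utf-8") as fp:
        for path in sorted(p for p in dest.iterdir() if p.is_file()):
            fp.write(read_article(path) + "\n")
    return out


def merge_to_csv(dest: Path, lang: str) -> Path:
    """Write one csv row per article under a single `text` column."""
    out = dest.parent / f"all_texts_{lang}wiki.csv"
    with out.open("w", encoding="utf-8", newline="") as fp:
        writer = csv.writer(fp)
        writer.writerow(["text"])
        for path in sorted(p for p in dest.iterdir() if p.is_file()):
            writer.writerow([read_article(path)])
    return out
```

Both functions write their output to the parent of `dest`, so the merged files are not picked up as articles if the step is run again.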