@gaphex
Created May 9, 2019 14:54
Truncating OPUS dataset
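
# Colab form field: set DEMO_MODE to True to work with a small slice of the corpus
DEMO_MODE = True #@param {type:"boolean"}

if DEMO_MODE:
    CORPUS_SIZE = 1000000  # keep the first 1M lines
else:
    CORPUS_SIZE = 100000000 #@param {type: "integer"}

# Keep only the first CORPUS_SIZE lines of dataset.txt, then overwrite the original file
!head -n $CORPUS_SIZE dataset.txt > subdataset.txt
!mv subdataset.txt dataset.txt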
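
The `!head` and `!mv` lines are IPython shell magics, so they only run inside Jupyter or Colab on a system that has `head` and `mv`. Outside a notebook, the same truncation can be done in plain Python. The following is a minimal sketch; the helper name `truncate_corpus` is hypothetical and not part of the original gist. It streams only the first `n_lines` lines with `itertools.islice`, so the remainder of a large file is never read. It then writes those lines to a temporary file in the same directory and swaps that file in with `os.replace`, which mirrors the `head` + `mv` pair.

```python
import itertools
import os
import shutil
import tempfile


def truncate_corpus(path: str, n_lines: int) -> None:
    """Keep only the first n_lines lines of the file at `path`, in place.

    Hypothetical helper, a pure-Python stand-in for `head -n N` followed by `mv`.
    """
    dir_name = os.path.dirname(os.path.abspath(path))
    # Binary mode copies bytes unchanged, splitting lines on b"\n" like `head`.
    # The temp file lives in the same directory so os.replace stays on one filesystem.
    with open(path, "rb") as src, tempfile.NamedTemporaryFile(
        "wb", dir=dir_name, delete=False
    ) as dst:
        # islice stops after n_lines lines without reading the rest of the file
        dst.writelines(itertools.islice(src, n_lines))
        tmp_name = dst.name
    # NamedTemporaryFile creates the file with restrictive permissions; copy the original's
    shutil.copymode(path, tmp_name)
    # Replace the original file with the truncated copy (the `mv` step)
    os.replace(tmp_name, path)
```

After the `CORPUS_SIZE` cell, calling `truncate_corpus("dataset.txt", CORPUS_SIZE)` should have the same effect as the two shell commands. If the file has fewer than `CORPUS_SIZE` lines, it is left with all of its lines, just as `head` would leave it.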