@nbroad1881
Created April 24, 2020 21:28
If a massive corpus is stored in a single file, this splits it into smaller files by line count, then collects the resulting filenames into a list.
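### The leading ! runs a shell command from a Jupyter/IPython cell
!split -l 250000 text_file.txt smaller_
### split [options] filename prefix
### -l N     put N lines in each output file
### -b SIZE  put SIZE bytes in each output file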
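import glob
### glob returns matches in arbitrary order; sort so the chunks stay in sequence (smaller_aa, smaller_ab, ...)
file_list = sorted(glob.glob("smaller_*"))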
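
Where the `split` command is not available (for example on Windows, or outside a notebook), roughly the same result can be had in plain Python. The sketch below is an illustration, not part of the original gist: the function name `split_by_lines` and the zero-padded numeric suffixes (`smaller_0000`, `smaller_0001`, ...) are assumptions, and they differ from the `aa`, `ab`, ... suffixes that `split` produces by default. It also assumes the corpus is UTF-8 text.

```python
from itertools import islice


def split_by_lines(src, lines_per_file, prefix="smaller_"):
    """Split the text file `src` into chunks of at most `lines_per_file` lines.

    Returns the chunk filenames in the order they were written.
    Hypothetical helper; naming scheme is prefix + 4-digit index.
    """
    if lines_per_file < 1:
        raise ValueError("lines_per_file must be at least 1")

    out_paths = []
    # newline="" keeps the original line endings untouched on read and write
    with open(src, "r", encoding="utf-8", newline="") as f:
        index = 0
        while True:
            # islice pulls the next batch of lines without loading the whole file
            chunk = list(islice(f, lines_per_file))
            if not chunk:
                break
            out_path = f"{prefix}{index:04d}"
            with open(out_path, "w", encoding="utf-8", newline="") as out:
                out.writelines(chunk)
            out_paths.append(out_path)
            index += 1
    return out_paths
```

Calling `file_list = split_by_lines("text_file.txt", 250000)` would then stand in for both the `split` command and the `glob` call, since the function returns the filenames directly and in order.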