@chebyte
Created December 24, 2011 06:31
Split a big CSV into multiple smaller files
tmp_dir = "#{RAILS_ROOT}/tmp"
original_file = "file.csv"
# create the temporary dir (-p: no error if it already exists)
sh "mkdir -p #{tmp_dir}"
# create a temporary file containing the header without
# the content:
sh "head -n 1 #{original_file} > #{tmp_dir}/header.csv"
# create a temporary file containing the content without
# the header:
sh "tail -n +2 #{original_file} > #{tmp_dir}/content.csv"
# split the content file into multiple files of 10000 lines each:
sh "split -l 10000 #{tmp_dir}/content.csv #{tmp_dir}/prefix_"
# loop through the new split files, adding the header
# and a '.csv' extension:
sh "for f in #{tmp_dir}/prefix_*; do cat #{tmp_dir}/header.csv $f > $f.csv; rm $f; done;"
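
The same steps can also be done in plain Ruby without shelling out, which avoids depending on `head`, `tail`, and `split` being available. This is only a minimal sketch, not part of the original snippet: `split_csv` and its parameters are hypothetical names, and the output files get numeric suffixes (`prefix_000.csv`, `prefix_001.csv`, ...) rather than the `aa`, `ab` suffixes that `split` produces. Like the shell version, it splits on physical lines, so a quoted field containing a newline could end up cut across two files.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper (an assumption, not from the original gist):
# writes chunks of at most lines_per_file data lines into out_dir,
# repeating the header line at the top of every chunk.
# Returns the list of files written.
def split_csv(original_file, out_dir, lines_per_file = 10_000, prefix = "prefix_")
  FileUtils.mkdir_p(out_dir)
  written = []

  File.open(original_file, "r") do |input|
    header = input.gets
    return written if header.nil? # empty input: nothing to split

    # Read the rest lazily, lines_per_file lines at a time.
    input.each_line.each_slice(lines_per_file).with_index do |lines, index|
      path = File.join(out_dir, format("%s%03d.csv", prefix, index))
      File.open(path, "w") do |out|
        out.write(header)
        lines.each { |line| out.write(line) }
      end
      written << path
    end
  end

  written
end

# Usage example on a throwaway file with 2 data lines per chunk.
Dir.mktmpdir do |dir|
  source = File.join(dir, "file.csv")
  File.write(source, "id,name\n1,a\n2,b\n3,c\n")
  split_csv(source, File.join(dir, "parts"), 2).each { |f| puts File.basename(f) }
end
```

Only one chunk of lines is held in memory at a time, so the size of the input file should not matter much.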