Last active November 10, 2022 09:55
Bash script for splitting a large CSV file into 100-line parts while keeping the header in each part.
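#!/bin/bash
FILENAME=file-to-split.csv
HDR=$(head -n 1 "${FILENAME}")  # header row, re-added to every part after the first
split -l 100 "${FILENAME}" xyz  # temporary chunks: xyzaa, xyzab, ...
n=1
for f in xyz*
do
  if [[ ${n} -ne 1 ]]; then  # the first chunk already begins with the header
    printf '%s\n' "${HDR}" > "part-${n}-${FILENAME%.csv}.csv"  # %.csv avoids a doubled extension
  fi
  cat "${f}" >> "part-${n}-${FILENAME%.csv}.csv"
  rm "${f}"
  ((n++))
done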
Also, there's an error in line 9: missing .csv at the end.
@arobinski Thanks for catching those errors! I've updated the script.
Thanks for posting this. Saved me some time.
Thanks.
A couple of improvements could be made, though.
- The first part counts the header among its 100 lines, so it always holds one fewer data row (n-1) than the other parts.
- Adding the extension on lines 9 and 11 doubles it up (.csv.csv) when writing the files, because FILENAME already ends in .csv.
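One way to address both points is to split only the data rows and write the header at the top of every part. The following is a minimal sketch, not the author's script: the function name `split_csv_with_header`, its two arguments, and the `.chunk.` temporary-file prefix are hypothetical, and it assumes each CSV record sits on a single line (no quoted fields containing newlines).

```shell
#!/bin/sh
# Sketch: split a CSV so every part holds the header plus up to
# rows_per_part data rows. Names here are illustrative assumptions.
split_csv_with_header() {
  local file="$1"
  local rows_per_part="${2:-100}"
  local dir base hdr chunk out
  local n=1
  dir=$(dirname "${file}")
  base=$(basename "${file}" .csv)   # drop .csv so the extension is not doubled
  hdr=$(head -n 1 "${file}")
  # Feed only the data rows (line 2 onward) to split; "-" reads from stdin.
  tail -n +2 "${file}" | split -l "${rows_per_part}" - "${dir}/${base}.chunk."
  for chunk in "${dir}/${base}.chunk."*
  do
    [ -e "${chunk}" ] || continue   # no data rows: the glob matched nothing
    out="${dir}/part-${n}-${base}.csv"
    # ">" truncates, so re-running does not append to an old part.
    printf '%s\n' "${hdr}" > "${out}"
    cat "${chunk}" >> "${out}"
    rm "${chunk}"
    n=$((n + 1))
  done
}
```

Calling `split_csv_with_header file-to-split.csv 100` would then write `part-1-file-to-split.csv`, `part-2-file-to-split.csv`, and so on, each starting with the header row followed by at most 100 data rows.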
This writes the header twice in the first file.