jrivero/1085501
A Python CSV splitter
import csv
import os


def split(filehandler, delimiter=',', row_limit=10000,
          output_name_template='output_%s.csv', output_path='.', keep_headers=True):
    """
    Splits a CSV file into multiple pieces.

    A quick bastardization of the Python CSV library.

    Arguments:

        `row_limit`: The number of rows you want in each output file. 10,000 by default.
        `output_name_template`: A %s-style template for the numbered output files.
        `output_path`: Where to stick the output files.
        `keep_headers`: Whether or not to print the headers in each output file.

    Example usage:

        >> from toolbox import csv_splitter;
        >> csv_splitter.split(open('/home/ben/input.csv', 'r'));

    """
    reader = csv.reader(filehandler, delimiter=delimiter)
    current_piece = 1
    current_out_path = os.path.join(
        output_path,
        output_name_template % current_piece
    )
    # newline='' (Python 3) stops the csv module from producing blank rows on Windows.
    current_out_file = open(current_out_path, 'w', newline='')
    current_out_writer = csv.writer(current_out_file, delimiter=delimiter)
    current_limit = row_limit
    if keep_headers:
        # next(reader) replaces reader.next(), which only exists in Python 2.
        headers = next(reader)
        current_out_writer.writerow(headers)
    for i, row in enumerate(reader):
        if i + 1 > current_limit:
            # The current piece is full: close it and start the next one.
            current_out_file.close()
            current_piece += 1
            current_limit = row_limit * current_piece
            current_out_path = os.path.join(
                output_path,
                output_name_template % current_piece
            )
            current_out_file = open(current_out_path, 'w', newline='')
            current_out_writer = csv.writer(current_out_file, delimiter=delimiter)
            if keep_headers:
                current_out_writer.writerow(headers)
        current_out_writer.writerow(row)
    # Close the last piece so its rows are flushed to disk.
    current_out_file.close()
I am sorry about this, I am new to the field, but where should I specify the file I want to split into smaller files? In which part of the code?
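As far as the code above shows, the input file is not named anywhere inside the function: you open it yourself and pass the open file object as the first argument (`filehandler`), as in the docstring's example usage. Below is a minimal, self-contained sketch of that calling pattern, assuming Python 3. `split_sketch` is a hypothetical, cut-down stand-in for the gist's `split()` (not the author's code), and the `input.csv` it reads is a small sample file the sketch creates in a temporary directory.

```python
import csv
import os
import tempfile


def split_sketch(filehandler, row_limit=10000, output_path='.'):
    """Hypothetical stand-in for the gist's split(): same idea, fewer options."""
    reader = csv.reader(filehandler)
    headers = next(reader)  # the first row is treated as the header
    written = []            # paths of the pieces created so far
    out_file = None
    try:
        for i, row in enumerate(reader):
            if i % row_limit == 0:
                # Start a new numbered piece every row_limit data rows.
                if out_file is not None:
                    out_file.close()
                path = os.path.join(output_path,
                                    'output_%s.csv' % (len(written) + 1))
                out_file = open(path, 'w', newline='')
                writer = csv.writer(out_file)
                writer.writerow(headers)
                written.append(path)
            writer.writerow(row)
    finally:
        if out_file is not None:
            out_file.close()
    return written


# Build a small sample input so the sketch runs anywhere (assumed data).
workdir = tempfile.mkdtemp()
input_path = os.path.join(workdir, 'input.csv')
with open(input_path, 'w', newline='') as f:
    sample = csv.writer(f)
    sample.writerow(['id', 'name'])
    for n in range(5):
        sample.writerow([n, 'row%d' % n])

# This is where the file to split is specified: open it and pass the handle in.
with open(input_path, 'r', newline='') as f:
    pieces = split_sketch(f, row_limit=2, output_path=workdir)

print(pieces)
```

With your own data you would replace `input_path` with the path to your CSV file and pick an `output_path` directory that already exists.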
Add newline='' in open() to avoid blank rows.
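Some background on the tip above: in Python 3 the `csv` writer ends each row with `\r\n` itself. If the file was opened in text mode without `newline=''`, Windows translates the `\n` again, and the resulting `\r\r\n` shows up as an empty row between records. A small sketch, assuming Python 3 and a hypothetical scratch file `demo.csv`, that writes two rows and inspects the raw bytes:

```python
import csv
import os
import tempfile

# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'demo.csv')

# newline='' hands line-ending control to the csv module.
with open(path, 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['a', 'b'])
    writer.writerow([1, 2])

# Read the file back as bytes to see the exact line endings on disk.
with open(path, 'rb') as f:
    raw = f.read()

print(raw)  # b'a,b\r\n1,2\r\n'
```

Each row ends in exactly one `\r\n`, whatever the platform. Note that the `newline` argument belongs to Python 3's `open()`; Python 2 code typically opened the output in `'wb'` mode instead.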
@alternateaccounts Can you test this tool with your 4.5 GB file?
https://github.com/BurntSushi/xsv
Thank you for the mention in your project.