Last active August 29, 2015 14:20
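"""
Salvage Corrupted CSV

What does it do?
Goes through a corrupted CSV sequentially and writes out the rows that are clean,
i.e. rows with the same number of columns as the header row.
Also prints the total number of rows and the number of corrupted rows.

@author: Gaurav Sood

Run: python <script> input_csv [output_csv]
"""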
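import sys
import csv

if len(sys.argv) < 2:
    print("Usage: %s <input CSV> [<output CSV>]" % (sys.argv[0]))
    sys.exit(1)

# The output file is optional; without it the script only reports counts.
o = None
if len(sys.argv) > 2:
    # Python 3: text mode with newline='' so the csv module controls line endings.
    o = open(sys.argv[2], 'w', newline='')

f = open(sys.argv[1], newline='')
reader = csv.reader(f)
if o is not None:
    writer = csv.writer(o)

ncols = 0
errors = 0
i = 0  # stays 0 if the input file is empty
for i, r in enumerate(reader):
    if i == 0:
        # The header row defines the expected number of columns.
        ncols = len(r)
        print("Number of columns: %d" % ncols)
        if o is not None:
            writer.writerow(r)
    elif len(r) != ncols:
        # A row whose width differs from the header is treated as corrupted.
        print("WARN: row #%d is corrupted" % (i))
        errors += 1
    elif o is not None:
        writer.writerow(r)

f.close()
if o is not None:
    o.close()

# i is the index of the last row, so it equals the number of data rows (header excluded).
print("Total: %d rows, Errors: %d rows" % (i, errors))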
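
The column-count check above can also be tried without touching the file system. The following is a minimal sketch, not part of the original script: the function name `salvage_rows` and the `sample` data are hypothetical, and it assumes, as the script does, that the first row is a trustworthy header and that a row is "corrupted" exactly when its width differs from the header's.

```python
import csv
import io


def salvage_rows(source):
    """Split CSV rows into clean rows and the indices of corrupted rows.

    `source` is any iterable of text lines (an open file, io.StringIO, ...).
    The first row is treated as the header and fixes the expected width.
    (Hypothetical helper for illustration; not in the original gist.)
    """
    clean, corrupted = [], []
    ncols = None
    for i, row in enumerate(csv.reader(source)):
        if ncols is None:
            # Header row: record the expected number of columns.
            ncols = len(row)
            clean.append(row)
        elif len(row) == ncols:
            clean.append(row)
        else:
            # Width mismatch: remember the row index, drop the row.
            corrupted.append(i)
    return clean, corrupted


# Made-up sample: rows 2 and 3 have the wrong number of fields.
sample = io.StringIO("id,name,score\n1,ann,90\n2,bob\n3,cy,70,extra\n4,dee,85\n")
clean, corrupted = salvage_rows(sample)
print(clean)
print(corrupted)
```

Returning the corrupted row indices rather than only a count makes it easier to inspect the bad rows afterwards; the count the script prints is simply `len(corrupted)`.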