Created
April 2, 2015 21:54
CSV-flattening code for Harsh's research
import csv
import os

# This chunk iterates through all of the CSV files in a directory, turns them
# into 2-dimensional arrays (lists of lists), and puts all those arrays into
# a list called "tables"
tables = []

# Loop over all files in the current directory (which is what "." means).
# os.listdir() returns names in arbitrary order, so sort them to keep the
# column order of the output predictable.
for f in sorted(os.listdir('.')):
    # Skip non-CSV files by checking the file extension (os.path.splitext
    # also handles names with no dot or several dots), and skip test.csv,
    # the output file this script writes, so a rerun doesn't read it back in
    if os.path.splitext(f)[1] != '.csv' or f == 'test.csv':
        continue

    # Open the CSV, grab rows 8-2010 (the slice [7:2010] drops the first
    # 7 rows) and add them to the tables list
    with open(f, 'r') as csvfile:
        reader = list(csv.reader(csvfile))[7:2010]
        tables.append(reader)

# Now that the "tables" list contains the relevant parts of all the CSVs,
# we can stitch them together into another list, which we'll call "output"
output = []

# This is weird syntax for a weird tool, but you can read about zip here:
# https://docs.python.org/2/library/functions.html#zip. It's basically a
# tool for smashing lists together: it groups the first row of every table,
# then the second row of every table, and so on, stopping at the shortest
# table. Also, we're feeding in a special type of argument, which is what
# the "*" is for. That's described here:
# https://docs.python.org/2/tutorial/controlflow.html#arbitrary-argument-lists
for row in zip(*tables):
    # This joins the matching rows from every table into one long row,
    # skipping the first 5 items from each. It then adds the result to the
    # output. Stolen from StackOverflow here:
    # http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python
    output.append([item for sublist in row for item in sublist[5:]])
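
# Write the CSV, using the syntax from class. (On Python 3, pass newline=''
# to open() so the csv module doesn't write blank lines between rows on
# Windows.)
with open('test.csv', 'w') as csvfile:
    my_writer = csv.writer(csvfile, delimiter=',')
    my_writer.writerows(output)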
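
The zip-and-flatten step is the least obvious part of the script, so here is a minimal sketch of it on toy data. The helper name `flatten_tables`, its `skip` parameter, and the two small tables are made up for illustration and are not part of the original gist; the list comprehension is the same one the script uses.

```python
def flatten_tables(tables, skip=5):
    """Join matching rows across tables, dropping the first `skip` cells of each row."""
    output = []
    # zip(*tables) yields a tuple holding row 0 of every table, then row 1
    # of every table, and so on, stopping at the shortest table.
    for row_group in zip(*tables):
        output.append([item for sublist in row_group for item in sublist[skip:]])
    return output


# Two made-up tables with two rows each: five leading cells, then the data.
table_a = [
    ['x', 'x', 'x', 'x', 'x', 'a1', 'a2'],
    ['x', 'x', 'x', 'x', 'x', 'a3', 'a4'],
]
table_b = [
    ['y', 'y', 'y', 'y', 'y', 'b1'],
    ['y', 'y', 'y', 'y', 'y', 'b2'],
]

flattened = flatten_tables([table_a, table_b])
print(flattened)
# [['a1', 'a2', 'b1'], ['a3', 'a4', 'b2']]
```

Because zip stops at the shortest input, a CSV with fewer rows than the others silently truncates the combined output, which is worth checking if the input files are not all the same length.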