Last active
August 29, 2018 21:07
-
-
Save ryanpitts/0dbd6bf254ce84e778b654c9174e845e to your computer and use it in GitHub Desktop.
quick csv grouping script in python re: https://twitter.com/rachelwalexande/status/1034236011830697986
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import csv | |
from collections import OrderedDict | |
CLEANED_DICT = OrderedDict() | |
METADATA_FIELDS = ['districtID','district','schoolID','school'] | |
VALUE_FIELDS = [] | |
with open('schools.csv') as csvfile: | |
reader = csv.DictReader(csvfile) | |
for row in reader: | |
if (row['schoolID'] not in CLEANED_DICT.keys()): | |
# we don't have a row for this school yet, | |
# so create it with the basic metadata | |
CLEANED_DICT[row['schoolID']] = { | |
k:row[k] for k in METADATA_FIELDS | |
} | |
# create a column designator based on the student group in this csv row | |
# e.g. 'all_students' or 'economically_disadvantaged' | |
value = row['students'].lower().replace(' ','_').replace('/','_') | |
value_types = ['total','pct'] | |
for value_type in value_types: | |
# make a csv-friendly header for this column, e.g. 'total_all_students' | |
column_header = value_type + '_' + value | |
# add it to our running list of fields for the output csv | |
if column_header not in VALUE_FIELDS: VALUE_FIELDS.append(column_header) | |
# add value for that group of students to this school's row | |
CLEANED_DICT[row['schoolID']].update({column_header: row[value_type]}) | |
# create the output csv | |
with open('schools_cleaned.csv', 'w') as csvfile: | |
fieldnames = METADATA_FIELDS + VALUE_FIELDS | |
writer = csv.DictWriter(csvfile, fieldnames=fieldnames, lineterminator='\n') | |
writer.writeheader() | |
for k,v in CLEANED_DICT.items(): | |
writer.writerow(v) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
districtID | district | schoolID | school | students | total | pct | |
---|---|---|---|---|---|---|---|
2142 | Salem-Keiser | 728 | Auburn Elementary School | All Students | 648 | 29.2 | |
2142 | Salem-Keiser | 728 | Auburn Elementary School | Economically Disadvantaged | 648 | 29.2 | |
2142 | Salem-Keiser | 728 | Auburn Elementary School | English Learners | 409 | 25.9 | |
2142 | Salem-Keiser | 728 | Auburn Elementary School | Hispanic/Latino | 416 | 27.6 | |
2142 | Salem-Keiser | 5066 | Battle Creek Elementary School | All Students | 568 | 20.4 | |
2142 | Salem-Keiser | 5066 | Battle Creek Elementary School | Economically Disadvantaged | 267 | 27 | |
2142 | Salem-Keiser | 5066 | Battle Creek Elementary School | English Learners | 30 | 16.7 | |
2142 | Salem-Keiser | 5066 | Battle Creek Elementary School | Hispanic/Latino | 59 | 23.7 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
districtID | district | schoolID | school | total_all_students | pct_all_students | total_economically_disadvantaged | pct_economically_disadvantaged | total_english_learners | pct_english_learners | total_hispanic_latino | pct_hispanic_latino | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2142 | Salem-Keiser | 728 | Auburn Elementary School | 648 | 29.2 | 648 | 29.2 | 409 | 25.9 | 416 | 27.6 | |
2142 | Salem-Keiser | 5066 | Battle Creek Elementary School | 568 | 20.4 | 267 | 27 | 30 | 16.7 | 59 | 23.7 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment