Last active
May 31, 2023 10:57
-
-
Save wpm/e4f4001e6252264dec28 to your computer and use it in GitHub Desktop.
Pandas multi-table join
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import pandas | |
""" | |
Join an arbitrary number of data frames, using a multi-index label for each data frame. | |
For example say you have three data frames each of which lists the classroom and | |
number of students a teacher has in a given period. | |
Classroom Students | |
Teacher | |
Mary 53A 19 | |
John 99B 25 | |
You want to combine them into a single data frame with a top level index indicating the period. | |
Period 1 Period 2 Period 3 | |
Classroom Students Classroom Students Classroom Students | |
Teacher | |
Mary 53A 19 18B 27 36D 12 | |
John 99B 25 23C 15 15B 30 | |
Do this by placing each input data frame under a top level index, and then joining all the tables | |
using the reduce function. | |
""" | |
def merge_reduce(fs, label): | |
for i, f in enumerate(fs, 1): | |
f.columns = pandas.MultiIndex.from_tuples([("%s %d" % (label, i), c) for c in f.columns]) | |
return reduce(lambda m, f: m.join(f), fs) | |
p1 = pandas.DataFrame( | |
data={"Teacher": ["Mary", 'John'], "Classroom": ['53A', '99B'], "Students": [19, 25]}).set_index("Teacher") | |
p2 = pandas.DataFrame( | |
data={"Teacher": ["Mary", 'John'], "Classroom": ['18B', '23C'], "Students": [27, 15]}).set_index("Teacher") | |
p3 = pandas.DataFrame( | |
data={"Teacher": ["Mary", 'John'], "Classroom": ['36D', '15B'], "Students": [12, 30]}).set_index("Teacher") | |
m = merge_reduce([p1, p2, p3], "Period") | |
print(m) |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment