Merging DataFrames with pandas.md
Last active
April 24, 2019 18:08
-
-
Save JonathanLoscalzo/d5d6edf3ef1c645ee1156bf930245ff3 to your computer and use it in GitHub Desktop.
Merging DataFrames with pandas
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| """ | |
| Concatenating horizontally to get MultiIndexed columns | |
| It is also possible to construct a DataFrame with hierarchically indexed columns. For this exercise, you'll start with pandas imported and a list of three DataFrames called dataframes. All three DataFrames contain 'Company', 'Product', and 'Units' columns with a 'Date' column as the index pertaining to sales transactions during the month of February, 2015. The first DataFrame describes Hardware transactions, the second describes Software transactions, and the third, Service transactions. | |
| Your task is to concatenate the DataFrames horizontally and to create a MultiIndex on the columns. From there, you can summarize the resulting DataFrame and slice some information from it. | |
| """ | |
| february = pd.concat(dataframes, keys=['Hardware', 'Software', 'Service'], axis=1) | |
| # Print february.info() | |
| print(february.info()) | |
| # Assign pd.IndexSlice: idx | |
| idx = pd.IndexSlice | |
| # Create the slice: slice_2_8 | |
| slice_2_8 = february.loc['Feb. 2, 2015':'Feb. 8, 2015', idx[:, 'Company']] | |
| # Print slice_2_8 | |
| print(slice_2_8) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| """ | |
| In this exercise, stock prices in US Dollars for the S&P 500 in 2015 have been obtained from Yahoo Finance. The files sp500.csv for sp500 and exchange.csv for the exchange rates are both provided to you. | |
| Using the daily exchange rate to Pounds Sterling, your task is to convert both the Open and Close column prices. | |
| """" | |
| # Import pandas | |
| import pandas as pd | |
| # Read 'sp500.csv' into a DataFrame: sp500 | |
| sp500 = pd.read_csv('sp500.csv', parse_dates=True, index_col='Date') | |
| # Read 'exchange.csv' into a DataFrame: exchange | |
| exchange = pd.read_csv('exchange.csv', parse_dates=True, index_col='Date') | |
| # Subset 'Open' & 'Close' columns from sp500: dollars | |
| dollars = sp500.loc[:, ['Open', 'Close']] | |
| # Print the head of dollars | |
| print(dollars.head()) | |
| # Convert dollars to pounds: pounds | |
| pounds = dollars.multiply(exchange['GBP/USD'], axis='rows') | |
| # Print the head of pounds | |
| print(pounds.head()) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| # Perform the first ordered merge: tx_weather | |
| tx_weather = pd.merge_ordered(austin, houston) | |
| # Print tx_weather | |
| print(tx_weather) | |
| # Perform the second ordered merge: tx_weather_suff | |
| tx_weather_suff = pd.merge_ordered(austin, houston, on='date', suffixes=['_aus', '_hus']) | |
| # Print tx_weather_suff | |
| print(tx_weather_suff) | |
| # Perform the third ordered merge: tx_weather_ffill | |
| tx_weather_ffill = pd.merge_ordered(austin, houston, on='date', suffixes=['_aus', '_hus'], fill_method='ffill') | |
| # Print tx_weather_ffill | |
| print(tx_weather_ffill) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| # Perform the first merge: merge_default | |
| merge_default = pd.merge(sales_and_managers, revenue_and_sales) | |
| # Print merge_default | |
| print(merge_default) | |
| # Perform the second merge: merge_outer | |
| merge_outer = pd.merge(sales_and_managers, revenue_and_sales, how='outer') | |
| # Print merge_outer | |
| print(merge_outer) | |
| # Perform the third merge: merge_outer_on | |
| merge_outer_on = pd.merge(sales_and_managers, revenue_and_sales, on=['city','state'], how='outer') | |
| # Merge revenue and sales: revenue_and_sales | |
| revenue_and_sales = pd.merge(revenue, sales, how='right', on=['city', 'state']) | |
| # Print revenue_and_sales | |
| print(revenue_and_sales) | |
| # Merge sales and managers: sales_and_managers | |
| sales_and_managers = pd.merge(sales, managers, how='left', right_on=['branch', 'state'], left_on=['city', 'state']) | |
| # Print sales_and_managers | |
| print(sales_and_managers) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| # Resample and tidy china: china_annual | |
| china_annual = china.resample('A').last().pct_change(10).dropna() | |
| # Resample and tidy us: us_annual | |
| us_annual = us.resample('A').last().pct_change(10).dropna() | |
| # Concatenate china_annual and us_annual: gdp | |
| gdp = pd.concat([china_annual, us_annual], axis=1, join='inner') | |
| # Resample gdp and print | |
| print(gdp.resample('10A').last()) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| """ | |
| This exercise picks up where the last ended (again using The Guardian's Olympic medal dataset). | |
| You are provided with the MultiIndexed DataFrame as produced at the end of the preceding exercise. Your task is to sort the DataFrame and to use the pd.IndexSlice to extract specific slices. Check out this exercise from Manipulating DataFrames with pandas to refresh your memory on how to deal with MultiIndexed DataFrames. | |
| pandas has been imported for you as pd and the DataFrame medals is already in your namespace. | |
| """ | |
| # Sort the entries of medals: medals_sorted | |
| medals_sorted = medals.sort_index(level=0) | |
| # Print the number of Bronze medals won by Germany | |
| print(medals_sorted.loc[('bronze','Germany')]) | |
| # Print data about silver medals | |
| print(medals_sorted.loc['silver']) | |
| # Create alias for pd.IndexSlice: idx | |
| idx = pd.IndexSlice | |
| # Print all the data on medals won by the United Kingdom | |
| print(medals_sorted.loc[idx[:,'United Kingdom'], :]) |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment