Learn how to use Python data structures, execution control statements, and DataFrames to manipulate financial data. Work with pandas, using data from the Federal Reserve Bank, to explore national economic trends, an essential part of understanding investment strategies. Calculate risk based on stock price data, and display this data in easy-to-read plots.
By Kennedy Behrman, Data Engineer, Author, Founder
Create and manipulate Python datetime objects to help you identify key financial events, such as Black Friday. Store and efficiently look up items using Python dictionaries.
- Datetimes
- Datetime from string
- String from datetime
- Datetime attributes
- Comparing datetimes
- Subtraction returns a timedelta object
- Creating relative datetimes
- Dictionaries
- Creating, adding to and deleting from
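The dictionary operations listed in the outline can be sketched as follows. The ticker symbols and prices here are made-up illustration values, not data from the course.

```python
# Hypothetical closing prices keyed by ticker symbol (illustrative values only).
closing_prices = {"AAPL": 150.0, "XOM": 60.5}

# Add a new key (or overwrite an existing one) with bracket assignment.
closing_prices["AMZN"] = 3200.0

# Look up a value; .get() returns a default instead of raising KeyError.
aapl_price = closing_prices["AAPL"]
missing_price = closing_prices.get("MSFT", None)

# Delete an entry with del; pop() removes it and returns the value.
del closing_prices["XOM"]
amzn_price = closing_prices.pop("AMZN")

print(closing_prices)
```

Using `.get()` is one way to avoid a `KeyError` when a symbol may be absent; a plain bracket lookup is fine when the key is known to exist.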
from datetime import datetime
black_monday = datetime(1987, 10, 19)
datetime.now()
black_monday_str = "Monday, October 19, 1987. 9:30 am"
format_str = "%A, %B %d, %Y. %I:%M %p"
datetime.strptime(black_monday_str, format_str)
dt.strftime(format_string)
from datetime import timedelta
offset = timedelta(weeks = 1)
cur_week = last_week + timedelta(weeks=1)
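The snippets above can be tied together in one small, self-contained sketch that covers the round trip between strings and datetimes, attribute access, comparison, and subtraction. The variable names other than `black_monday` and `format_str` are chosen for illustration.

```python
from datetime import datetime, timedelta

black_monday = datetime(1987, 10, 19, 9, 30)

# Round-trip between datetime and string using the same format string.
format_str = "%A, %B %d, %Y. %I:%M %p"
as_text = black_monday.strftime(format_str)
parsed = datetime.strptime(as_text, format_str)

# Attributes expose the individual components.
year, month, day = parsed.year, parsed.month, parsed.day

# Adding a timedelta creates a relative datetime.
one_week_later = black_monday + timedelta(weeks=1)

# Datetimes compare chronologically.
is_later = one_week_later > black_monday

# Subtracting two datetimes returns a timedelta.
elapsed = one_week_later - black_monday
print(as_text, elapsed.days)
```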
Use boolean logic to determine truth and use comparison and equality operators to control execution in if-statements and loops.
- Comparison operators
- Comparing datetimes
- Comparing dictionaries
- Comparing different types
- Boolean operators
- Object evaluation
- AND and OR operators
- Short circuit
- NOT operator
- Returning objects
- IF statements
- for and while loops
- Skipping with continue
- Stopping with break
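As a rough illustration of the topics in this outline, the sketch below shows `and`/`or` returning objects, short-circuit evaluation, and a loop that uses `continue` and `break`. The list of daily percentage changes is invented for the example.

```python
# Hypothetical daily price changes in percent (illustrative values only).
daily_changes = [0.4, -1.2, None, 2.5, -22.6, 0.9]

# `or` returns the first truthy operand, so it can supply a default.
label = "" or "unnamed"

# `and` short-circuits: empty_list is falsy, so empty_list[0] is never
# evaluated and no IndexError is raised. The falsy object itself is returned.
empty_list = []
result = empty_list and empty_list[0] > 0

gains = []
for change in daily_changes:
    if change is None:
        continue  # skip missing observations
    if change < -20:
        break     # stop at the first crash-sized drop
    if not change < 0:
        gains.append(change)

print(label, result, gains)
```

The value after the crash-sized drop is never examined, because `break` leaves the loop entirely, while `continue` only skips the current item.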
Create and access DataFrames with pandas using financial data from other data structures, including dictionaries, lists, and CSV files. Aggregate data across rows or columns, calculate averages, and extend your data using functions.
- Creating a pandas DataFrame
- From dictionary, list of dictionaries, or list of lists
- Reading data
- Accessing Data
- Access column using brackets
- Access column using dot-syntax
- Access multiple columns
- Access rows using brackets
- loc[] and iloc[]
- Columns with loc[]
- Setting a single value, multiple values or multiple columns
- Aggregating and summarizing
- DataFrame methods: .count(), .min(), .max(), .first(), .last(), .sum(), .prod(), .mean(), .median(), .std(), .var()
- Axis: axis=0 or axis='rows' (the default), or axis=1 or axis='columns'
- Extending and manipulating data
- Adding and removing columns
- Operations on DataFrames
- apply()
import pandas as pd
pd.DataFrame()
data = {'Bank Code': ['BA', 'AAD', 'BA'],
        'Account#': ['ajfdk2', '1234nmk', 'mm3d90'],
        'Balance': [1222.00, 390789.11, 13.02]}
df = pd.DataFrame(data=data)
data = [['BA', 'ajfdk2', 1222.00],
        ['AAD', '1234nmk', 390789.11],
        ['BA', 'mm3d90', 13.02]]
columns = ['Bank Code', 'Account#', 'Balance']
df = pd.DataFrame(data=data, columns=columns)
df = pd.read_csv('/data/daily/transactions.csv', sep='|')
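The outline also mentions building a DataFrame from a list of dictionaries, which the snippets above do not show. A minimal sketch, reusing the same account data, might look like this; the in-memory `StringIO` buffer is an assumption that stands in for a pipe-delimited file on disk.

```python
import io
import pandas as pd

# One dictionary per row; the keys become column names.
rows = [
    {"Bank Code": "BA", "Account#": "ajfdk2", "Balance": 1222.00},
    {"Bank Code": "AAD", "Account#": "1234nmk", "Balance": 390789.11},
    {"Bank Code": "BA", "Account#": "mm3d90", "Balance": 13.02},
]
from_dicts = pd.DataFrame(rows)

# read_csv accepts any file-like object; StringIO stands in for a real file.
pipe_delimited = io.StringIO(
    "Bank Code|Account#|Balance\n"
    "BA|ajfdk2|1222.00\n"
    "AAD|1234nmk|390789.11\n"
    "BA|mm3d90|13.02\n"
)
from_csv = pd.read_csv(pipe_delimited, sep="|")

print(from_dicts.shape, from_csv.shape)
```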
accounts['Balance']
accounts.Balance
accounts[['Bank Code', 'Account#']]
accounts[0:2]
accounts[[True, False, True]]
accounts.loc['b']
accounts.loc[['a','c']]
df.loc[[True, False, True]]
accounts.loc['a':'c', ['Balance','Account#']]
accounts.loc['a':'c',[True,False,True]]
accounts.loc['a':'c','Bank Code':'Balance']
accounts.iloc[0:2, [0,2]]
accounts.loc['a', 'Balance'] = 0
accounts.iloc[:2, 1:] = 'NA'
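The access snippets above assume an `accounts` DataFrame whose rows are labeled `'a'`, `'b'`, and `'c'`. The sketch below builds one under that assumption and contrasts label-based `loc` with position-based `iloc`.

```python
import pandas as pd

accounts = pd.DataFrame(
    {"Bank Code": ["BA", "AAD", "BA"],
     "Account#": ["ajfdk2", "1234nmk", "mm3d90"],
     "Balance": [1222.00, 390789.11, 13.02]},
    index=["a", "b", "c"],  # assumed row labels
)

# loc selects by label; label slices include BOTH endpoints.
by_label = accounts.loc["a":"b", ["Bank Code", "Balance"]]

# iloc selects by integer position; the end of a slice is excluded.
by_position = accounts.iloc[0:2, [0, 2]]

# Both selections pick the same two rows and the same two columns.
print(by_label.equals(by_position))

# Assigning through loc changes a single cell in place.
accounts.loc["a", "Balance"] = 0
```

The difference in slice endpoints is a common source of off-by-one mistakes: `'a':'b'` covers two rows with `loc`, and `0:2` covers the same two rows with `iloc`.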
df.count()
df.sum(axis=1)
df.loc[:,'AAD'].max()
df.iloc[0].min()
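To make the effect of the `axis` argument concrete, here is a small sketch on an invented table of quarterly deposits. The column names echo the bank codes used earlier; the figures are illustrative only.

```python
import pandas as pd

# Hypothetical quarterly deposits per bank (illustrative values only).
deposits = pd.DataFrame(
    {"BA": [100.0, 150.0, 125.0], "AAD": [80.0, 90.0, 200.0]},
    index=["Q1", "Q2", "Q3"],
)

per_bank_total = deposits.sum()             # axis=0: one result per column
per_quarter_total = deposits.sum(axis=1)    # axis=1: one result per row
largest_aad = deposits.loc[:, "AAD"].max()  # aggregate a single column
smallest_in_q1 = deposits.iloc[0].min()     # aggregate a single row
median_by_bank = deposits.median()

print(per_bank_total, per_quarter_total, sep="\n")
```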
pce['PCE'] = pce['PCDG'] + pce['PCND'] + pce['PCESV']
pce.drop(columns=['PCDG', 'PCND', 'PCESV'], inplace=True)
pd.concat([pce, new_row])  # replaces pce.append(new_row), which was removed in pandas 2.0; assumes new_row is a DataFrame
all_rows = [row1, row2, row3, pce]
pd.concat(all_rows)
import numpy as np
gdp['GDP'] = gdp.apply(np.sum, axis=1)
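The column operations above can be run end to end on a toy table. This is only a sketch: the column names follow the `pce` snippets, but the numbers and the year labels are made up.

```python
import pandas as pd

# Column names follow the snippets above; the values are invented.
pce = pd.DataFrame(
    {"PCDG": [1.0, 2.0], "PCND": [3.0, 4.0], "PCESV": [5.0, 6.0]},
    index=["2019", "2020"],
)

# apply with axis=1 calls the function once per row.
pce["PCE"] = pce[["PCDG", "PCND", "PCESV"]].apply(lambda row: row.sum(), axis=1)

# drop returns a new DataFrame unless inplace=True is passed.
totals = pce.drop(columns=["PCDG", "PCND", "PCESV"])

# Add rows by concatenating DataFrames.
new_row = pd.DataFrame({"PCE": [15.0]}, index=["2021"])
extended = pd.concat([totals, new_row])
print(extended)
```

For a plain row sum, `pce[cols].sum(axis=1)` gives the same result without `apply`; `apply` is more useful when the per-row function is something pandas does not provide directly.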
Work with real-world NASDAQ stock data to interpret new data, create masks to filter data, and visualize your findings with plots.
- Peeking at data with head(), tail(), and describe()
- Filtering data
- Column comparison
- Masking by symbol
- Pandas boolean operators
- Combining conditions
- Plotting data
- Plot types: bar, barh, line, scatter, box, histogram, kde
import pandas as pd
aapl.head(3)
aapl.describe()
aapl.describe(include='object')
aapl.describe(include=['float', 'object'])
aapl.describe(include='all')
aapl.describe(percentiles=[.1, .5, .9])
aapl.describe(exclude='float')
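The `describe()` variants above differ mainly in which columns they summarize. A small sketch on an invented `aapl` table (the prices are not real market data) shows the default and the `include='all'` behavior side by side.

```python
import pandas as pd

# Hypothetical price rows (illustrative values only).
aapl = pd.DataFrame(
    {"Symbol": ["AAPL"] * 4,
     "High": [10.0, 20.0, 30.0, 40.0]}
)

first_rows = aapl.head(3)   # first three rows
last_row = aapl.tail(1)     # last row

# By default only numeric columns are summarized.
numeric_summary = aapl.describe()

# include='all' also reports count/unique/top/freq for text columns.
full_summary = aapl.describe(include="all")

print(numeric_summary)
```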
prices.High > 2160
mask_symbol = prices.Symbol == 'AAPL'
aapl = prices.loc[mask_symbol]
mask_prices = prices['Symbol'] != 'AMZN'  # True for every symbol except AMZN
mask_date = prices['Date'] > datetime(2020, 4, 1)  # built from the same DataFrame so the masks align
mask_amzn = mask_prices & mask_date
prices.loc[mask_amzn]
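Combining masks is easier to follow on a table small enough to check by eye. The sketch below uses an invented `prices` DataFrame; the mask names are chosen for the example and do not come from the course.

```python
from datetime import datetime
import pandas as pd

# Hypothetical prices (illustrative values only).
prices = pd.DataFrame(
    {"Symbol": ["AAPL", "AMZN", "AAPL", "AMZN"],
     "Date": pd.to_datetime(
         ["2020-03-31", "2020-03-31", "2020-04-02", "2020-04-02"]),
     "High": [255.0, 1990.0, 245.0, 1930.0]}
)

is_amzn = prices["Symbol"] == "AMZN"
after_april_1 = prices["Date"] > datetime(2020, 4, 1)

# Use & (and), | (or), and ~ (not) on masks; Python's and/or/not
# raise an error when applied to a whole Series.
amzn_after = prices.loc[is_amzn & after_april_1]
aapl_or_recent = prices.loc[~is_amzn | after_april_1]

print(amzn_after)
```

When a comparison is written inline, wrap it in parentheses, as in `(prices.High > 2000) & is_amzn`, because `&` binds more tightly than `>`.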
my_dataframe.plot()
exxon.set_index('Date', inplace=True)
exxon.plot(y='High', rot=90, title='Exxon Stock Price')  # Date is now the index, so it supplies the x-axis
exxon.plot(y='High',kind='hist')
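For a runnable version of the plotting calls, the sketch below builds a tiny `exxon` table with invented prices. It assumes matplotlib is installed, since pandas uses it for plotting, and selects the non-interactive `Agg` backend so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import pandas as pd

# Hypothetical daily highs (illustrative values only).
exxon = pd.DataFrame(
    {"Date": pd.to_datetime(["2020-04-01", "2020-04-02", "2020-04-03"]),
     "High": [38.0, 40.5, 39.7]}
).set_index("Date")

# DataFrame.plot returns a matplotlib Axes; the index supplies the x-axis.
line_ax = exxon.plot(y="High", kind="line", rot=90, title="Exxon Stock Price")

# kind selects the plot type, for example 'hist', 'bar', 'box', or 'kde'.
hist_ax = exxon.plot(y="High", kind="hist")
```

In a notebook the figures render inline; in a script, calling `line_ax.figure.savefig(...)` with a file name of your choice writes the plot to disk.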
# Select the Balance for October 2nd
print(ledger.loc['____', '____'])
# Select the Balance for October 3rd
print(ledger.____['____', '____'])