Skip to content

Instantly share code, notes, and snippets.

@rshowcase
Last active November 20, 2023 13:35
Show Gist options
  • Save rshowcase/cc30fac759195e0d014c to your computer and use it in GitHub Desktop.
Save rshowcase/cc30fac759195e0d014c to your computer and use it in GitHub Desktop.
Fama-MacBeth Entire Procedure
# In my portfolio, I show how the popular Fama-MacBeth (1973) procedure is constructed in R.
# The procedure is used to estimate risk premia and determine the validity of asset pricing models.
# Google shows that the original paper has currently over 9000 citations (Mar 2015), making the methodology one of the most
# influential papers in asset pricing studies. It's used by thousands of finance students each year, but I'm unable to find a
# complete description of it from the web.
#
# While the methodology is not statistically too complex (although the different standard errors can get complex),
# it can pose some serious data management challenges to students and researchers.
#
# The goal of the methodology is to estimate risk premia in the financial markets. While newer, more sophisticated methods for
# estimating risk premia exist, FM has remained popular due to its intuition. The methodology can be summarized as follows:
#
# 1. Construct risk factor return series
# - A risk factor return series is constructed from a zero-investment portfolio, where high-risk assets are held and
# financed by short-selling low-risk assets: it is up to the student or researcher to explain the criterion behind a risk factor
# - The return series is thus a differential of two series: the returns of the long portfolio minus the returns of the short portfolio
# - The portfolios don’t need to be equal-weighted, although they usually are in classic asset pricing studies.
# But hedge-fund originated strategies can use more sophisticated weighting, such as zero-beta: recent example.
# 2. Estimate factor loadings (FM 1st stage)
# - Betas (=factor loadings) are estimated for each asset in a linear time series regression
# - Thus, we need to specify what we consider a “correct” beta: remember, betas vary over time and they are always
# estimated with error
# - I demonstrate the ex-ante and ex-post testing approaches with individual assets, as explained in more detail in Ang, Liu & Schwartz (2010).
# 3. Estimate risk premia (FM 2nd stage)
# - Be careful not to confuse this stage with Fama-French (1993).
# - The main idea is that beta estimates should explain individual asset returns
# - This is tested by estimating multiple cross-sectional regression across asset returns
# - Finally, average estimates are reported
# - This step is pre-programmed in 3rd-party packages
#
#
# Start with some useful functions to help import data
#
# Point to dataset
dataset <- read.csv(file.choose())
# Replace commas with dots (R recognizes only dots as decimal separators)
dots <- sapply(commas, function(x) {as.numeric(gsub(",", ".", as.character(x)))})
#
# Now start with the data import
#
# Load libraries
library(quantmod)
library(PerformanceAnalytics)
library(TTR)
library(repmis)
library(zoo)
library(sandwich)
library(lmtest)
# Read MSCI Equity index prices from my Dropbox
# Notice that the dataset is converted from an xlsx into csv, using ";" as separator
data <- source_DropboxData(file = "data.csv", key = "ocbkfvedc3aola8", sep = ";", header = TRUE)
# Delete first column with non-recognized date format
prices <- data[, -1]
# The numbers contain spaces as thousand separators and R doesn't like this
prices <- sapply(prices, function(x) {as.numeric(gsub("\\s","", as.character(x)))})
# Basic descriptive functions
str(prices)
head(prices)
summary(prices)
dim(prices)
# Transform prices into returns, omit the first row
# Declare first the prices to be a time series object
prices <- ts(data=prices, frequency=12, start=c(1969, 12))
returns <- Return.calculate(prices)
returns <- na.omit(returns)
# Keep track of the MSCI World returns
world <- grep("world", colnames(returns))
# Keep track of rows (time)
returns.t <- nrow(returns)
# Risk-free rate: read straight from FRED database and transform into monthly returns for our time period
getSymbols("TB3MS", src="FRED")
rf <- TB3MS[paste("1970-02-01", "2014-12-01", sep="/")]
rf <- (1+(rf/100))^(1/12)-1
rfts <- ts(data=rf, frequency=12, start=c(1970, 1))
# Finally calculate the market return factor
rmrf <- returns[,world]-rfts
#
# Example of constructing a risk factor
#
# There’s an infinite number of ways to build risk factor returns and it’s up to the researcher to motivate her decision.
# I will focus here on a t,t (here 6,6) momentum strategy approximation (reforming the portfolio is done every six months and
# the assets are held for six months. However, the portfolio is rebalanced monthly and the factor is thus an approximation –
# compound returns in the momentum period are not taken into account) that is common in the asset pricing literature.
# t,t month momentum strategy implementation
# 6,6 momentum, equal-weighted portfolios, rebalancing done every six months
mo <- 6
# Keep track of time
momentum.t <- returns.t-mo
# Create a matrix of 6-month simple moving average returns
smamat <- apply(returns, 2, SMA, n=mo+1)
# Copy the returns of every mo until the reforming of the portfolio
for (i in seq(from=1, to=nrow(smamat), by=mo)) {
for (j in 1:mo-1) {
smamat[i+j,] <- smamat[i,]
}
}
# Then omit the first six NA's
smamat <- na.omit(smamat)
# Apply row-wise rank - higher return, higher rank
ranks <- t(apply(smamat, 1, rank))
# Define functions that assign assets into the highest and lowest quartiles
hiq <- function(hi) {
return(hi>quantile(ranks, c(.75)))
}
loq <- function(lo) {
return(lo<quantile(ranks, c(.25)))
}
# Assign the assets into quartiles
highret <- t(apply(ranks, 1, hiq))
lowret <- t(apply(ranks, 1, loq))
# Calculate returns for the high (winner) and low (loser) portfolios
ret <- returns[-c(1:mo),]
ret <- ts(data=ret, frequency=12, start=c(1970, 7))
highp <- ret*highret
lowp <- ret*lowret
highstrat <- rowSums(highp)/rowSums(highp != 0)
lowstrat <- rowSums(lowp)/rowSums(lowp != 0)
# Finally we get the factor WML return series (Winners-minus-Losers)
wml <- highstrat-lowstrat
# Combine the needed information into a matrix
rmrf <- rmrf[-c(1:mo),]
row <- c(seq(momentum.t))
ret <- ret[,-1]
rets <- cbind(row, wml, rmrf, ret)
#
# First stage
#
# Specify the parameters
int <- 12 # Estimation period interval ("stationarity period")
est <- 60 # Beta estimation period length
fact <- 2 # Number of factors in the model
# Keep track of time
fstage.t <- momentum.t-est
# Create an empty list for estimates
estimates <- list()
for(s in colnames(ret)) {
estimates[[s]] <- matrix(, nrow=fstage.t+mo, ncol=fact+1)
colnames(estimates[[s]]) <- c("alphas", "mktbetas", "factorbetas")
}
# Ex-ante portfolios
# Fill every int:th row with estimates
for(t in seq(from=0, to=fstage.t, by=int)) {
m t & row < t+est) # For a 3-factor model, add the factor into the equation
for(i in 1:(ncol(ret))) {
estimates[[i]][t+1, fact-1] <- coef(m)[fact-1, i]
estimates[[i]][t+1, fact] <- coef(m)[fact, i]
estimates[[i]][t+1, fact+1] <- coef(m)[fact+1, i]
# For a 3-factor model, add row: estimates[[i]][t+1, fact+2] <- coef(m)[fact+2, i]
}
}
# Fill in rest of the estimates
for(k in 1:ncol(ret)) {
estimates[[k]] <- na.locf(estimates[[k]])
}
# Align the return matrix (ex-ante)
assets <- ret[-c(1:(est-mo)),]
# Return matrix into vector
assets <- c(assets)
# Ready the data frame for 2nd stage
sstage <- do.call(rbind.data.frame, estimates)
sstage$time <- rep(seq(fstage.t+mo), times=ncol(ret))
sstage$id <- rep(colnames(ret), each=fstage.t+mo)
sstage$returns <- assets
#
# Second stage
#
# This section is pretty much identical to the example code available through Mitchell Petersen’s website.
# First, we can check that we’re doing the right estimation by using Petersen’s test data and results.
# Use custom clustering functions by Stockholm University's Mahmood Arai
source("http://people.su.se/~ma/clmcl.R")
# Read test data from Petersen's website
test <- read.table("http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/test_data.txt", col.names = c("firmid", "year", "x", "y"))
head(test)
# Fit the model
fm <- lm(y ~ x, data=test)
# Compare the different standard errors
coeftest(fm) # OLS
coeftest(fm, vcov=vcovHC(fm, type="HC0")) # White
cl(test,fm, firmid) # Clustered by firm
cl(test,fm, year) # Clustered by year
mcl(test,fm, firmid, year) # Clustered by firm and year
# Next we do the same for our two-factor model.
#Fit the model
twof <- lm(returns ~ mktbetas + factorbetas, data=sstage)
# Compare the standard errors
coeftest(twof) # OLS
coeftest(twof, vcov=vcovHC(fm, type="HC0")) # White
cl(sstage,twof, firmid) # Clustered by firm
cl(sstage,twof, time) # Clustered by year
mcl(sstage,twof, firmid, time) # Clustered by firm and year
# And now we have estimated a two-factor model for market and momentum risk premia with N assets and T months.
@Jarno3
Copy link

Jarno3 commented Jan 24, 2018

Hey, where can I find the data you used?

@cbancil
Copy link

cbancil commented Mar 28, 2019

Hi, same question as above.
I installed your libraries, but running:
data <- source_DropboxData(file = "data.csv", key = "ocbkfvedc3aola8", sep = ";", header = TRUE)
gives the error:
Error in source_DropboxData(file = "data.csv", key = "ocbkfvedc3aola8", :
unused arguments (file = "data.csv", key = "ocbkfvedc3aola8", sep = ";", header = TRUE)

@vchernat
Copy link

vchernat commented Aug 6, 2019

Hi Tuomas,
Could you please share data files that drive this example?

@dkachel
Copy link

dkachel commented Nov 8, 2019

Hi
First of all, thanks a lot for sharing this code! I've a question regarding the first stage estimation: starting from line 188, the code for the actual estimation seems to be missing? Or am I missing something?

@julanal
Copy link

julanal commented Feb 19, 2021

Was anyone able to find the data?

@Hassanzaada
Copy link

Hi First of all, thanks a lot for sharing this code! I've a question regarding the first stage estimation: starting from line 188, the code for the actual estimation seems to be missing? Or am I missing something?

Hello dear, do you have this data sheet? used in this code?

@m-diniz
Copy link

m-diniz commented Apr 9, 2022

Hi guys,
You can import the dataset using this code:

mydata <- read.csv('http://dl.dropboxusercontent.com/s/ocbkfvedc3aola8/data.csv',
sep = ';', colClasses = c('character', rep('numeric', 19)))

Or alternatively:

mydata2 <-read_csv2('encurtador.com.br/lAPR0')

@R3bd
Copy link

R3bd commented Jul 16, 2022

Hello, Thanks for sharing your code with us. There are a couple of errors though. Lines 119-124 and 178-195. If anyone fixed them would you share with us the revisions you made? Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment