-
-
Save rshowcase/cc30fac759195e0d014c to your computer and use it in GitHub Desktop.
# In my portfolio, I show how the popular Fama-MacBeth (1973) procedure is constructed in R. | |
# The procedure is used to estimate risk premia and determine the validity of asset pricing models. | |
# Google shows that the original paper has currently over 9000 citations (Mar 2015), making the methodology one of the most | |
# influential papers in asset pricing studies. It's used by thousands of finance students each year, but I'm unable to find a | |
# complete description of it from the web. | |
# | |
# While the methodology is not statistically too complex (although the different standard errors can get complex), | |
# it can pose some serious data management challenges to students and researchers. | |
# | |
# The goal of the methodology is to estimate risk premia in the financial markets. While newer, more sophisticated methods for | |
# estimating risk premia exist, FM has remained popular due to its intuition. The methodology can be summarized as follows: | |
# | |
# 1. Construct risk factor return series | |
# - A risk factor return series is constructed from a zero-investment portfolio, where high-risk assets are held and | |
# financed by short-selling low-risk assets: it is up to the student or researcher to explain the criterion behind a risk factor | |
# - The return series is thus a differential of two series: the returns of the long portfolio minus the returns of the short portfolio | |
# - The portfolios don’t need to be equal-weighted, although they usually are in classic asset pricing studies. | |
# But hedge-fund originated strategies can use more sophisticated weighting, such as zero-beta: recent example. | |
# 2. Estimate factor loadings (FM 1st stage) | |
# - Betas (=factor loadings) are estimated for each asset in a linear time series regression | |
# - Thus, we need to specify what we consider a “correct” beta: remember, betas vary over time and they are always | |
# estimated with error | |
# - I demonstrate the ex-ante and ex-post testing approaches with individual assets, as explained in more detail in Ang, Liu & Schwartz (2010). | |
# 3. Estimate risk premia (FM 2nd stage) | |
# - Be careful not to confuse this stage with Fama-French (1993). | |
# - The main idea is that beta estimates should explain individual asset returns | |
# - This is tested by estimating multiple cross-sectional regression across asset returns | |
# - Finally, average estimates are reported | |
# - This step is pre-programmed in 3rd-party packages | |
# | |
# | |
# Start with some useful functions to help import data | |
# | |
# Point to dataset | |
dataset <- read.csv(file.choose()) | |
# Replace commas with dots (R recognizes only dots as decimal separators) | |
dots <- sapply(commas, function(x) {as.numeric(gsub(",", ".", as.character(x)))}) | |
# | |
# Now start with the data import | |
# | |
# Load libraries | |
library(quantmod) | |
library(PerformanceAnalytics) | |
library(TTR) | |
library(repmis) | |
library(zoo) | |
library(sandwich) | |
library(lmtest) | |
# Read MSCI Equity index prices from my Dropbox | |
# Notice that the dataset is converted from an xlsx into csv, using ";" as separator | |
data <- source_DropboxData(file = "data.csv", key = "ocbkfvedc3aola8", sep = ";", header = TRUE) | |
# Delete first column with non-recognized date format | |
prices <- data[, -1] | |
# The numbers contain spaces as thousand separators and R doesn't like this | |
prices <- sapply(prices, function(x) {as.numeric(gsub("\\s","", as.character(x)))}) | |
# Basic descriptive functions | |
str(prices) | |
head(prices) | |
summary(prices) | |
dim(prices) | |
# Transform prices into returns, omit the first row | |
# Declare first the prices to be a time series object | |
prices <- ts(data=prices, frequency=12, start=c(1969, 12)) | |
returns <- Return.calculate(prices) | |
returns <- na.omit(returns) | |
# Keep track of the MSCI World returns | |
world <- grep("world", colnames(returns)) | |
# Keep track of rows (time) | |
returns.t <- nrow(returns) | |
# Risk-free rate: read straight from FRED database and transform into monthly returns for our time period | |
getSymbols("TB3MS", src="FRED") | |
rf <- TB3MS[paste("1970-02-01", "2014-12-01", sep="/")] | |
rf <- (1+(rf/100))^(1/12)-1 | |
rfts <- ts(data=rf, frequency=12, start=c(1970, 1)) | |
# Finally calculate the market return factor | |
rmrf <- returns[,world]-rfts | |
# | |
# Example of constructing a risk factor | |
# | |
# There’s an infinite number of ways to build risk factor returns and it’s up to the researcher to motivate her decision. | |
# I will focus here on a t,t (here 6,6) momentum strategy approximation (reforming the portfolio is done every six months and | |
# the assets are held for six months. However, the portfolio is rebalanced monthly and the factor is thus an approximation – | |
# compound returns in the momentum period are not taken into account) that is common in the asset pricing literature. | |
# t,t month momentum strategy implementation | |
# 6,6 momentum, equal-weighted portfolios, rebalancing done every six months | |
mo <- 6 | |
# Keep track of time | |
momentum.t <- returns.t-mo | |
# Create a matrix of 6-month simple moving average returns | |
smamat <- apply(returns, 2, SMA, n=mo+1) | |
# Copy the returns of every mo until the reforming of the portfolio | |
for (i in seq(from=1, to=nrow(smamat), by=mo)) { | |
for (j in 1:mo-1) { | |
smamat[i+j,] <- smamat[i,] | |
} | |
} | |
# Then omit the first six NA's | |
smamat <- na.omit(smamat) | |
# Apply row-wise rank - higher return, higher rank | |
ranks <- t(apply(smamat, 1, rank)) | |
# Define functions that assign assets into the highest and lowest quartiles | |
hiq <- function(hi) { | |
return(hi>quantile(ranks, c(.75))) | |
} | |
loq <- function(lo) { | |
return(lo<quantile(ranks, c(.25))) | |
} | |
# Assign the assets into quartiles | |
highret <- t(apply(ranks, 1, hiq)) | |
lowret <- t(apply(ranks, 1, loq)) | |
# Calculate returns for the high (winner) and low (loser) portfolios | |
ret <- returns[-c(1:mo),] | |
ret <- ts(data=ret, frequency=12, start=c(1970, 7)) | |
highp <- ret*highret | |
lowp <- ret*lowret | |
highstrat <- rowSums(highp)/rowSums(highp != 0) | |
lowstrat <- rowSums(lowp)/rowSums(lowp != 0) | |
# Finally we get the factor WML return series (Winners-minus-Losers) | |
wml <- highstrat-lowstrat | |
# Combine the needed information into a matrix | |
rmrf <- rmrf[-c(1:mo),] | |
row <- c(seq(momentum.t)) | |
ret <- ret[,-1] | |
rets <- cbind(row, wml, rmrf, ret) | |
# | |
# First stage | |
# | |
# Specify the parameters | |
int <- 12 # Estimation period interval ("stationarity period") | |
est <- 60 # Beta estimation period length | |
fact <- 2 # Number of factors in the model | |
# Keep track of time | |
fstage.t <- momentum.t-est | |
# Create an empty list for estimates | |
estimates <- list() | |
for(s in colnames(ret)) { | |
estimates[[s]] <- matrix(, nrow=fstage.t+mo, ncol=fact+1) | |
colnames(estimates[[s]]) <- c("alphas", "mktbetas", "factorbetas") | |
} | |
# Ex-ante portfolios | |
# Fill every int:th row with estimates | |
for(t in seq(from=0, to=fstage.t, by=int)) { | |
m t & row < t+est) # For a 3-factor model, add the factor into the equation | |
for(i in 1:(ncol(ret))) { | |
estimates[[i]][t+1, fact-1] <- coef(m)[fact-1, i] | |
estimates[[i]][t+1, fact] <- coef(m)[fact, i] | |
estimates[[i]][t+1, fact+1] <- coef(m)[fact+1, i] | |
# For a 3-factor model, add row: estimates[[i]][t+1, fact+2] <- coef(m)[fact+2, i] | |
} | |
} | |
# Fill in rest of the estimates | |
for(k in 1:ncol(ret)) { | |
estimates[[k]] <- na.locf(estimates[[k]]) | |
} | |
# Align the return matrix (ex-ante) | |
assets <- ret[-c(1:(est-mo)),] | |
# Return matrix into vector | |
assets <- c(assets) | |
# Ready the data frame for 2nd stage | |
sstage <- do.call(rbind.data.frame, estimates) | |
sstage$time <- rep(seq(fstage.t+mo), times=ncol(ret)) | |
sstage$id <- rep(colnames(ret), each=fstage.t+mo) | |
sstage$returns <- assets | |
# | |
# Second stage | |
# | |
# This section is pretty much identical to the example code available through Mitchell Petersen’s website. | |
# First, we can check that we’re doing the right estimation by using Petersen’s test data and results. | |
# Use custom clustering functions by Stockholm University's Mahmood Arai | |
source("http://people.su.se/~ma/clmcl.R") | |
# Read test data from Petersen's website | |
test <- read.table("http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/test_data.txt", col.names = c("firmid", "year", "x", "y")) | |
head(test) | |
# Fit the model | |
fm <- lm(y ~ x, data=test) | |
# Compare the different standard errors | |
coeftest(fm) # OLS | |
coeftest(fm, vcov=vcovHC(fm, type="HC0")) # White | |
cl(test,fm, firmid) # Clustered by firm | |
cl(test,fm, year) # Clustered by year | |
mcl(test,fm, firmid, year) # Clustered by firm and year | |
# Next we do the same for our two-factor model. | |
#Fit the model | |
twof <- lm(returns ~ mktbetas + factorbetas, data=sstage) | |
# Compare the standard errors | |
coeftest(twof) # OLS | |
coeftest(twof, vcov=vcovHC(fm, type="HC0")) # White | |
cl(sstage,twof, firmid) # Clustered by firm | |
cl(sstage,twof, time) # Clustered by year | |
mcl(sstage,twof, firmid, time) # Clustered by firm and year | |
# And now we have estimated a two-factor model for market and momentum risk premia with N assets and T months. |
Hi, same question as above.
I installed your libraries, but running:
data <- source_DropboxData(file = "data.csv", key = "ocbkfvedc3aola8", sep = ";", header = TRUE)
gives the error:
Error in source_DropboxData(file = "data.csv", key = "ocbkfvedc3aola8", :
unused arguments (file = "data.csv", key = "ocbkfvedc3aola8", sep = ";", header = TRUE)
Hi Tuomas,
Could you please share data files that drive this example?
Hi
First of all, thanks a lot for sharing this code! I've a question regarding the first stage estimation: starting from line 188, the code for the actual estimation seems to be missing? Or am I missing something?
Was anyone able to find the data?
Hi First of all, thanks a lot for sharing this code! I've a question regarding the first stage estimation: starting from line 188, the code for the actual estimation seems to be missing? Or am I missing something?
Hello dear, do you have this data sheet? used in this code?
Hi guys,
You can import the dataset using this code:
mydata <- read.csv('http://dl.dropboxusercontent.com/s/ocbkfvedc3aola8/data.csv',
sep = ';', colClasses = c('character', rep('numeric', 19)))
Or alternatively:
mydata2 <-read_csv2('encurtador.com.br/lAPR0')
Hello, Thanks for sharing your code with us. There are a couple of errors though. Lines 119-124 and 178-195. If anyone fixed them would you share with us the revisions you made? Thanks!
Hey, where can I find the data you used?