MCMC methods can be used to estimate linear regression models. In what follows, we derive the distributional forms and algorithms associated with two Gibbs sampling approaches: the full conditionals approach and the composition method.
The dataset, loaded from the URL referenced in the code below, represents refrigerator price as a function of available features. Our objective is to estimate PRICE as a function of ECOST, FSIZE, SHELVES and FEATURES, where:
- PRICE: Response; the cost of the refrigerator.
- ECOST: Annual energy cost to operate the refrigerator.
- FSIZE: Size of the freezer compartment.
- SHELVES: Number of shelves.
- FEATURES: Number of features.
First determine the joint kernel, which is the product of the likelihood and all prior distributions; it is proportional to the posterior distribution:

$$
\pi(\beta, \sigma^{2}|X, y) \propto f(y|\beta, \sigma^{2}, X) \times \pi(\beta, \sigma^{2}).
$$

The likelihood,

$$
f(y|\beta, \sigma^{2}, X) = \big(2\pi\sigma^{2}\big)^{-n/2} \exp\Big(-\frac{1}{2\sigma^{2}}(y - X\beta)^{T}(y - X\beta)\Big).
$$

For the parameter vector $(\beta, \sigma^{2})$ we use the standard non-informative prior

$$
\pi(\beta, \sigma^{2}) \propto \frac{1}{\sigma^{2}}.
$$

Since the prior distribution for the parameter vector contributes no data-dependent terms, the joint kernel reduces to

$$
\pi(\beta, \sigma^{2}|X, y) \propto \big(\sigma^{2}\big)^{-n/2 - 1} \exp\Big(-\frac{1}{2\sigma^{2}}(y - X\beta)^{T}(y - X\beta)\Big).
$$

Note that the posterior distribution is asymptotically equivalent to the expression for the model likelihood.
Gibbs sampling requires identifying the full conditional distribution for each parameter, holding all other parameters constant.
To find the full conditional distribution for $\beta$, hold $\sigma^{2}$ fixed and retain only the terms in the joint kernel that involve $\beta$:

$$
\pi(\beta|\sigma^{2}, X, y) \propto \exp\Big(-\frac{1}{2\sigma^{2}}(y - X\beta)^{T}(y - X\beta)\Big).
$$

After distributing the transpose, removing multiplicative constants and performing a bit of reorganization, we arrive at:

$$
\pi(\beta|\sigma^{2}, X, y) \propto \exp\Big(-\frac{1}{2\sigma^{2}}\big(\beta^{T}X^{T}X\beta - 2\beta^{T}X^{T}y\big)\Big).
$$

Upon completing the square, we find the distribution for $\beta$, conditional on $\sigma^{2}$, to be multivariate normal:

$$
\beta|\sigma^{2}, X, y \sim \mathcal{N}\big(\hat \beta, \sigma^{2}(X^{T}X)^{-1}\big), \qquad \hat \beta = (X^{T}X)^{-1}X^{T}y.
$$
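Here, completing the square uses the identity

$$
\beta^{T}X^{T}X\beta - 2\beta^{T}X^{T}y = (\beta - \hat \beta)^{T}X^{T}X(\beta - \hat \beta) - \hat \beta^{T}X^{T}X\hat \beta,
$$

where the second term on the right does not involve $\beta$ and is absorbed into the constant of proportionality.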
For $\sigma^{2}$, holding $\beta$ fixed, the joint kernel viewed as a function of $\sigma^{2}$ is the kernel of an inverse gamma distribution:

$$
\sigma^{2}|\beta, X, y \sim \mathrm{IG}\Big(\frac{n}{2}, \frac{\mathrm{SSR}(\beta)}{2}\Big),
$$

where:

- $\mathrm{SSR}(\beta) = (y - X\beta)^{T}(y - X\beta)$ is the sum of squared residuals.
- $n$ is the number of observations.

The Gibbs sampler alternates between the two full conditionals:
- Establish starting values for $\beta$ and $\sigma^{2}$. We use $\beta^{(0)} = \hat\beta_{OLS}$ and $\hat \sigma_{0}^{2} = \mathrm{SSR} / (n - p)$.
- Sample $\beta$ from the multivariate normal full conditional with $\sigma^{2}$ fixed.
- Sample $\sigma^{2}$ from the inverse gamma full conditional with $\beta$ fixed.
In the code below, we run the sampler for 10,000 iterations, discarding the first 1,000 samples as burn-in.
"""
The full conditionals Gibbs sampling method.
PRICE ~ "ECOST" + "FSIZE" + "SHELVES" + "FEATURES"
"""
from math import sqrt
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import stats
from scipy.linalg import cholesky
pd.options.mode.chained_assignment = None
pd.set_option('display.max_columns', 1000)
pd.set_option('display.width', 10000)
np.set_printoptions(suppress=True)
%matplotlib inline
DATA_URL = "https://gist.github.com/jtrive84/96757393423b2599c7d5da361fdf024b/raw/82000835b7c3b70dcf21a9ab43c59db700b78136/Refrigerator.csv"
df = pd.read_csv(DATA_URL)
variates = ["ECOST", "FSIZE", "SHELVES", "FEATURES"]
X = df[variates].values
y = df["PRICE"].values.reshape(-1, 1)
X = np.concatenate([np.ones(y.size).reshape(-1, 1), X], axis=1)
n, p = X.shape
M = 10000
burnin = 1000
prng = np.random.RandomState(516)
# Initialize arrays to hold posterior samples.
betas, sigma2 = np.zeros([M, p]), np.ones(M)
# Initialize parameter arrays and compute covariance matrix.
b_ols = np.linalg.inv(X.T @ X) @ X.T @ y
V = np.linalg.inv(X.T @ X)
sigma2[0] = ((y - X @ b_ols).T @ (y - X @ b_ols)).item() / (n - p)
betas[0, :] = b_ols.T
# Gibbs sampling from full conditionals. At each iteration, p independent
# standard normal variates are sampled, then transformed into a draw from a
# multivariate normal density with mean b_ols and covariance sigma2 * V via
# the Cholesky factor of the scaled covariance matrix.
for ii in range(1, M):
    # Sample from the full conditional distribution for betas.
    betas[ii, :] = b_ols.T + prng.randn(p).reshape(1, -1) @ cholesky(sigma2[ii - 1] * V)
    # Sample from the full conditional distribution for the variance.
    sigma2[ii] = stats.invgamma.rvs(
        a=.50 * n,
        scale=.50 * ((y - X @ betas[ii, :].reshape(-1, 1)).T @ (y - X @ betas[ii, :].reshape(-1, 1))).item(),
        random_state=prng
    )
# Remove burnin samples.
betas = betas[burnin:,:]
sigma2 = sigma2[burnin:]
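As a quick sanity check (a minimal sketch reusing the arrays defined above), the posterior means should land close to the OLS point estimates under the non-informative prior:

"""
Compare posterior means against OLS point estimates.
"""
print("posterior means:          ", np.round(betas.mean(axis=0), 3))
print("OLS estimates:            ", np.round(b_ols.ravel(), 3))
print("posterior mean of sigma2: ", np.round(sigma2.mean(), 3))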
Traceplots can be produced to visualize the variability of samples across iterations:
"""
Combine beta and sigma2 arrays to create traceplots.
"""
varnames = ["intercept"] + variates + ["sigma2"]
Xall1 = np.hstack([betas, sigma2.reshape(-1, 1)])
fig, ax = plt.subplots(nrows=2, ncols=3, figsize=(10, 5), tight_layout=True)
indices = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
for indx, (varname, (ii, jj)) in enumerate(zip(varnames, indices)):
    ax[ii, jj].set_title(varname, color="#000000", loc="center", fontsize=8)
    ax[ii, jj].plot(Xall1[:, indx], color="#000000", linewidth=.25, linestyle="-")
    ax[ii, jj].xaxis.set_major_formatter(mpl.ticker.StrMethodFormatter("{x:,.0f}"))
    ax[ii, jj].set_xlabel("")
    ax[ii, jj].set_ylabel("")
    ax[ii, jj].tick_params(axis="x", which="major", direction="in", labelsize=7)
    ax[ii, jj].tick_params(axis="x", which="minor", direction="in", labelsize=7)
    ax[ii, jj].tick_params(axis="y", which="major", direction="in", labelsize=7)
    ax[ii, jj].tick_params(axis="y", which="minor", direction="in", labelsize=7)
    ax[ii, jj].xaxis.set_ticks_position("none")
    ax[ii, jj].yaxis.set_ticks_position("none")
    ax[ii, jj].grid(True)
    ax[ii, jj].set_axisbelow(True)
plt.suptitle("Traceplots: Full Conditionals Approach", fontsize=10)
plt.show()
Similarly, we can create histograms for each sampled parameter:
"""
Create histograms from posterior samples with overlaid ols estimates.
"""
hist_color = "#ff7595"
fig, ax = plt.subplots(nrows=2, ncols=3, figsize=(10, 5), tight_layout=True)
# OLS coefficient estimates plus the OLS variance estimate, recomputed here
# since the burn-in samples have already been dropped from sigma2.
plt_ols = np.append(b_ols.ravel(), ((y - X @ b_ols).T @ (y - X @ b_ols)).item() / (n - p))
indices = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
for indx, (varname, (ii, jj)) in enumerate(zip(varnames, indices)):
    ax[ii, jj].set_title(varname, color="#000000", loc="center", fontsize=8)
    ax[ii, jj].hist(
        Xall1[:, indx], 25, density=True, alpha=1, color=hist_color,
        edgecolor="#FFFFFF", linewidth=1.0
    )
    label0 = r"$\hat \sigma_{0}^{2}$" if indx == 5 else r"$\hat \beta_{OLS}$"
    label1 = r"$\hat \sigma_{MCMC}^{2}$" if indx == 5 else r"$\hat \beta_{MCMC}$"
    ax[ii, jj].axvline(plt_ols[indx], color="grey", linewidth=1.25, linestyle="--", label=label0)
    ax[ii, jj].axvline(Xall1[:, indx].mean(), color="blue", linewidth=1.25, linestyle="--", label=label1)
    ax[ii, jj].xaxis.set_major_formatter(mpl.ticker.StrMethodFormatter("{x:,.0f}"))
    ax[ii, jj].set_yticklabels([])
    ax[ii, jj].set_xlabel("")
    ax[ii, jj].set_ylabel("")
    ax[ii, jj].tick_params(axis="x", which="major", direction="in", labelsize=6)
    ax[ii, jj].tick_params(axis="x", which="minor", direction="in", labelsize=6)
    ax[ii, jj].tick_params(axis="y", which="major", direction="in", labelsize=6)
    ax[ii, jj].tick_params(axis="y", which="minor", direction="in", labelsize=6)
    ax[ii, jj].xaxis.set_ticks_position("none")
    ax[ii, jj].yaxis.set_ticks_position("none")
    ax[ii, jj].grid(True)
    ax[ii, jj].set_axisbelow(True)
    ax[ii, jj].legend(loc="upper right", fancybox=True, framealpha=1, fontsize="x-small")
plt.suptitle("Histograms: Full Conditionals Approach", fontsize=10)
plt.show()
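The combined sample array also yields interval estimates directly. Below is a minimal sketch, assuming Xall1 and varnames from above, of 95% equal-tailed credible intervals:

"""
95% equal-tailed credible intervals from the full conditionals samples.
"""
lo, hi = np.percentile(Xall1, q=[2.5, 97.5], axis=0)
for varname, l, h in zip(varnames, lo, hi):
    print(f"{varname}: [{l:,.2f}, {h:,.2f}]")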
The composition method can also be used to generate posterior samples for $\beta$ and $\sigma^{2}$. As before, the conditional distribution for $\beta$ given $\sigma^{2}$ is multivariate normal, $\beta|\sigma^{2}, X, y \sim \mathcal{N}(\hat \beta, \sigma^{2}V)$, but $\sigma^{2}$ is now drawn from its marginal posterior distribution,

$$
\sigma^{2}|X, y \sim \mathrm{IG}\Big(\frac{n - p}{2}, \frac{(n - p) \times \mathrm{SSR}}{2}\Big).
$$

The composition method is faster than the full conditionals method since we can draw all samples from the marginal distribution of $\sigma^{2}$ up front, then draw one $\beta^{(i)}$ for each $\sigma^{2(i)}$; successive draws are independent, so no burn-in is required. The algorithm:
- Compute $\hat \beta$ via OLS and $V = (X^{T}X)^{-1}$.
- Compute $\mathrm{SSR} = \frac{1}{n - p}(y - X\hat \beta)^{T}(y - X\hat \beta)$.
- Draw $m$ samples from $\mathrm{IG}\big(\frac{n - p}{2}, \frac{(n - p) \times \mathrm{SSR}}{2}\big)$.
- For $i = 1, \cdots, m$: draw $\beta^{(i)}$ from $f(\beta|\sigma^{2(i)}, X, y)$: $\beta^{(i)} \sim \mathcal{N}(\hat \beta, \sigma^{2(i)}V)$.
"""
Drawing posterior samples using the composition method.
PRICE ~ "ECOST" + "FSIZE" + "SHELVES" + "FEATURES"
"""
n, p = X.shape
M = 10000  # no burn-in needed: composition draws are independent
prng = np.random.RandomState(516)
# Initialize betas to hold posterior samples.
betas = np.zeros([M, p])
# Compute OLS estimate, V and SSR.
b_ols = np.linalg.inv(X.T @ X) @ X.T @ y
V = np.linalg.inv(X.T @ X)
SSR = ((y - X @ b_ols).T @ (y - X @ b_ols)).item() / (n - p)
# Draw m samples from marginal distribution for sigma2.
sigma2 = stats.invgamma.rvs(a=.50 * (n - p), scale=.50 * (n - p) * SSR, size=M, random_state=prng)
# Simulate sequence of beta draws for each value in sigma2.
for ii in range(M):
    betas[ii, :] = b_ols.T + prng.randn(p).reshape(1, -1) @ cholesky(sigma2[ii] * V)
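Since composition draws are mutually independent while Gibbs draws form a Markov chain, the lag-1 autocorrelation of the $\sigma^{2}$ samples offers a quick check on mixing. A minimal sketch, assuming Xall1 (full conditionals samples, with $\sigma^{2}$ in the last column) and the composition sigma2 array are both in scope:

"""
Lag-1 autocorrelation of sigma2 draws: full conditionals vs. composition.
"""
ac_fc = np.corrcoef(Xall1[:-1, -1], Xall1[1:, -1])[0, 1]
ac_cm = np.corrcoef(sigma2[:-1], sigma2[1:])[0, 1]
print(f"lag-1 autocorrelation, full conditionals: {ac_fc:.4f}")
print(f"lag-1 autocorrelation, composition:       {ac_cm:.4f}")

For this model both values should be small, since the full conditional for $\beta$ is centered at $\hat \beta$ regardless of $\sigma^{2}$; for posteriors with stronger cross-parameter dependence the contrast is more pronounced.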
Traceplots:
"""
Combine beta and sigma2 arrays to create traceplots.
"""
varnames = ["intercept"] + variates + ["sigma2"]
Xall2 = np.hstack([betas, sigma2.reshape(-1, 1)])
fig, ax = plt.subplots(nrows=2, ncols=3, figsize=(10, 5), tight_layout=True)
indices = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
for indx, (varname, (ii, jj)) in enumerate(zip(varnames, indices)):
    ax[ii, jj].set_title(varname, color="#000000", loc="center", fontsize=8)
    ax[ii, jj].plot(Xall2[:, indx], color="#000000", linewidth=.25, linestyle="-")
    ax[ii, jj].xaxis.set_major_formatter(mpl.ticker.StrMethodFormatter("{x:,.0f}"))
    ax[ii, jj].set_xlabel("")
    ax[ii, jj].set_ylabel("")
    ax[ii, jj].tick_params(axis="x", which="major", direction="in", labelsize=7)
    ax[ii, jj].tick_params(axis="x", which="minor", direction="in", labelsize=7)
    ax[ii, jj].tick_params(axis="y", which="major", direction="in", labelsize=7)
    ax[ii, jj].tick_params(axis="y", which="minor", direction="in", labelsize=7)
    ax[ii, jj].xaxis.set_ticks_position("none")
    ax[ii, jj].yaxis.set_ticks_position("none")
    ax[ii, jj].grid(True)
    ax[ii, jj].set_axisbelow(True)
plt.suptitle("Traceplots: Composition Approach", fontsize=10)
plt.show()
Histograms can be generated in the same manner as for the full conditionals approach. To demonstrate that the two approaches agree, we can compare posterior percentiles from the full conditionals and composition methods:
"""
Comparing posterior percentiles for full conditionals and composition approaches.
"""
peval = [1, 5, 25, 50, 75, 95, 99]
pcols = [f"{ii:.0f}%" for ii in peval]
for indx, varname in enumerate(varnames):
    # Percentiles generated via full conditionals.
    df1 = pd.DataFrame(
        np.round(np.percentile(Xall1[:, indx], q=peval), 2).reshape(1, -1),
        columns=pcols, index=[f"{varname}: conditionals"]
    )
    # Percentiles generated via composition method.
    df2 = pd.DataFrame(
        np.round(np.percentile(Xall2[:, indx], q=peval), 2).reshape(1, -1),
        columns=pcols, index=[f"{varname}: composition"]
    )
    # Bind under a new name so the original dataset df isn't clobbered.
    dfcompare = pd.concat([df1, df2])
    display(dfcompare)
For a new set of predictors $\tilde{X}$, draws from the posterior predictive distribution can be generated as follows:

- For $i = 1, \cdots, n$: draw $(\beta^{(i)}, \sigma^{2(i)})$ from the posterior samples.
- Draw $\tilde{y}^{(i)} \sim \mathcal{N}(\tilde{X}\beta^{(i)}, \sigma^{2(i)}I)$.
The code that follows generates 5000 posterior predictive samples for a new set of predictors.
"""
Generating posterior predictive samples for a new set of predictors.
"""
n_draws = 5000
Xnew = [1, 82, 5.1, 2, 8]
# Randomly select n_draws parameter sets from the posterior samples.
indices = prng.choice(range(Xall2.shape[0]), size=n_draws, replace=False)
params = Xall2[indices, :]
samples = []
for ii in range(n_draws):
    beta, v = params[ii, :-1], params[ii, -1]
    # Predictive mean and draw for this parameter set.
    mu = np.dot(beta, Xnew)
    y_ii = prng.normal(loc=mu, scale=sqrt(v))
    samples.append(y_ii)
samples = np.asarray(samples)
Viewing the histogram of posterior predictive samples:
hist_color = "#ff7595"
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(8, 5), tight_layout=True)
ax.set_title(
"Posterior predictive samples for new predictors", color="#000000", loc="center", fontsize=9
)
ax.hist(
samples, 30, density=True, alpha=1, color=hist_color,
edgecolor="#FFFFFF", linewidth=1.0
)
label0, label1 = r"$\hat{y}_{OLS}$", r"$\hat y_{MCMC}$"
ax.axvline(float(np.dot(b_ols.ravel(), Xnew)), color="grey", linewidth=1.25, linestyle="--", label=label0)
ax.axvline(samples.mean(), color="blue", linewidth=1.25, linestyle="--", label=label1)
ax.xaxis.set_major_formatter(mpl.ticker.StrMethodFormatter("{x:,.0f}"))
ax.set_yticklabels([])
ax.set_xlabel("")
ax.set_ylabel("")
ax.tick_params(axis="x", which="major", direction='in', labelsize=7)
ax.tick_params(axis="x", which="minor", direction='in', labelsize=7)
ax.tick_params(axis="y", which="major", direction='in', labelsize=7)
ax.tick_params(axis="y", which="minor", direction='in', labelsize=7)
ax.xaxis.set_ticks_position("none")
ax.yaxis.set_ticks_position("none")
ax.grid(True)
ax.set_axisbelow(True)
ax.legend(loc="upper right", fancybox=True, framealpha=1, fontsize="small")
plt.show()
Finally, a summary of posterior predictive percentiles:
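A minimal sketch of that summary, following the same percentile pattern used for the parameter comparison above:

"""
Posterior predictive percentiles for the new set of predictors.
"""
peval = [1, 5, 25, 50, 75, 95, 99]
dfpp = pd.DataFrame(
    np.round(np.percentile(samples, q=peval), 2).reshape(1, -1),
    columns=[f"{ii:.0f}%" for ii in peval], index=["predicted PRICE"]
)
display(dfpp)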