haozhu233 · October 28, 2019 22:48
diff --git a/Understanding Objective Functions in Disentangled VAE.ipynb b/Understanding Objective Functions in Disentangled VAE.ipynb
 {
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Understanding Objective Functions in probtorch\n",
    "In this tutorial, we study the actual implementation of the Disentangled VAE paper. The implementation below was written by the original authors and published in their [probtorch](https://github.com/probtorch/probtorch) library. \n",
    "\n",
    "In probtorch, the authors implemented 3 types of objective functions:\n",
    "- [Monte Carlo](https://github.com/probtorch/probtorch/blob/master/probtorch/objectives/montecarlo.py)\n",
    "- [Importance weighted Monte Carlo](https://github.com/probtorch/probtorch/blob/master/probtorch/objectives/importance.py)\n",
    "- [Marginal](https://github.com/probtorch/probtorch/blob/master/probtorch/objectives/marginal.py)\n",
    "\n",
    "Only the first one (montecarlo.py) implements the main algorithm discussed in the paper. The importance weighted MC objective is mentioned in equation (9) of the paper. The marginal implementation was written for [a different paper](https://arxiv.org/abs/1804.02086) published in 2018. \n",
    "\n",
    "# ELBO Function"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "$$ELBO = E_{q(z | x, y)} \\left[ \\log p(x | y, z) \\right]\n",
    "       - \\beta E_{q(z | x, y)} \\left[ \\log \\frac{q(y,z | x)}{p(y,z)} \\right]\n",
    "       + (\\beta + \\alpha) E_{q(z | x)}\\left[ \\log \\frac{q(y, z| x)}{q(z | x)} \\right]$$\n",
    "       \n",
    "There is a new $\\beta$ term defined in the code. The idea comes from the [$\\beta$-VAE paper](https://arxiv.org/pdf/1804.03599.pdf), where it is used to encourage disentanglement by reducing the relative influence of the reconstruction loss (the first term). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from numbers import Number\n",
    "from torch.nn.functional import softmax"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def elbo(q, p, sample_dim=None, batch_dim=None, alpha=0.1, beta=1.0,\n",
    "         size_average=True, reduce=True):\n",
    "    r\"\"\"Calculates an importance sampling estimate of the semi-supervised\n",
    "    evidence lower bound (ELBO), as described in [1]\n",
    "    References:\n",
    "        [1] N. Siddharth, Brooks Paige, Jan-Willem van de Meent,\n",
    "        Alban Desmaison, Frank Wood, Noah D. Goodman, Pushmeet Kohli, and\n",
    "        Philip HS Torr, Semi-Supervised Learning of Disentangled\n",
    "        Representations, NIPS 2017.\n",
    "    \"\"\"\n",
    "    log_weights = q.log_joint(sample_dim, batch_dim, q.conditioned())\n",
    "    return (log_like(q, p, sample_dim, batch_dim, log_weights,\n",
    "                     size_average=size_average, reduce=reduce) -\n",
    "            beta * kl(q, p, sample_dim, batch_dim, log_weights,\n",
    "                      size_average=size_average, reduce=reduce) +\n",
    "            (beta + alpha) * ml(q, sample_dim, batch_dim, log_weights,\n",
    "                                size_average=size_average, reduce=reduce))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Section 2.2 of the paper describes the stochastic computation graph in detail. The documentation of the implementation defines three groups of nodes:\n",
    "- `x`: The set of conditioned nodes that are present in `p` but are not present in `q`.\n",
    "- `y`: The set of conditioned nodes in `q`, which may or may not also be present in `p`.\n",
    "- `z`: The set of sampled nodes present in both `q` and `p`.\n",
    "\n",
    "Based on these definitions, the functions below construct the three sets as follows:\n",
    "```\n",
    "x = [n for n in p.conditioned() if n not in q]\n",
    "y = q.conditioned()\n",
    "z = [n for n in q.sampled() if n in p]\n",
    "```\n",
    "\n",
    "## Log likelihood\n",
    "\n",
    "$$E_{q(z | x, y)}[\\log p(x | y, z)]\n",
    "       \\simeq \\frac{1}{S} \\frac{1}{B} \\sum_{s=1}^S \\sum_{b=1}^B\n",
    "              \\log p(x^{(b)} | z^{(s,b)}, y^{(b)})$$\n",
    "              "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def log_like(q, p, sample_dim=None, batch_dim=None, log_weights=None,\n",
    "             size_average=True, reduce=True):\n",
    "    r\"\"\"Computes a Monte Carlo estimate of the log-likelihood.\n",
    "    \"\"\"\n",
    "    # Core ===============================================\n",
    "    x = [n for n in p.conditioned() if n not in q]\n",
    "    objective = p.log_joint(sample_dim, batch_dim, x)\n",
    "    # ====================================================\n",
    "    if sample_dim is not None:\n",
    "        if log_weights is None:\n",
    "            log_weights = q.log_joint(sample_dim, batch_dim, q.conditioned())\n",
    "        if isinstance(log_weights, Number):\n",
    "            objective = objective.mean(0)\n",
    "        else:\n",
    "            weights = softmax(log_weights, 0)\n",
    "            objective = (weights * objective).sum(0)\n",
    "    if reduce:\n",
    "        objective = objective.mean() if size_average else objective.sum()\n",
    "    return objective"
   ]
  },
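  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the `softmax(log_weights, 0)` branch of `log_like` does not take the uniform $\\frac{1}{S}$ average written in the formula above. As a sketch of what that branch computes, assuming dimension 0 is the sample dimension (as the hard-coded `0` suggests) and using $\\ell$, $w$ and $f$ as notation introduced here rather than taken from the paper:\n",
    "\n",
    "$$w^{(s,b)} = \\frac{\\exp \\ell^{(s,b)}}{\\sum_{s'=1}^S \\exp \\ell^{(s',b)}},\n",
    "  \\qquad\n",
    "  \\hat{f}^{(b)} = \\sum_{s=1}^S w^{(s,b)} f^{(s,b)}$$\n",
    "\n",
    "Here $\\ell^{(s,b)}$ is an entry of `log_weights` and $f^{(s,b)}$ is the matching entry of `objective`. If all $\\ell^{(s,b)}$ are equal across $s$, then $w^{(s,b)} = \\frac{1}{S}$ and the plain Monte Carlo mean is recovered, which is also what the `isinstance(log_weights, Number)` branch computes directly."
   ]
  },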
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## KL function\n",
    "$$E_{q(z | x, y)}\\left[ \\log \\frac{q(y, z | x)}{p(y, z)} \\right]\n",
    "       \\simeq\n",
    "       \\frac{1}{S} \\frac{1}{B} \\sum_{s=1}^S \\sum_{b=1}^B\n",
    "       \\left[ \\log \\frac{q(y^{(b)}, z^{(s,b)} | x^{(b)})}\n",
    "                        {p(y^{(b)}, z^{(s,b)})} \\right]$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def kl(q, p, sample_dim=None, batch_dim=None, log_weights=None,\n",
    "       size_average=True, reduce=True):\n",
    "    r\"\"\"Computes a Monte Carlo estimate of the unnormalized KL divergence\n",
    "    described in [1].\n",
    "    \"\"\"\n",
    "    # Core ==================================================\n",
    "    # y: the nodes that q conditions on\n",
    "    y = q.conditioned()\n",
    "    if log_weights is None:\n",
    "        log_weights = q.log_joint(sample_dim, batch_dim, y)\n",
    "    # The log-probability of y under q doubles as the log importance weight\n",
    "    log_qy = log_weights\n",
    "    log_py = p.log_joint(sample_dim, batch_dim, y)\n",
    "    # z: the nodes sampled in q that also appear in p\n",
    "    z = [n for n in q.sampled() if n in p]\n",
    "    log_pz = p.log_joint(sample_dim, batch_dim, z)\n",
    "    log_qz = q.log_joint(sample_dim, batch_dim, z)\n",
    "    # log q(y, z | x) - log p(y, z), written as per-group terms\n",
    "    objective = (log_qy + log_qz - log_py - log_pz)\n",
    "    # =======================================================\n",
    "    if sample_dim is not None:\n",
    "        if isinstance(log_weights, Number):\n",
    "            objective = objective.mean(0)\n",
    "        else:\n",
    "            weights = softmax(log_weights, 0)\n",
    "            objective = (weights * objective).sum(0)\n",
    "    if reduce:\n",
    "        objective = objective.mean() if size_average else objective.sum()\n",
    "    return objective"
   ]
  },
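  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The core block of `kl` can be read term by term against the formula. As a sketch, assuming `log_joint` returns the sum of the log-probabilities of the listed nodes (our reading of the code above, not a statement taken from the probtorch documentation):\n",
    "\n",
    "$$\\log \\frac{q(y, z | x)}{p(y, z)}\n",
    "       = \\log q_y + \\log q_z - \\log p_y - \\log p_z$$\n",
    "\n",
    "Here $\\log q_y$ and $\\log q_z$ stand for `log_qy` and `log_qz` (the `y` and `z` nodes scored under `q`), while $\\log p_y$ and $\\log p_z$ stand for `log_py` and `log_pz` (the same nodes scored under `p`). These shorthand symbols are introduced here for illustration only."
   ]
  },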
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Maximum Likelihood\n",
    "$$E_{q(z | x)}\\left[ \\log \\frac{q(y, z| x)}{q(z | x)} \\right]\n",
    "       \\simeq \\frac{1}{S} \\frac{1}{B} \\sum_{s=1}^S \\sum_{b=1}^B\n",
    "       \\left[ \\log \\frac{q( y^{(b)}, z^{(s,b)} | x^{(b)})}\n",
    "                        {q(z^{(s,b)} | x^{(b)})} \\right]$$\n",
    "                        \n",
    "This is the same as Equation 7. In fact, the objective here is simply the log-weights computed earlier: `objective = log_weights = log_qy`."
   ]
  },
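  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see why the `ml` function below needs nothing beyond `log_weights`, apply the product rule to the numerator. This one-line derivation is added here as a reading aid and uses the same symbols as the formula above:\n",
    "\n",
    "$$\\log \\frac{q(y, z | x)}{q(z | x)}\n",
    "       = \\log \\frac{q(y | z, x) \\, q(z | x)}{q(z | x)}\n",
    "       = \\log q(y | z, x)$$\n",
    "\n",
    "The ratio therefore reduces to the log-probability of the conditioned nodes `y` under `q`, which appears to be what `q.log_joint(sample_dim, batch_dim, q.conditioned())` evaluates."
   ]
  },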
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def ml(q, sample_dim=None, batch_dim=None, log_weights=None,\n",
    "       size_average=True, reduce=True):\n",
    "    r\"\"\"Computes a Monte Carlo estimate of maximum likelihood encoder objective\n",
    "    \"\"\"\n",
    "    if log_weights is None:\n",
    "        log_weights = q.log_joint(sample_dim, batch_dim, q.conditioned())\n",
    "    # Core ==============================================\n",
    "    objective = log_weights\n",
    "    # ===================================================\n",
    "    if not isinstance(objective, Number):\n",
    "        if sample_dim is not None:\n",
    "            objective = objective.mean(0)\n",
    "        if reduce:\n",
    "            objective = objective.mean() if size_average else objective.sum()\n",
    "    return objective"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
 }