Last active April 12, 2017 18:12
| { | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "In this example, we'll look at a situation where there's some variable we haven't measured that helps determine whether or not a user visits website X. The same variable also helps determine their response to the AB test: the \"treatment effect\". For example, if we're measuring whether a user re-visits the site after exposure to the test as the KPI, then the \"average treatment effect\" is just the increase in the average visit rate (averaging over all users).\n", | |
| "\n", | |
| "We're interested in how much bias is introduced by self-selection, so we'll use really big sample sizes. We're not worried about random error for this example (but you should be when you're experimenting!). Let's continue." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 2, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "import pandas as pd\n", | |
| "import numpy as np" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Use a big sample size so we don't have to worry about error bars..." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 3, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "N = 10000000" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "$u$ will be the unmeasured variable that causes a person to enter the test: each user's latent propensity to visit the site.\n", | |
| "\n", | |
| "We'll use $u$ to draw $v_i$ (vi in the code), a binary variable for whether or not the user visits at time $t_i$, and then calculate the treatment assignment $a$. The assignment variable has three states, even though this is an AB test! This is one of the key differences: we're assigning to all site users, $U$, and not just the ones entering the test, $T$. The assignments are 2 (test group), 1 (control group), and 0 (not assigned). A user is assigned whenever they visit the site.\n", | |
| "\n", | |
| "Notice $a=0$ whenever $v_i=0$: if a user doesn't visit the site ($v_i=0$), they can't be assigned to test or control (so $a=0$). Otherwise $a$ is 1 (control) or 2 (test)." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 16, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "u = 1. / (1. + np.exp(-np.random.normal(size=N)))  # latent propensity to visit: logistic of a standard normal, mean 0.5\n", | |
| "vi = np.random.binomial(1, u)  # v_i = 1 if the user visits at time t_i\n", | |
| "a = vi*(1+np.random.binomial(1, 0.5, size=N))  # 0 if no visit, else 1 (control) or 2 (test) with equal probability" | |
| ] | |
| }, | |
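| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "As a quick sanity check on the assignment mechanism we just simulated: every non-visitor should be unassigned, and visitors should split roughly 50/50 between control and test." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "# non-visitors are never assigned; visitors get only 1 (control) or 2 (test)\n", | |
| "assert (a[vi == 0] == 0).all()\n", | |
| "assert np.in1d(a[vi == 1], [1, 2]).all()\n", | |
| "# control and test counts among visitors should be close to equal\n", | |
| "(a == 1).sum(), (a == 2).sum()" | |
| ] | |
| }, | |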
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Then we define the \"potential outcomes\", $v(0)$, $v(1)$, and $v(2)$, and use them to calculate $v$ at time $t_{i+1}$, denoted vi1. The potential outcomes $v(a)$ are defined on each unit (you might write them as $v_i$ for the $i$th unit), and indicate the value that $v$ would take on if the unit has assignment $a$.\n", | |
| "\n", | |
| "The latent variable $u$ has mean 0.5, so the expected differences between successive potential outcomes are $E[v(1)] - E[v(0)] = E[u](0.2 - 0.1) = 0.05$ and $E[v(2)] - E[v(1)] = E[u](0.3 - 0.2) = 0.05$; the true test-vs-control effect is 0.05." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 17, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "v0 = np.random.binomial(1, u*0.1)  # potential outcome v(0): visit probability if unassigned\n", | |
| "v1 = np.random.binomial(1, u*0.2)  # potential outcome v(1): visit probability if in control\n", | |
| "v2 = np.random.binomial(1, u*0.3)  # potential outcome v(2): visit probability if in test\n", | |
| "vi1 = (a==0)*v0 + (a==1)*v1 + (a==2)*v2  # observed outcome: the potential outcome matching the actual assignment\n", | |
| "X = pd.DataFrame({'vi': vi, 'ai': a, 'v_i+1': vi1})" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "If we could take the expected difference of $V(2)$ and $V(1)$ directly, we'd have the average treatment effect from the experiment! Of course you can never do that in a real data set: you can only ever measure $V(a)$ for a single value of $a$ on each unit. This is the \"fundamental problem of causal inference\". Since $a$ and the $V(a)$ are all influenced by $u$, we'll generally get biased estimates of the true causal effect, i.e. $E[V(2)] - E[V(1)] \\neq E[V|A=2] - E[V|A=1]$. This is because we're not randomizing over all of $a$'s values: only users who visit can receive $a=1$ or $a=2$. We don't have a fully randomized experiment; we're effectively working with observational data for the treatment assignment, and we have to apply the back-door criterion if we want unbiased effect estimates!" | |
| ] | |
| }, | |
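| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "In the simulation, though, we *do* know every potential outcome, so we can compute the true average treatment effect of test vs. control directly. It should come out very close to 0.05:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "# E[v(2)] - E[v(1)], available only because this is a simulation\n", | |
| "v2.mean() - v1.mean()" | |
| ] | |
| }, | |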
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "So what *are* we measuring? It turns out to be a weird conditional treatment effect:\n", | |
| "$E[V_{i+1}| A_i=2, V_i =1] - E[V_{i+1}| A_i=1, V_i =1]$, since $V_{i+1} \\not\\perp V_{i}$. That is, it's the treatment effect on the people who visited the site, and so were assigned into the experiment:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 18, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stderr", | |
| "output_type": "stream", | |
| "text": [ | |
| "/Users/adamkelleher/.virtualenv/causality/lib/python2.7/site-packages/ipykernel/__main__.py:1: UserWarning: Boolean Series key will be reindexed to match DataFrame index.\n", | |
| " if __name__ == '__main__':\n" | |
| ] | |
| }, | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "0.058913793692493874" | |
| ] | |
| }, | |
| "execution_count": 18, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "X[X['vi'] == 1][X['ai']==2].mean()['v_i+1'] - X[X['vi'] == 1][X['ai'] == 1].mean()['v_i+1']" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "That's because $a_i = 1$ or $a_i=2$ implies $v_i = 1$, so $\\mathrm{X[X[`vi'] == 1][X[`ai']==2]}$ selects the same data as $\\mathrm{X[X[`ai']==2]}$! Here's the naive estimate for comparison:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 19, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "0.058913793692493874" | |
| ] | |
| }, | |
| "execution_count": 19, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "X[X['ai']==2].mean()['v_i+1'] - X[X['ai'] == 1].mean()['v_i+1']" | |
| ] | |
| }, | |
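| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "The size of the bias is no mystery here. Among visitors, assignment to test vs. control is a fair coin independent of $u$, so the conditional effect is $0.3\\,E[u \\mid v_i = 1] - 0.2\\,E[u \\mid v_i = 1] = 0.1\\,E[u \\mid v_i = 1]$. High-propensity users are over-represented among visitors, so $E[u \\mid v_i = 1] > E[u] = 0.5$, which is why the estimate lands above 0.05. We can check this against the estimates above:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "# 0.1 * E[u | v_i = 1]: should closely match the conditional estimate above\n", | |
| "0.1 * u[vi == 1].mean()" | |
| ] | |
| }, | |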
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "We can use the same data-generating process, but this time we'll control the assignment: we intervene to set $a$ randomly to 0, 1, or 2 (you could exclude 0 and get the same result). " | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 20, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "N = 10000000\n", | |
| "\n", | |
| "u = 1. / (1. + np.exp(-np.random.normal(size=N)))\n", | |
| "vi = np.random.binomial(1, u)\n", | |
| "a = np.random.choice([0,1,2], size=N)  # randomized assignment, instead of vi*(1+np.random.binomial(1, 0.5, size=N))\n", | |
| "\n", | |
| "v0 = np.random.binomial(1, u*0.1)\n", | |
| "v1 = np.random.binomial(1, u*0.2)\n", | |
| "v2 = np.random.binomial(1, u*0.3)\n", | |
| "vi1 = (a==0)*v0 + (a==1)*v1 + (a==2)*v2\n", | |
| "X = pd.DataFrame({'vi': vi, 'ai': a, 'v_i+1': vi1})" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "The conditional effect is the same as before:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 21, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stderr", | |
| "output_type": "stream", | |
| "text": [ | |
| "/Users/adamkelleher/.virtualenv/causality/lib/python2.7/site-packages/ipykernel/__main__.py:1: UserWarning: Boolean Series key will be reindexed to match DataFrame index.\n", | |
| " if __name__ == '__main__':\n" | |
| ] | |
| }, | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "0.058556350973660784" | |
| ] | |
| }, | |
| "execution_count": 21, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "X[X['vi'] == 1][X['ai']==2].mean()['v_i+1'] - X[X['vi'] == 1][X['ai'] == 1].mean()['v_i+1']" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "But now the naive estimate is an unbiased estimate for the ATE, 0.05." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 22, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "0.050095413800637895" | |
| ] | |
| }, | |
| "execution_count": 22, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "X[X['ai']==2].mean()['v_i+1'] - X[X['ai'] == 1].mean()['v_i+1']" | |
| ] | |
| }, | |
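| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "As an aside, the chained indexing used above (`X[X['vi'] == 1][X['ai'] == 2]`) is what triggers the pandas `UserWarning`; the idiomatic version combines the conditions into a single boolean mask with `&`, and gives the same answer:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "# single boolean mask per group: no reindexing warning\n", | |
| "test = X[(X['vi'] == 1) & (X['ai'] == 2)]['v_i+1'].mean()\n", | |
| "control = X[(X['vi'] == 1) & (X['ai'] == 1)]['v_i+1'].mean()\n", | |
| "test - control" | |
| ] | |
| }, | |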
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [] | |
| } | |
| ], | |
| "metadata": { | |
| "kernelspec": { | |
| "display_name": "Python 2", | |
| "language": "python", | |
| "name": "python2" | |
| } | |
| }, | |
| "nbformat": 4, | |
| "nbformat_minor": 2 | |
| } |