jnturton · September 6, 2016 06:22
diff --git a/lift-and-uplift.ipynb b/lift-and-uplift.ipynb
 {
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "_Note: GitHub does not render LaTeX embedded in Jupyter Notebooks at the time of writing.  Please paste the GitHub URL of this document into http://nbviewer.jupyter.org/ if your view is missing mathematics._"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## A basic inequality relating lift and uplift\n",
    "\n",
    "### Summary\n",
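    "I derive the inequality\n",
    "\n",
    "$$\n",
    "\\frac{|S|}{|C|}l(S) \\leq \\frac{u(S)}{u(C)} \\leq \\frac{|C|}{|S|},\n",
    "$$\n",
    "\n",
    "in which the left-hand bound holds in general and the right-hand bound holds whenever the control response rate in the target set is at least that in the population from which it is drawn.  The inequality constrains the possible uplift in a target set in terms of\n",
    "- the lift in the target set,\n",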
    "- the size of the target set and\n",
    "- the overall uplift statistic in the population from which the target set is drawn.\n",
    "\n",
    "### Introduction\n",
    "The [lift](https://en.wikipedia.org/wiki/Lift_%28data_mining%29) statistic for measuring the performance of a predictive classification model relative to random guessing is well known.  Less well known is that when a predictive model is applied to maximise the response _to some applied influence_, the more important measure of its performance is its [uplift](https://en.wikipedia.org/wiki/Uplift_modelling), although the traditional lift is still defined and measurable.  It turns out that while these two statistics can be very different from each other, they are not completely free to take on independent values."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Definitions\n",
    "\n",
    "<table>\n",
    "<thead>\n",
    "    <tr>\n",
    "        <td>\n",
    "        \n",
    "        </td>\n",
    "        <td>\n",
    "            Uninfluenced positive\n",
    "        </td>\n",
    "        <td>\n",
    "            Uninfluenced negative\n",
    "        </td>        \n",
    "    </tr>\n",
    "</thead>\n",
    "<tbody>\n",
    "<tr>\n",
    "    <td>\n",
    "        Influenced positive\n",
    "    </td>\n",
    "    <td style=\"text-align: center\">\n",
    "        $a_{11}$\n",
    "    </td>\n",
    "    <td style=\"text-align: center\">\n",
    "        $a_{12}$\n",
    "    </td>\n",
    "</tr>\n",
    "<tr>\n",
    "    <td>\n",
    "        Influenced negative\n",
    "    </td>\n",
    "    <td style=\"text-align: center\">\n",
    "        $a_{21}$\n",
    "    </td>\n",
    "    <td style=\"text-align: center\">\n",
    "        $a_{22}$\n",
    "    </td>\n",
    "</tr>\n",
    "</tbody>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let $C$ be the whole population of customers (patients, etc.).  The contingency table above counts the members of a subset $S \\subseteq C$ of interest (our campaign group, say) in the buckets formed by two binary variables:\n",
    "- one on the rows describing the response of a customer when subjected to influence and\n",
    "- one on the columns describing the response of a customer when _not_ subjected to influence.\n",
    "\n",
    "The usual labels may be attached to the cells of the table: $a_{11}$ counts the \"sure things\", $a_{12}$ the \"persuadables\", $a_{21}$ the \"do not disturbs\" and $a_{22}$ the \"lost causes\".  Since every member of $S$ falls in exactly one cell, $|S| = \\sum_{i,j}a_{ij}$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Definition:** The *influenced response rate* in $S$ is \n",
    "\n",
    "$$\n",
    "r^i(S) = \\frac{a_{11} + a_{12}}{\\sum_{i,j}a_{ij}}.\n",
    "$$\n",
    "\n",
    "**Definition:** The *uninfluenced*, or *control*, *response rate in $S$* is \n",
    "\n",
    "$$\n",
    "r^u(S) = \\frac{a_{11} + a_{21}}{\\sum_{i,j}a_{ij}}.\n",
    "$$\n",
    "\n",
    "**Definition:**  With $r^i(C)$ being the influenced response rate across all customers in $C$, the *lift* for applying influence in $S$ is \n",
    "\n",
    "$$\n",
    "l(S) = \\frac{r^i(S)}{r^i(C)}.\n",
    "$$\n",
    "\n",
    "**Remark:** It's not hard to show that\n",
    "$$\n",
    "0 \\leq l(S) \\leq \\frac{|C|}{|S|}.\n",
    "$$\n",
    "\n",
    "**Example:** If $l(S) = 1.5$ then applying influence in $S$ results in a response rate $1.5$ times the response rate obtained by applying influence to all customers in $C$.\n",
    "\n",
    "**Definition:** What I will call the *gain* for applying influence in $S$ is\n",
    "\n",
    "$$\n",
    "g(S) = r^i(S)-r^u(S) = \\frac{a_{12}-a_{21}}{\\sum_{i,j}a_{ij}}.\n",
    "$$\n",
    "\n",
    "**Example:** If the response rate in $S$ with no influence applied is $10\\%=0.1$ and the response rate in $S$ with influence applied is $20\\%=0.2$ then the gain is $g(S) = 0.2 - 0.1 = 0.1 = 10\\%$.\n",
    "\n",
    "**Remark:** The definition of _gain_ above is probably what you would have expected me to give for the _uplift_, but I have deliberately defined the uplift as the ratio below so that it is formulated consistently with the definition of the lift.  This ratio-based definition is the one that turns out to matter in the mathematics relating lift and uplift that follows; the _gain_ will not be used again here.\n",
    "\n",
    "**Definition:** The *uplift* for applying influence in $S$ is\n",
    "\n",
    "$$\n",
    "u(S) = \\frac{r^i(S)}{r^u(S)} = \\frac{a_{11} + a_{12}}{a_{11} + a_{21}}.\n",
    "$$\n",
    "\n",
    "The control response rate $r^u(C)$ and the uplift $u(C)$ across all of $C$ are defined in the same way, with $C$ in place of $S$.\n",
    "\n",
    "**Example:** If $u(S) = 1.5$ then applying influence in $S$ results in a response rate $1.5$ times the response rate when no influence is applied in $S$."
   ]
  },
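  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Example:** To see the definitions side by side, here is a purely hypothetical contingency table (the counts are invented for illustration, not data).  Suppose $S$ has $|S| = 100$ members with $a_{11} = 10$, $a_{12} = 30$, $a_{21} = 10$ and $a_{22} = 50$, and suppose the influenced response rate across all of $C$ is $r^i(C) = 0.25$.  Then\n",
    "\n",
    "$$\n",
    "\\begin{align}\n",
    "r^i(S) &= \\frac{10 + 30}{100} = 0.4, & r^u(S) &= \\frac{10 + 10}{100} = 0.2,\\\\\n",
    "l(S) &= \\frac{0.4}{0.25} = 1.6, & g(S) &= \\frac{30 - 10}{100} = 0.2,\\\\\n",
    "u(S) &= \\frac{10 + 30}{10 + 10} = 2.\n",
    "\\end{align}\n",
    "$$\n",
    "\n",
    "Both forms of each definition agree: $g(S) = r^i(S) - r^u(S) = 0.4 - 0.2$ and $u(S) = r^i(S)/r^u(S) = 0.4/0.2$."
   ]
  },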
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Inequalities relating lift and uplift in $S$ and $C$ \n",
    "\n",
    "With the definitions done at last, let's consider what the _size_ of $S$ implies for $r^u(S)$ relative to $r^u(C)$.  For any $T \\subseteq C$, $r^u(T) |T|$ is just the number of uninfluenced positive responses in $T$, and since $S \\subseteq C$, $S$ cannot contain more of these responses than $C$ does.  Hence\n",
    "\n",
    "$$\n",
    "\\begin{align}\n",
    "r^u(S) |S| &\\leq r^u(C) |C|\\\\\n",
    "r^u(S) &\\leq \\frac{|C|}{|S|} r^u(C).\n",
    "\\end{align}\n",
    "$$\n",
    "\n",
    "Similarly, counting influenced positive responses instead,\n",
    "$$\n",
    "r^i(S) \\leq \\frac{|C|}{|S|} r^i(C).\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It follows that\n",
    "\n",
    "$$\n",
    "\\begin{align}\n",
    "u(S) = \\frac{r^i(S)}{r^u(S)} &\\geq \\frac{r^i(S)}{\\frac{|C|}{|S|} r^u(C)}\\\\\n",
    "&= \\frac{|S|}{|C|}\\frac{r^i(S)}{r^i(C)}\\frac{r^i(C)}{r^u(C)}\\\\\n",
    "&= \\frac{|S|}{|C|} l(S)u(C),\n",
    "\\end{align}\n",
    "$$\n",
    "\n",
    "where $l(S)$ is the lift in $S$ as defined earlier.  Working this time from the inequality on $r^i(S)$, we find\n",
    "\n",
    "$$\n",
    "u(S) \\leq \\frac{|C|}{|S|} \\frac{r^i(C)}{r^u(S)} = \\frac{|C|}{|S|} \\frac{r^u(C)}{r^u(S)} u(C).\n",
    "$$\n",
    "\n",
    "The factor $r^u(C)/r^u(S)$ cannot be dropped in general: a set $S$ whose members rarely respond without influence has a small $r^u(S)$, and its uplift can then exceed $\\frac{|C|}{|S|}u(C)$.  If, however, the control response rate in $S$ is at least that in $C$, $r^u(S) \\geq r^u(C)$, then the factor is at most $1$ and\n",
    "\n",
    "$$\n",
    "u(S) \\leq \\frac{|C|}{|S|} u(C).\n",
    "$$\n",
    "\n",
    "Assembling the two inequalities on $u(S)$ and dividing through by $u(C)$, we have the following constraint on the ratio of the uplift in $S$ to the uplift in $C$.\n",
    "\n",
    "**Theorem:** For any $S \\subseteq C$,\n",
    "$$\n",
    "\\frac{|S|}{|C|}l(S) \\leq \\frac{u(S)}{u(C)},\n",
    "$$\n",
    "and if in addition $r^u(S) \\geq r^u(C)$, then\n",
    "$$\n",
    "\\frac{|S|}{|C|}l(S) \\leq \\frac{u(S)}{u(C)} \\leq \\frac{|C|}{|S|}.\n",
    "$$\n",
    "\n",
    "We can think of $u(C)$ as the overall, or average, effectiveness of the application of influence: it is how effective the influence is without any targeting at all."
   ]
  },
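  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Remark:** The inequality on $r^i(S)$ derived above also justifies the earlier remark on the range of the lift.  Dividing it through by $r^i(C)$, which must be positive for $l(S)$ to be defined, gives\n",
    "\n",
    "$$\n",
    "0 \\leq l(S) = \\frac{r^i(S)}{r^i(C)} \\leq \\frac{|C|}{|S|},\n",
    "$$\n",
    "\n",
    "where the lower bound holds simply because response rates are non-negative.  For a target set containing $20\\%$ of $C$, for instance, the lift can be at most $5$."
   ]
  },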
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Example:** Suppose we are considering targeting a group $S$ that is $20\\%$ of $C$, and our data scientists estimate that $l(S)\\approx 4$ but do not have the data to estimate the number we really want to know, $u(S)$.  We can still use the inequalities to estimate that\n",
    "\n",
    "$$\n",
    "\\begin{align}\n",
    "\\frac{1}{5}\\times 4 \\lesssim &\\frac{u(S)}{u(C)} \\leq \\frac{5}{1} \\\\\n",
    "0.8 \\lesssim &\\frac{u(S)}{u(C)} \\leq 5,\n",
    "\\end{align}\n",
    "$$\n",
    "\n",
    "where the right-hand bound assumes that the control response rate in $S$ is at least that in $C$.  So in this example the uplift in $S$ is guaranteed to be at least about $0.8\\times u(C)$, i.e. $80\\%$ of the overall uplift, even if our model for targeting $S$ is the worst possible with respect to maximising uplift.  And, under that assumption, not even a crystal ball can pick a set $S$ of this size whose uplift exceeds $5\\times u(C)$."
   ]
  },
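  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Example:** As a hypothetical numerical check of the left-hand inequality of the Theorem (the figures are invented for illustration, not data), suppose $|C| = 1000$, with $100$ members of $C$ who respond positively when influenced and $80$ who respond positively when not, and that a target set $S$ with $|S| = 200$ contains $60$ and $30$ of these respectively.  Then $r^i(C) = 0.1$, $r^u(C) = 0.08$, $r^i(S) = 0.3$ and $r^u(S) = 0.15$, so\n",
    "\n",
    "$$\n",
    "\\begin{align}\n",
    "l(S) &= \\frac{0.3}{0.1} = 3, \\qquad u(S) = \\frac{0.3}{0.15} = 2, \\qquad u(C) = \\frac{0.1}{0.08} = 1.25,\\\\\n",
    "\\frac{|S|}{|C|}l(S) &= \\frac{200}{1000}\\times 3 = 0.6 \\leq 1.6 = \\frac{2}{1.25} = \\frac{u(S)}{u(C)}.\n",
    "\\end{align}\n",
    "$$\n",
    "\n",
    "The bound holds here with room to spare.  It is attained only when $r^u(S)|S| = r^u(C)|C|$, that is, when $S$ contains every uninfluenced positive responder in $C$, because that is the only inequality used in deriving it."
   ]
  },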
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Remark:** In practical terms the constraint imposed by these inequalities is not very strong, and in most cases it will not be particularly informative.  But the inequalities hold irrespective of the predictive model employed.\n",
    "\n",
    "**Example:** In the extreme case where $S$ contains every member of $C$ who responds positively when influenced,\n",
    "\n",
    "$$\n",
    "l(S) = \\frac{|C|}{|S|}\n",
    "$$\n",
    "\n",
    "and\n",
    "\n",
    "$$\n",
    "1 \\leq \\frac{u(S)}{u(C)}.\n",
    "$$"
   ]
  },
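  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The extreme case follows directly from the definitions.  If $S$ contains every member of $C$ who responds positively when influenced, then the counts of influenced positive responses in $S$ and in $C$ coincide, $r^i(S)|S| = r^i(C)|C|$, so\n",
    "\n",
    "$$\n",
    "\\begin{align}\n",
    "l(S) &= \\frac{r^i(S)}{r^i(C)} = \\frac{|C|}{|S|},\\\\\n",
    "\\frac{u(S)}{u(C)} &\\geq \\frac{|S|}{|C|}l(S) = \\frac{|S|}{|C|}\\cdot\\frac{|C|}{|S|} = 1.\n",
    "\\end{align}\n",
    "$$\n",
    "\n",
    "Such a target set therefore has the largest lift possible for a set of its size and at least the overall uplift $u(C)$."
   ]
  },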
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "© James Turton 2016"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2+"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
 }