Skip to content

Instantly share code, notes, and snippets.

@lparolari
Created March 17, 2020 21:27
Show Gist options
  • Save lparolari/a0fd78e34ecb4679ff575dfa5c565b14 to your computer and use it in GitHub Desktop.
Save lparolari/a0fd78e34ecb4679ff575dfa5c565b14 to your computer and use it in GitHub Desktop.
Proof for the theorem at page 32 slides ``MapReduce''. (Big Data Computing course at University of Padua, Italy, AY 2019-2020).
\documentclass[11pt,a4paper]{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage[english]{babel}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{graphicx}
\newtheorem{theorem}{Theorem}
\title{}
\author{Luca Parolari\footnote{[email protected]}}
\begin{document}
\maketitle
Proof for the theorem at page 32 slides ``MapReduce''. (Big Data Computing course at University of Padua, Italy, AY 2019-2020).
\begin{theorem}
Suppose that the keys assigned to the intermediate pairs in Round 1 are i.i.d. random variables with uniform distribution in $[0, \sqrt{N}]$. Then, with probability at least $1-1/N^5$
\[
m = O(\sqrt{N})
\]
\end{theorem}
\begin{proof}
Let $N' \leq N$ be the number of intermediate pairs produced by the map phase in Round 1. Note that $N'$, arbitrarily chosen, is usually less than $N$, but for generality we admit the case $N' = N$.
Consider now the key $x \in [0, \sqrt{N}]$. We can observe that $m_x$ (the number of intermediate pairs with key $x$) is a $\text{Binomial}(N', 1/\sqrt{N})$ random variable where $N'$ is the number of items and $1/\sqrt{N}$ is its probability. The expectation of this random variable is $\mu = N' \cdot 1/\sqrt{N} \leq \sqrt{N}$.
We can now applying the Chernoff Bound and we get
\begin{align*}
\text{Pr}(m_x \geq 6\mu) &= \text{Pr}(m_x \geq 6\sqrt{N})\\
& \leq \frac{1}{2^{6\sqrt{N}}} \\
& \leq \frac{1}{2^{6 \log N}} = \frac{1}{2^{\log N^6}} = \frac{1}{N^6}
\end{align*}
Now, exploiting the Union Bound\footnote{Recall that the union bound is $\text{Pr}(\sum_{i} A_i) = \Sigma_i5 \text{Pr}(A_i)$} (note that with $\text{Pr}(m \geq 6\sqrt{N})$ we want to say something like $\bigcup_{x \in [0,\sqrt{N}]} \text{Pr}(m_x \geq 6\sqrt{N})$), we get
\begin{align*}
\text{Pr}(m \geq 6\sqrt{N}) &= \sum_{x\in[0,\sqrt{N}]} \text{Pr}(m_x \geq 6\sqrt{N}) \\
&\leq \sqrt{N}\ \text{Pr}(m_x \geq 6\sqrt{N})\\
&\leq \sqrt{N} \cdot \frac{1}{N^6}\\
& = N^{\frac{1-6}{2}} = N^{-\frac{11}{2}}\\
&\leq \frac{1}{N^5}
\end{align*}
Since we want an upper bound we can write the probability as $1 - \text{Pr}(m \geq 6\sqrt{N}) = 1-\frac{1}{N^5}$.
This probability is very close to 1, so we get $m = O(\sqrt{N})$ with the probability of at least $1-1/N^5$.
\end{proof}
\end{document}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment