Created
March 17, 2020 21:27
-
-
Save lparolari/a0fd78e34ecb4679ff575dfa5c565b14 to your computer and use it in GitHub Desktop.
Proof for the theorem at page 32 slides ``MapReduce''. (Big Data Computing course at University of Padua, Italy, AY 2019-2020).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
\documentclass[11pt,a4paper]{article} | |
\usepackage[utf8]{inputenc} | |
\usepackage[T1]{fontenc} | |
\usepackage[english]{babel} | |
\usepackage{amsmath} | |
\usepackage{amsfonts} | |
\usepackage{amssymb} | |
\usepackage{amsthm} | |
\usepackage{graphicx} | |
\newtheorem{theorem}{Theorem} | |
\title{} | |
\author{Luca Parolari\footnote{[email protected]}} | |
\begin{document} | |
\maketitle | |
Proof for the theorem at page 32 slides ``MapReduce''. (Big Data Computing course at University of Padua, Italy, AY 2019-2020). | |
\begin{theorem} | |
Suppose that the keys assigned to the intermediate pairs in Round 1 are i.i.d. random variables with uniform distribution in $[0, \sqrt{N}]$. Then, with probability at least $1-1/N^5$ | |
\[ | |
m = O(\sqrt{N}) | |
\] | |
\end{theorem} | |
\begin{proof} | |
Let $N' \leq N$ be the number of intermediate pairs produced by the map phase in Round 1. Note that $N'$, arbitrarily chosen, is usually less than $N$, but for generality we admit the case $N' = N$. | |
Consider now the key $x \in [0, \sqrt{N}]$. We can observe that $m_x$ (the number of intermediate pairs with key $x$) is a $\text{Binomial}(N', 1/\sqrt{N})$ random variable where $N'$ is the number of items and $1/\sqrt{N}$ is its probability. The expectation of this random variable is $\mu = N' \cdot 1/\sqrt{N} \leq \sqrt{N}$. | |
We can now applying the Chernoff Bound and we get | |
\begin{align*} | |
\text{Pr}(m_x \geq 6\mu) &= \text{Pr}(m_x \geq 6\sqrt{N})\\ | |
& \leq \frac{1}{2^{6\sqrt{N}}} \\ | |
& \leq \frac{1}{2^{6 \log N}} = \frac{1}{2^{\log N^6}} = \frac{1}{N^6} | |
\end{align*} | |
Now, exploiting the Union Bound\footnote{Recall that the union bound is $\text{Pr}(\sum_{i} A_i) = \Sigma_i5 \text{Pr}(A_i)$} (note that with $\text{Pr}(m \geq 6\sqrt{N})$ we want to say something like $\bigcup_{x \in [0,\sqrt{N}]} \text{Pr}(m_x \geq 6\sqrt{N})$), we get | |
\begin{align*} | |
\text{Pr}(m \geq 6\sqrt{N}) &= \sum_{x\in[0,\sqrt{N}]} \text{Pr}(m_x \geq 6\sqrt{N}) \\ | |
&\leq \sqrt{N}\ \text{Pr}(m_x \geq 6\sqrt{N})\\ | |
&\leq \sqrt{N} \cdot \frac{1}{N^6}\\ | |
& = N^{\frac{1-6}{2}} = N^{-\frac{11}{2}}\\ | |
&\leq \frac{1}{N^5} | |
\end{align*} | |
Since we want an upper bound we can write the probability as $1 - \text{Pr}(m \geq 6\sqrt{N}) = 1-\frac{1}{N^5}$. | |
This probability is very close to 1, so we get $m = O(\sqrt{N})$ with the probability of at least $1-1/N^5$. | |
\end{proof} | |
\end{document} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment