zhouzhuojie · August 2, 2013 13:26
diff --git a/Summary b/Summary
 List
 --------------

 * Statistics - MCMC
 * Statistics - One long run vs Many short runs
 * CS - Non-backtracking
 * CS - Tree Sampling vs Graph Sampling
 * Other things



 Statistics - MCMC
 -------------

 #### 1. Asymptotic Variance is Key

 Consider random walk sequence as a AR(1) series.

 $$ \mu = E(X_n) = \rho E(X_{n-1}) + E(e_n) $$

 And starting from stationary distribution:

 $$ \sigma^2_{CLT} = \gamma_0 + 2\sum_{k=1}^\infty \gamma_k,$$
 where $\gamma_k = Cov(Y_j, Y_{j+k})$, 

 and $Y_j$ is some attribute function of $X_j$: $Y_j = f(X_j)$

 #### 2. Esitmate the Asymptotic Variance Using Batch Means

 * Batch Mean Method
 * Overlapping batch mean method

 #### 3. Effective Sample Size
 Roughly $\sqrt{\frac{1+\rho}{1-\rho}}$

 Statistics - One long run vs Many short runs
 ----------------

 #### Questions

 1. Samples in the sample pool are **dependent**, should we 'thin' it?
  * Yes, if space/storage is not computational or statistical efficient
  * Yes, if $g(X_i)$ is computational expensive
  * No, otherwise
  * typical, for Online Social Network, we do not need to 'thin' the random walk samples.
 2. Should we discard "burn-in"?
  * Burn-ins are just pushing the initial distribution from a point {x} to {$P^nx$}
  * See experiment table [here](https://docs.google.com/spreadsheet/ccc?key=0AmIBiL5nyHqXdHZOdE9vcnpOV01ON2pMcmtHUENTLVE&usp=sharing). Basically we do not need burn-in if we adopt one long run random walk.
 3. Multiple short runs?
  * "Diversity of the initial distribution". Still, there is no guarantee to improve statistical or computational efficiency.
  * Likely to give indication of convergence faster
 4. How long is enough?
  * For one long run, just use the query budget.



 Statistics - Rule of Thumb for Importance Sampling
 ----------------
 #### Introduction

 $$\hat{\mu} = \frac{1}{N}\sum_{k=1}^Nh(X_k)\frac{\pi(X_k)}{p(X_k)}$$
 where 

 $h(\cdot)$ is the attribute function,

 $\pi(\cdot)$ is the expected/target distribution density function, 

 $p(\cdot)$ is the sampling/obtained distribution density function. $p(\cdot)$ is easy to obtain, e.g. SRW: $p(v) = \frac{k_v}{2|E|}$

 #### Ideal case for Important Sampling

 $$p(X_k) \approx h(X_k)\pi(X_k)$$

 Non-backtracking Random Walk
 ----------------------

 #### Introduction
 $P$ and $P'$ are two random walks' transition matrix. If:
 $$
 \begin{equation}
 \begin{split}
 P(j,i)P'(e_{ij}, e_{jk}) &= P(j,k)P'(e_{kj},e_{ji}) \\
 P'(e_{ij}, e_{jk}) &\geq P(j,k) 
 \end{split}
 \end{equation}
 $$
 then the random walk with $P'$ with have a unique stationary distribution $\pi'(e_{ij}) = \pi(i)P(i,j)$, and $\sigma'^2(f)\leq \sigma^2(f)$.

 #### Design of the non-backtracking random walk

 * Never go back from the income edge
 * $P(j,i) = 1/d(j)$
 * $P'(e_{ji}, e_{jk}) = 1/(d(j)-1)$
 * $P'(e_{ji}, e_{jk}) \geq P(j,i)$

 #### New Ideas
 1. Pushing $P'(e_{ij}, e_{jk}) \geq P(j,k) $
  * Divide the neighbors into two groups, $C_1$ and $C_2$
    * Come in $C_1$ then go out to $C_2$, and vice versa.
    * Although sometime it increase the probability of going out to another group, the probability $P'(e_{ji}, e_{jk})$ remains the same.
      * $$ P'(e_{ji}, e_{jk}) = 0.5\cdot 0 + 0.5\cdot 1/(0.5\cdot d) = 1/d = P(j,k) $$
    * Maybe some experiments?


 CS - Tree Sampling vs Graph Sampling
 --------------

 #### Questions
 1. Is tree sampling better than graph sampling?
 2. Find better starting node for tree sampling?
 3. On-the-fly drill down for tree sampling. How to accurately calculate the probability of a node being accessed?

 #### Some observations
 1. Adding random edges to a tree will increase the conductance of the graph.
 2. By selecting the nodes that are connected to cross-cutting edges, can we design random walk that are much faster than original simple random walk?
	List
	--------------

	* Statistics - MCMC
	* Statistics - One long run vs Many short runs
	* CS - Non-backtracking
	* CS - Tree Sampling vs Graph Sampling
	* Other things



	Statistics - MCMC
	-------------

	#### 1. Asymptotic Variance is Key

	Consider random walk sequence as a AR(1) series.

	$$ \mu = E(X_n) = \rho E(X_{n-1}) + E(e_n) $$

	And starting from stationary distribution:

	$$ \sigma^2_{CLT} = \gamma_0 + 2\sum_{k=1}^\infty \gamma_k,$$
	where $\gamma_k = Cov(Y_j, Y_{j+k})$,

	and $Y_j$ is some attribute function of $X_j$: $Y_j = f(X_j)$

	#### 2. Esitmate the Asymptotic Variance Using Batch Means

	* Batch Mean Method
	* Overlapping batch mean method

	#### 3. Effective Sample Size
	Roughly $\sqrt{\frac{1+\rho}{1-\rho}}$

	Statistics - One long run vs Many short runs
	----------------

	#### Questions

	1. Samples in the sample pool are dependent, should we 'thin' it?
	* Yes, if space/storage is not computational or statistical efficient
	* Yes, if $g(X_i)$ is computational expensive
	* No, otherwise
	* typical, for Online Social Network, we do not need to 'thin' the random walk samples.
	2. Should we discard "burn-in"?
	* Burn-ins are just pushing the initial distribution from a point {x} to {$P^nx$}
	* See experiment table [here](https://docs.google.com/spreadsheet/ccc?key=0AmIBiL5nyHqXdHZOdE9vcnpOV01ON2pMcmtHUENTLVE&usp=sharing). Basically we do not need burn-in if we adopt one long run random walk.
	3. Multiple short runs?
	* "Diversity of the initial distribution". Still, there is no guarantee to improve statistical or computational efficiency.
	* Likely to give indication of convergence faster
	4. How long is enough?
	* For one long run, just use the query budget.



	Statistics - Rule of Thumb for Importance Sampling
	----------------
	#### Introduction

	$$\hat{\mu} = \frac{1}{N}\sum_{k=1}^Nh(X_k)\frac{\pi(X_k)}{p(X_k)}$$
	where

	$h(\cdot)$ is the attribute function,

	$\pi(\cdot)$ is the expected/target distribution density function,

	$p(\cdot)$ is the sampling/obtained distribution density function. $p(\cdot)$ is easy to obtain, e.g. SRW: $p(v) = \frac{k_v}{2\|E\|}$

	#### Ideal case for Important Sampling

	$$p(X_k) \approx h(X_k)\pi(X_k)$$

	Non-backtracking Random Walk
	----------------------

	#### Introduction
	$P$ and $P'$ are two random walks' transition matrix. If:
	$$
	\begin{equation}
	\begin{split}
	P(j,i)P'(e_{ij}, e_{jk}) &= P(j,k)P'(e_{kj},e_{ji}) \\
	P'(e_{ij}, e_{jk}) &\geq P(j,k)
	\end{split}
	\end{equation}
	$$
	then the random walk with $P'$ with have a unique stationary distribution $\pi'(e_{ij}) = \pi(i)P(i,j)$, and $\sigma'^2(f)\leq \sigma^2(f)$.

	#### Design of the non-backtracking random walk

	* Never go back from the income edge
	* $P(j,i) = 1/d(j)$
	* $P'(e_{ji}, e_{jk}) = 1/(d(j)-1)$
	* $P'(e_{ji}, e_{jk}) \geq P(j,i)$

	#### New Ideas
	1. Pushing $P'(e_{ij}, e_{jk}) \geq P(j,k) $
	* Divide the neighbors into two groups, $C_1$ and $C_2$
	* Come in $C_1$ then go out to $C_2$, and vice versa.
	* Although sometime it increase the probability of going out to another group, the probability $P'(e_{ji}, e_{jk})$ remains the same.
	* $$ P'(e_{ji}, e_{jk}) = 0.5\cdot 0 + 0.5\cdot 1/(0.5\cdot d) = 1/d = P(j,k) $$
	* Maybe some experiments?


	CS - Tree Sampling vs Graph Sampling
	--------------

	#### Questions
	1. Is tree sampling better than graph sampling?
	2. Find better starting node for tree sampling?
	3. On-the-fly drill down for tree sampling. How to accurately calculate the probability of a node being accessed?

	#### Some observations
	1. Adding random edges to a tree will increase the conductance of the graph.
	2. By selecting the nodes that are connected to cross-cutting edges, can we design random walk that are much faster than original simple random walk?