Last active
July 24, 2020 12:52
-
-
Save lgarrison/1efabe4430429996733a9d29397423d2 to your computer and use it in GitHub Desktop.
A note on the RR term in autocorrelations
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# The RR term in autocorrelations\n", | |
"Author: Lehman Garrison (http://lgarrison.github.io)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"$\\newcommand{\\RR}{\\mathrm{RR}}$\n", | |
"$\\newcommand{\\DD}{\\mathrm{DD}}$\n", | |
"When computing a two-point correlation function estimator like\n", | |
"$\\xi(r) = \\frac{\\DD}{\\RR} - 1$,\n", | |
"the $\\RR$ term can be computed analytically if the domain is a periodic box.\n", | |
"In 2D, this is often done as\n", | |
"$$\\begin{align}\n", | |
"\\RR_i &= N A_i \\bar\\rho \\\\\n", | |
"&= N A_i \\frac{N}{L^2}\n", | |
"\\end{align}$$\n", | |
"where $\\RR_i$ is the expected number of random-random pairs in bin $i$, $N$ is the total number of points, $A_i$ is the area (or volume if 3D) of bin $i$, $L$ is the box size, and $\\bar\\rho$ is the average density in the box.\n", | |
"\n", | |
"However, using $\\bar\\rho = \\frac{N}{L^2}$ is not quite right. When sitting on a particle, only $N-1$ particles are available to be in a bin some distance away. The other particle is the particle you're sitting on, which is always at distance $0$. Thus, the correct expression is\n", | |
"$$\\RR_i = N A_i \\frac{N-1}{L^2}.$$\n", | |
"\n", | |
"The following notebook empirically demonstrates that using $N-1$ is correct, and that using $N$ introduces bias of order $\\frac{1}{N}$ into the estimator. This is a tiny correction for large $N$ problems, but important for small $N$.\n", | |
"\n", | |
"Cross-correlations of two different particle sets don't suffer from this problem; the particle you're sitting on is never part of the set of particles under consideration for pair-making.\n", | |
"\n", | |
"This $N-1$ correction is implemented in the [Corrfunc](https://github.com/manodeep/Corrfunc) code." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"import numpy as np" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"# Generates a set of N uniform random points in boxsize L\n", | |
"# and returns binned pair counts\n", | |
"# Pair counts are doubled, and self-pairs are counted\n", | |
"def rand_DD(N, L, bins):\n", | |
" # Make a 2D random set of N particles in [0,L)\n", | |
" x = np.random.random((2,N))*L\n", | |
" \n", | |
" # Form the periodic distance array\n", | |
" dist = x[:,None] - x[...,None]\n", | |
" dist -= np.round(dist/L)*L\n", | |
" dist = (dist**2).sum(axis=0)**.5\n", | |
" \n", | |
" # Count pairs in 2 bins\n", | |
" DD = np.histogram(dist, bins=bins)[0]\n", | |
" \n", | |
" return DD" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"DD/RR using density (N-1) = [ 1.00022627 0.99934544 0.99956887 1.00006867]\n", | |
"DD/RR using density (N) = [ 0.96817988 0.8994109 0.89961199 0.9000618 ]\n" | |
] | |
} | |
], | |
"source": [ | |
"# DD parameters\n", | |
"N = 10\n", | |
"L = 1.\n", | |
"bins = np.linspace(0.0,0.49, 5)\n", | |
"n_iter = 100000 # number of times to repeat the experiment\n", | |
"seed = 42\n", | |
"np.random.seed(seed)\n", | |
"\n", | |
"# First, calculate RR analytically\n", | |
"dA = (bins[1:]**2 - bins[:-1]**2)*np.pi # The area of each 2D annular bin\n", | |
"\n", | |
"RR = np.empty((2, len(dA)))\n", | |
"RR[:] = N/L**2*dA\n", | |
"\n", | |
"# Calculate RR using N-1 and N\n", | |
"RR[0] *= N - 1\n", | |
"RR[1] *= N\n", | |
"\n", | |
"# If the first bin includes 0, the random term must include the self-pairs\n", | |
"if bins[0] == 0.:\n", | |
" RR[:, 0] += N\n", | |
"\n", | |
"xi_mean = np.zeros((2, len(bins)-1))\n", | |
"for i in xrange(n_iter):\n", | |
" DD = rand_DD(N, L, bins)\n", | |
" xi_mean += DD/RR\n", | |
" \n", | |
"xi_mean /= n_iter\n", | |
"print 'DD/RR using density (N-1) = {}'.format(xi_mean[0])\n", | |
"print 'DD/RR using density (N) = {}'.format(xi_mean[1])" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"As we expected, using $N$ in the density leads to a $\\frac{1}{N} = 10\\%$ bias in $\\frac{\\DD}{\\RR}$, when averaging over many realizations of $\\DD$.\n", | |
"\n", | |
"The first bin has a smaller bias because we chose to include a bin that starts at $0$ separation, and thus includes self-pairs, which are indepdendent of the $N$ or $N-1$ density term." | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python [Root]", | |
"language": "python", | |
"name": "Python [Root]" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 2 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython2", | |
"version": "2.7.12" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 0 | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment