@ntulip
Created February 21, 2011 15:08
Reddit's comment ranking
# http://amix.dk/blog/post/19588?source=google
from math import sqrt

def _confidence(ups, downs):
    n = ups + downs
    if n == 0:
        return 0
    z = 1.0  # 1.0 = 85%, 1.6 = 95%
    phat = float(ups) / n
    return sqrt(phat+z*z/(2*n)-z*((phat*(1-phat)+z*z/(4*n))/n))/(1+z*z/n)

def confidence(ups, downs):
    if ups + downs == 0:
        return 0
    else:
        return _confidence(ups, downs)

gnprice commented Jun 1, 2011

This doesn't match the comment ranking as described in this 2009 post on the Reddit blog: http://blog.reddit.com/2009/10/reddits-new-comment-sorting-system.html or specifically in this link from that post: http://www.evanmiller.org/how-not-to-sort-by-average-rating.html

In particular, instead of wrapping the whole numerator in sqrt(), you want to wrap only the expression (phat*(1-phat)+z*z/(4*n))/n. This is a major difference in the formula.
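
A minimal sketch of the function with sqrt() moved as described, assuming the Wilson score lower bound from the Evan Miller article linked above. The name `wilson_lower_bound` and the `z` keyword argument are illustrative and not part of the original gist:

```python
from math import sqrt

def wilson_lower_bound(ups, downs, z=1.0):
    """Lower bound of the Wilson score interval for the upvote fraction.

    Sketch only: the function name and the default z are assumptions.
    """
    n = ups + downs
    if n == 0:
        return 0.0
    phat = float(ups) / n
    # sqrt() wraps only the variance-like term, not the whole numerator
    spread = z * sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
    return (phat + z * z / (2 * n) - spread) / (1 + z * z / n)
```

When every vote is an upvote the bound reduces to n / (n + z*z), so it approaches 1 only as votes accumulate.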

Also, z = 1.0 corresponds to an 84% confidence interval, not 85%; but in any case the gist is described as "Reddit's comment ranking", and at least as of that 2009 blog post Reddit used a 95% confidence interval. Do you know whether that has changed? If not, it'd be better to write z = 1.64, which gives a 95% confidence interval.
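
One way to check which confidence level a given z corresponds to is the standard normal CDF, which can be written with `math.erf`. This is a sketch; the helper names `one_sided` and `two_sided` are made up for illustration:

```python
from math import erf, sqrt

def one_sided(z):
    """P(Z <= z) for a standard normal variable Z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def two_sided(z):
    """P(-z <= Z <= z) for a standard normal variable Z."""
    return erf(z / sqrt(2.0))

for z in (1.0, 1.64, 1.96):
    print(z, round(one_sided(z), 4), round(two_sided(z), 4))
```

Under the one-sided reading, z = 1.0 gives about 84% and z = 1.64 about 95%, which matches the figures above. Under the two-sided reading, z = 1.64 is closer to 90% and 95% needs z = 1.96.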


gnprice commented Jun 1, 2011

OK, here's the actual Reddit comment ranking, according to their own open-sourced code:
https://github.com/reddit/reddit/blob/master/r2/r2/lib/db/_sorts.pyx#L40

It puts the sqrt() in the place I described, and then their z-value turns out to be this:

 z = 1.281551565545 # 80% confidence
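
A rough sketch of how that z-value could be used to order comments, assuming the Wilson lower bound with sqrt() placed as described above. This is not a copy of the code in `_sorts.pyx`; the constant name `Z_REDDIT`, the function name `confidence_80`, and the vote counts are hypothetical:

```python
from math import sqrt

Z_REDDIT = 1.281551565545  # z-value quoted above

def confidence_80(ups, downs, z=Z_REDDIT):
    """Wilson score lower bound; illustrative, not Reddit's actual code."""
    n = ups + downs
    if n == 0:
        return 0.0
    p = float(ups) / n
    left = p + z * z / (2 * n)
    right = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (left - right) / (1 + z * z / n)

# Hypothetical comments mapped to (ups, downs)
votes = {"a": (1, 0), "b": (60, 40), "c": (9, 1)}
ranked = sorted(votes, key=lambda k: confidence_80(*votes[k]), reverse=True)
print(ranked)
```

With these made-up counts the order comes out as c, b, a: a single upvote ranks below a 60/40 split because one vote is weak evidence.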
