Created February 21, 2011 15:08
Reddit's comment ranking
# http://amix.dk/blog/post/19588?source=google
from math import sqrt

def _confidence(ups, downs):
    n = ups + downs
    if n == 0:
        return 0
    z = 1.0 #1.0 = 85%, 1.6 = 95%
    phat = float(ups) / n
    return sqrt(phat+z*z/(2*n)-z*((phat*(1-phat)+z*z/(4*n))/n))/(1+z*z/n)

def confidence(ups, downs):
    if ups + downs == 0:
        return 0
    else:
        return _confidence(ups, downs)
OK, here's the actual Reddit comment ranking, according to their own open-sourced code:
https://github.com/reddit/reddit/blob/master/r2/r2/lib/db/_sorts.pyx#L40
It puts the sqrt() in the place I described, and then their z-value turns out to be this:
z = 1.281551565545 # 80% confidence
This doesn't match the comment ranking as described in this 2009 post on the Reddit blog: http://blog.reddit.com/2009/10/reddits-new-comment-sorting-system.html or specifically in this link from that post: http://www.evanmiller.org/how-not-to-sort-by-average-rating.html
In particular, instead of wrapping the whole numerator in sqrt(), you want to wrap only the expression (phat*(1-phat)+z*z/(4*n))/n. This is a major difference in the formula.
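To make the difference concrete, here is a minimal sketch of the lower bound of the Wilson score interval with sqrt() wrapping only the inner expression, as in the Evan Miller article. The function name `wilson_lower_bound`, the `spread` variable, and the default `z` are my own choices for illustration, not taken from Reddit's code.

```python
from math import sqrt

def wilson_lower_bound(ups, downs, z=1.0):
    """Lower bound of the Wilson score interval for the upvote fraction.

    Hypothetical helper for illustration; z defaults to the gist's value.
    """
    n = ups + downs
    if n == 0:
        return 0.0
    phat = float(ups) / n
    # sqrt() wraps only (phat*(1-phat) + z*z/(4*n)) / n, not the whole numerator.
    spread = z * sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
    return (phat + z * z / (2 * n) - spread) / (1 + z * z / n)
```

With this placement the score never exceeds the raw upvote fraction, and for a fixed fraction it rises as the number of votes grows, which is the behaviour the ranking is after.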
Also, z = 1.0 corresponds to an 84% (one-sided) confidence level, not 85%. In any case, the gist is described as "Reddit's comment ranking", and at least as of that 2009 blog post Reddit used a 95% confidence interval. Do you know whether that has changed? If not, it would be better to write z = 1.64, which gives a 95% confidence level.
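The z-to-confidence figures in this thread can be checked with the standard library. This is a rough sketch assuming Python 3.8+ (for `statistics.NormalDist`); the helper names are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def one_sided_confidence(z):
    """Probability mass below z, i.e. confidence for a lower bound only."""
    return std_normal.cdf(z)

def two_sided_confidence(z):
    """Probability mass between -z and +z."""
    return 2 * std_normal.cdf(z) - 1

def z_for_one_sided_confidence(level):
    """Inverse of one_sided_confidence."""
    return std_normal.inv_cdf(level)

print(round(one_sided_confidence(1.0), 4))         # 0.8413
print(round(z_for_one_sided_confidence(0.95), 3))  # 1.645
```

Read this way, z = 1.0 is about 84% one-sided, and 95% one-sided needs z of roughly 1.645. The value 1.281551565545 is the 90th percentile of the standard normal, which corresponds to 80% when the interval is counted as two-sided.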