Skip to content

Instantly share code, notes, and snippets.

@dfabulich
Last active September 25, 2023 02:45
Show Gist options
  • Save dfabulich/fc6b13a8bffc5518c4731347de642749 to your computer and use it in GitHub Desktop.
Save dfabulich/fc6b13a8bffc5518c4731347de642749 to your computer and use it in GitHub Desktop.
Evan Miller Ranking Items with Star Ratings, in JavaScript
"use strict";
// based on unutbu's stackoverflow answer
// https://stackoverflow.com/a/40958702/54829
// which is based on Evan Miller's blog post
// http://www.evanmiller.org/ranking-items-with-star-ratings.html
function starsort(ratings) {
function sum(array) { return array.reduce((x, y) => x + y, 0) };
function square(x) { return x * x; };
const confidenceZ = 1.65;
// include five fake reviews
// 1 1-star, 1 2-star, 1 3-star, 1 4-star, 1 5-star
const fakeRatings = ratings.map(count => count + 1);
const N = sum(fakeRatings);
const average = sum(fakeRatings.map((count, i) => (i + 1) * count)) / N;
// Dirichlet standard deviation of average
const x = sum(fakeRatings.map((count, i) => square(i + 1) * count)) / N;
const standardDeviation = Math.sqrt((x - square(average)) / (N + 1));
return average - confidenceZ * standardDeviation;
}
console.log(starsort([60, 80, 75, 20, 25]));
@dfabulich
Copy link
Author

dfabulich commented Oct 22, 2018

I also needed this in SQL.

Let's assume you have tables like this:

CREATE TABLE `widgets` (
  `widget_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `title` varchar(100),
  PRIMARY KEY (`widget_id`)
);

CREATE TABLE `ratings` (
  `rating_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `widget_id` bigint(20) NOT NULL,
  `rating` tinyint(3) unsigned DEFAULT NULL,
  PRIMARY KEY (`rating_id`),
  KEY `widget_id` (`widget_id`)
);

You can use a query like this:

SELECT widgets.*, starsorted.* FROM widgets JOIN (
    SELECT
        rating_counts.*,
        (
            rated1 + rated2*2 + rated3*3 + rated4*4 + rated5*5
        )/count as average,
        (
            (5*(rated5+1)+4*(rated4+1)+3*(rated3+1)+2*(rated2+1)+1*(rated1+1)
        )/(
            5+rated5+rated4+rated3+rated2+rated1
        ))-1.65*SQRT(
            (
                (
                    (25*(rated5+1)+16*(rated4+1)+9*(rated3+1)+4*(rated2+1)+1*(rated1+1))
                        /(5+rated5+rated4+rated3+rated2+rated1)
                ) - POWER(
                    (
                        (5*(rated5+1)+4*(rated4+1)+3*(rated3+1)+2*(rated2+1)+1*(rated1+1))
                            /(5+rated5+rated4+rated3+rated2+rated1)
                    ), 2
                )
            )/(6+rated5+rated4+rated3+rated2+rated1)) as starsort
        FROM (
            SELECT
                widget_id,
                sum(case when rating = 1 then count else 0 end) as rated1,
                sum(case when rating = 2 then count else 0 end) as rated2,
                sum(case when rating = 3 then count else 0 end) as rated3,
                sum(case when rating = 4 then count else 0 end) as rated4,
                sum(case when rating = 5 then count else 0 end) as rated5,
                sum(count) as count
            FROM (
                SELECT
                    count(rating_id) as count,
                    rating,
                    widget_id
                FROM ratings
                GROUP BY rating, widget_id
            ) as grouped
            GROUP BY widget_id
        ) as rating_counts
    ) as starsorted
USING (widget_id)
ORDER BY starsort DESC LIMIT 100;

Naturally, this is going to have to scan over all of the rating rows to compute rating_counts. You can improve the performance of this query by maintaining starsorted as a materialized view.

@hashtag-sau
Copy link

thanks a lot for this.

@chanmathew
Copy link

chanmathew commented Dec 31, 2022

@dfabulich Have you tried this with floating numbers? Not sure how to modify this to fit my use case but, say I have an array of ratings like this:

[
  {
    rating: 4.3,
    totalRatings: 10
  },
  {
    rating: 4.0,
    totalRatings: 100
  }
]

How do I go about using the starsort function?

@dfabulich
Copy link
Author

No, I haven't, but I'm frankly confused by your question, in a way that makes me think that you're confused, too.

Do you actually allow real users to give a product a fractional score, like 4.3 out of 5? I assume not, right? You just let people give an integer star rating from 1 to 5, and "4.3" is the average of 10 ratings, right? To use starsort, you'd need the actual original rankings as the users provided them; you can't just take the average and try to feed that into starsort.

But, let's say you do allow users to give a score of 4.3 out of 5. OK, well, I assume you don't allow users to submit ratings with an unlimited number of digits. Users can't rate a product 4.31234745938 out of 5, can they? So I'm guessing they're allowed a single decimal point? So they can rate a product 4.3 out of 5, but not 4.32 out of 5? If they're only allowed a single decimal point, then you're really using a "fixed point" number, so you can just multiply their scores by ten, as if the users rated the product on a scale from 0 to 50, and 10 users rated the product 43 out of 50. That will let you use starsort normally.

Barring that, you could probably just round all of the numbers to one decimal place. 4.31234745938 is close enough to 4.3. Then you can treat the problem as if it were a fixed point number.

@sambeau
Copy link

sambeau commented Sep 25, 2023

Here's a Go version, for anyone searching like I was:

// starsort.go

package main

import(
	"fmt"
	"math"
)

// 	http://www.evanmiller.org/ranking-items-with-star-ratings.html

func sum(ns []int) int {
	var total int
	for _,n := range ns {
		total += n
	}
	return total
}

func f(s []int, ns []int)float64{
	N := sum(ns)
	K := len(ns)
	ks := make([]int,K)
	for i:=0; i<5; i++ {
		ks[i] = s[i] * (ns[i]+1)
	}
	return float64(sum(ks)) / float64(N+K)
}

func starSort(ns []int) float64 {
	N := sum(ns)
	K := len(ns)
	s := []int{5, 4, 3, 2, 1}
	s2 := []int{25, 16, 9, 4, 1}
	z := float64(1.65)
	fsns := f(s, ns)
	return fsns - z * math.Sqrt(((f(s2, ns) - (fsns*fsns)) / float64(N+K+1)))
}

func main(){
	fmt.Println(starSort([]int{60, 80, 75, 20, 25}))
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment