Skip to content

Instantly share code, notes, and snippets.

@latompa
Created January 8, 2009 01:02
Show Gist options
  • Save latompa/44522 to your computer and use it in GitHub Desktop.
Save latompa/44522 to your computer and use it in GitHub Desktop.
bayesian rating
-- building on data from my scenario http://gist.github.com/44489
-- and bayesian rating http://www.thebroth.com/blog/118/bayesian-rating
assume we have a products table and we want to sort products based on their rating.
With bayesian rating, you get better weighting, so that a single 5 star vote won't show up as "top product"
select * from products;
mysql> select * from products;
+------+--------+---------------+--------------+---------+
| id | name | ratings_count | total_rating | weight |
+------+--------+---------------+--------------+---------+
| 1 | apple | 3 | 4 | 0 |
| 2 | orange | 7 | 31 | 0 |
| 3 | banana | 1 | 5 | 0 |
| 4 | natto | 0 | 0 | 0 |
+------+--------+---------------+--------------+---------+
mysql> select * from ratings;
+------------+--------+
| product_id | rating |
+------------+--------+
| 1 | 1 |
| 1 | 1 |
| 1 | 2 |
| 2 | 3 |
| 2 | 4 |
| 2 | 5 |
| 2 | 5 |
| 2 | 4 |
| 2 | 5 |
| 2 | 5 |
| 3 | 5 |
+------------+--------+
-- As you can see,
-- # product_id 1 "apple" is not very liked,
-- # product_id 2 "orange" is quite popular.
-- # product_id 3 "banana" only has one vote
-- banana won't have a better weight than orange, even though the average rating is better
--
-- Lets calculate weights. This is something you have to run every nite or so
-- since new rating records will change the weighting
-- average number of votes
set @avg_num_votes = (select avg(votes.c) from (select product_id,count(*) as c from ratings group by product_id) as votes)
-- average rating
set @avg_rating = (select avg(votes.c) from (select product_id,avg(rating) as c from ratings group by product_id) as votes)
--
-- calculate weights. I had to use a temp table, since I wasn't able to update products and do a subquery in itself.
--
create temporary table product_weights (id integer, weight float);
insert into product_weights
select id,
( (@avg_num_votes * @avg_rating) +
(ratings_count * (total_rating / ratings_count))) /
(@avg_num_votes + ratings_count) as weight
from products
-- update the products table
update products set weight=
coalesce((select weight from product_weights where product_weights.id=products.id),0)
--
-- end result
--
select * from products;
+------+--------+---------------+--------------+---------+
| id | name | ratings_count | total_rating | weight |
+------+--------+---------------+--------------+---------+
| 1 | apple | 3 | 4 | 2.57301 |
| 2 | orange | 7 | 31 | 4.13938 |
| 3 | banana | 1 | 5 | 3.89002 |
| 4 | natto | 0 | 0 | 0 |
+------+--------+---------------+--------------+---------+
@ruanltbg
Copy link

Thanks man. I changed from insert/select to insert/inner join (my base has more than 150.000 records) and it worked like a charm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment