Created
January 8, 2009 01:02
-
-
Save latompa/44522 to your computer and use it in GitHub Desktop.
bayesian rating
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
-- building on data from my scenario http://gist.github.com/44489 | |
-- and bayesian rating http://www.thebroth.com/blog/118/bayesian-rating | |
assume we have a products table and we want to sort products based on their rating. | |
With bayesian rating, you get better weighting, so that a single 5 star vote won't show up as "top product" | |
select * from products; | |
mysql> select * from products; | |
+------+--------+---------------+--------------+---------+ | |
| id | name | ratings_count | total_rating | weight | | |
+------+--------+---------------+--------------+---------+ | |
| 1 | apple | 3 | 4 | 0 | | |
| 2 | orange | 7 | 31 | 0 | | |
| 3 | banana | 1 | 5 | 0 | | |
| 4 | natto | 0 | 0 | 0 | | |
+------+--------+---------------+--------------+---------+ | |
mysql> select * from ratings; | |
+------------+--------+ | |
| product_id | rating | | |
+------------+--------+ | |
| 1 | 1 | | |
| 1 | 1 | | |
| 1 | 2 | | |
| 2 | 3 | | |
| 2 | 4 | | |
| 2 | 5 | | |
| 2 | 5 | | |
| 2 | 4 | | |
| 2 | 5 | | |
| 2 | 5 | | |
| 3 | 5 | | |
+------------+--------+ | |
-- As you can see, | |
-- # product_id 1 "apple" is not very liked, | |
-- # product_id 2 "orange" is quite popular. | |
-- # product_id 3 "banana" only has one vote | |
-- banana won't have a better weight than orange, even though the average rating is better | |
-- | |
-- Lets calculate weights. This is something you have to run every nite or so | |
-- since new rating records will change the weighting | |
-- average number of votes | |
set @avg_num_votes = (select avg(votes.c) from (select product_id,count(*) as c from ratings group by product_id) as votes) | |
-- average rating | |
set @avg_rating = (select avg(votes.c) from (select product_id,avg(rating) as c from ratings group by product_id) as votes) | |
-- | |
-- calculate weights. I had to use a temp table, since I wasn't able to update products and do a subquery in itself. | |
-- | |
create temporary table product_weights (id integer, weight float); | |
insert into product_weights | |
select id, | |
( (@avg_num_votes * @avg_rating) + | |
(ratings_count * (total_rating / ratings_count))) / | |
(@avg_num_votes + ratings_count) as weight | |
from products | |
-- update the products table | |
update products set weight= | |
coalesce((select weight from product_weights where product_weights.id=products.id),0) | |
-- | |
-- end result | |
-- | |
select * from products; | |
+------+--------+---------------+--------------+---------+ | |
| id | name | ratings_count | total_rating | weight | | |
+------+--------+---------------+--------------+---------+ | |
| 1 | apple | 3 | 4 | 2.57301 | | |
| 2 | orange | 7 | 31 | 4.13938 | | |
| 3 | banana | 1 | 5 | 3.89002 | | |
| 4 | natto | 0 | 0 | 0 | | |
+------+--------+---------------+--------------+---------+ |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Thanks man. I changed from insert/select to insert/inner join (my base has more than 150.000 records) and it worked like a charm.