Last active July 1, 2023 23:33
twolodzko/7becd98ff256ef826b56945de297700d
Using ALS Matrix Factorization for Making Recommendations in Spark
Author
@hellfire2310 good point.
Hi,
I'm trying to understand how you implemented MPR, but I am a little bit confused. Could you please give the pseudocode for MPR? I will implement it for my binary-only dataset with PySpark ALS. Thanks in advance.
Best wishes,
Ahmet
Author
@ahmetlekesiz You can find an explanation here: https://stats.stackexchange.com/questions/460166/mean-percentage-ranking-in-implicit-feedback-als I agree that the Spark implementation is not the most readable way to do this. Also check the links I provided in the notebook.
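Since the question above asks for pseudocode, here is a minimal plain-Python sketch of MPR. It assumes the usual definition of expected percentile ranking from Hu, Koren and Volinsky's implicit-feedback ALS paper. For each user, items are sorted by predicted score and each item gets a percentile rank from 0.0 (top) to 1.0 (bottom). MPR is then the average of those ranks over the test pairs, weighted by the observed test feedback r, which is all 1s for a binary dataset. Lower is better, and a random ranking gives roughly 0.5. The function names and the dict-of-dicts input layout are illustrative assumptions, not the notebook's Spark code.

```python
def percentile_ranks(scores):
    """Map each item to its percentile rank for one user.

    scores: dict item -> predicted preference.
    The highest score gets 0.0 and the lowest gets 1.0; tied scores keep
    their dict insertion order because Python's sort is stable.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    n = len(ordered)
    if n == 1:
        return {ordered[0]: 0.0}
    return {item: pos / (n - 1) for pos, item in enumerate(ordered)}


def mean_percentage_ranking(predictions, test):
    """Weighted mean of percentile ranks over the test pairs.

    predictions: dict user -> dict item -> predicted score (every test item
                 must appear among the user's predictions).
    test:        dict user -> dict item -> observed test feedback r
                 (use 1 for binary data).
    """
    num = 0.0
    den = 0.0
    for user, items in test.items():
        ranks = percentile_ranks(predictions[user])
        for item, r in items.items():
            num += r * ranks[item]  # r_ui * rank_ui
            den += r                # sum of r_ui
    if den == 0:
        raise ValueError("test set has no positive feedback")
    return num / den


# Toy, made-up example with binary test feedback.
predictions = {
    "u1": {"a": 0.9, "b": 0.5, "c": 0.1},
    "u2": {"a": 0.2, "b": 0.8, "c": 0.4},
}
test = {"u1": {"a": 1}, "u2": {"c": 1}}
print(mean_percentage_ranking(predictions, test))  # 0.25
```

In PySpark, a comparable per-user percentile can be computed with a `Window` partitioned by user and ordered by descending prediction together with `pyspark.sql.functions.percent_rank()`, and then averaged over the test rows. Note that `percent_rank` gives tied predictions the same rank, so results may differ slightly from this sketch when scores tie.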
"Surprisingly, the algorithm gets even better evaluation scores when tested on all-unseen data! This means that the model can be underfitting."
This, I think, is not the correct interpretation.
When MPR is computed, we rank the predictions. The training pairs (userId x genreId) will generally get higher predicted values than the test pairs (userId x genreId), because the model was fit on them.
So if we rank the whole dataset and then compute MPR with the test pairs tagged as 1, we get a worse MPR than when we rank after removing the training pairs. In the first scenario the training pairs occupy the top of each user's ranking, which pushes every test pair further down and therefore worsens MPR.
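To make this argument concrete, here is a toy single-user sketch with made-up scores (hypothetical data, not taken from the notebook). It assumes binary test feedback and a percentile rank of position / (n - 1), with 0.0 at the top. The model's scores stay fixed; only the candidate set changes. The helper names `percentile_rank` and `mpr` are illustrative.

```python
def percentile_rank(scores, item):
    """Percentile rank of `item` among the scored candidates (0.0 = top)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(item) / (len(ordered) - 1)


def mpr(scores, train, test, exclude_train):
    """MPR over one user's binary test items.

    If exclude_train is True, training items are dropped before ranking,
    so only unseen items compete for the top positions.
    """
    candidates = {
        i: s for i, s in scores.items() if not (exclude_train and i in train)
    }
    return sum(percentile_rank(candidates, i) for i in test) / len(test)


# Hypothetical scores for a single user: the model was fit on t1 and t2,
# so they receive the highest predictions.
scores = {"t1": 0.95, "t2": 0.90, "n0": 0.70, "x": 0.60, "n1": 0.30, "n2": 0.20}
train = {"t1", "t2"}
test = {"x"}

print(mpr(scores, train, test, exclude_train=False))  # 0.6
print(mpr(scores, train, test, exclude_train=True))   # 0.3333333333333333
```

With identical predictions, MPR improves from 0.6 to about 0.33 just by removing the training items from the ranking. A better score on "all-unseen" data is therefore what this ranking effect predicts on its own, rather than evidence of underfitting.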