Using ALS Matrix Factorization for Making Recommendations in Spark

patilvijay23 commented Jun 16, 2020

"Surprisingly, the algorithm gets even better evaluation scores when tested on all-unseen data! This means that the model can be underfitting."

I don't think this is the correct interpretation.
When MPR is computed, the predictions are ranked. The training samples (userId x genreId) will get higher predicted values than the test samples (userId x genreId).
So if we rank the whole dataset and then evaluate MPR by tagging test samples as 1, we get a worse (higher) MPR than when we rank after removing the training samples. In the first scenario, the test samples are ranked below the training samples, which pushes their percentile ranks down and worsens the MPR.
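
To illustrate the ranking effect described above, here is a minimal sketch in plain Python, not the notebook's Spark code. It assumes binary relevance (every test pair counts as 1) and a percentile rank of 0.0 for the top item and 1.0 for the bottom one. The function names, item labels, and scores are made up for illustration.

```python
from typing import Dict, Iterable, Optional, Set


def percentile_ranks(scores: Dict[str, float]) -> Dict[str, float]:
    """Map each item to its percentile rank: 0.0 = top of the list, 1.0 = bottom."""
    ordered = sorted(scores, key=lambda item: scores[item], reverse=True)
    denom = max(len(ordered) - 1, 1)  # avoid dividing by zero for a single item
    return {item: pos / denom for pos, item in enumerate(ordered)}


def mpr(
    predictions: Dict[str, Dict[str, float]],
    test_items: Dict[str, Iterable[str]],
    exclude: Optional[Dict[str, Set[str]]] = None,
) -> float:
    """Mean percentile rank of the test items, with binary relevance (assumption).

    `exclude` optionally lists, per user, items to drop before ranking
    (for example, the pairs already seen in training).
    """
    total, count = 0.0, 0
    for user, scores in predictions.items():
        skipped = exclude.get(user, set()) if exclude else set()
        kept = {item: s for item, s in scores.items() if item not in skipped}
        ranks = percentile_ranks(kept)
        for item in test_items.get(user, ()):
            if item in ranks:
                total += ranks[item]
                count += 1
    return total / count if count else float("nan")


# Hypothetical scores for one user: training items "a" and "b" score highest.
predictions = {"u1": {"a": 0.9, "b": 0.8, "c": 0.7, "e": 0.5, "d": 0.3, "f": 0.1}}
train = {"u1": {"a", "b"}}
test = {"u1": ["c", "d"]}

mpr_all = mpr(predictions, test)                        # rank over all items
mpr_without_train = mpr(predictions, test, exclude=train)  # rank after removing training items
print(mpr_all, mpr_without_train)
```

In this toy case the test items sit below the two training items when everything is ranked together, so the MPR is higher (worse) than when the training items are removed first, even though the predicted scores are identical in both runs.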

Author

twolodzko commented Jun 16, 2020

@hellfire2310 good point.

ahmetlekesiz commented Aug 3, 2020

Hi,

I'm trying to understand how you implemented MPR, but I'm a little confused. Could you please give the pseudocode for MPR? I will implement it for my binary-only dataset with PySpark ALS. Thanks in advance.

Best wishes,
Ahmet

Author

twolodzko commented Aug 5, 2020

@ahmetlekesiz You can find some explanation here: https://stats.stackexchange.com/questions/460166/mean-percentage-ranking-in-implicit-feedback-als
I agree that the Spark implementation is not the most readable way to do this. Also check the links I provided in the notebook.

twolodzko/ALS Matrix Factorization in Spark.ipynb
