Last active
March 25, 2025 12:52
Recommending GitHub repositories with Google BigQuery and the implicit library: https://medium.com/@jbochi/recommending-github-repositories-with-google-bigquery-and-the-implicit-library-e6cce666c77
If my dataset has mostly just 2 items per user, I assume LeavePOutByGroup is not the way to go? If I understand correctly, each split would then have mostly 1 item per user, so the model would have nothing to learn from.
@jbochi what is the license on this gist?
@joddm @antonioalegria
From my understanding, p in LeavePOutByGroup() should be <= (minimum number of items per user) / 2.
For example, if your dataset has a user with activity for only 4 items, p should be <= 2.
Either you rebuild your dataset to include only users with activity for more products, or you filter out users with fewer than p*2 products from the test sets.
Hope that makes sense
It resolved the index-out-of-bounds error on my end.
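The filtering option could look roughly like the sketch below. It does not use the gist's code: `filter_users_with_min_items` is a hypothetical helper, and the interactions are assumed to be plain `(user, item)` pairs, so adapt it to whatever structure you pass to LeavePOutByGroup().

```python
from collections import Counter


def filter_users_with_min_items(interactions, p):
    """Keep only the pairs of users that have at least 2 * p distinct items.

    `interactions` is assumed to be a list of (user, item) tuples.
    This is a hypothetical helper, not part of the gist or of implicit.
    """
    min_items = 2 * p
    # Count distinct items per user, ignoring duplicate (user, item) pairs.
    items_per_user = Counter(user for user, _ in set(interactions))
    return [(user, item) for user, item in interactions
            if items_per_user[user] >= min_items]


interactions = [
    ("alice", "repo1"), ("alice", "repo2"), ("alice", "repo3"), ("alice", "repo4"),
    ("bob", "repo1"), ("bob", "repo2"),
]

# With p = 2, a user needs at least 4 items, so bob is dropped.
filtered = filter_users_with_min_items(interactions, p=2)
```

With p = 1 the same call keeps both users, since each has at least 2 items.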