Last active: November 12, 2024 11:32
Gist: jbochi/2e8ddcc5939e70e5368326aa034a144e
Recommending GitHub repositories with Google BigQuery and the implicit library: https://medium.com/@jbochi/recommending-github-repositories-with-google-bigquery-and-the-implicit-library-e6cce666c77
@joddm @antonioalegria
From my understanding, p in LeavePOutByGroup() should be at most half the minimum number of items per user, i.e. p <= (minimum number of items per user) / 2.
For example, if your dataset has a user with activity on only 4 items, p should be <= 2.
Either rebuild your dataset to include only users with activity on more products, or filter out users with fewer than p*2 products from the test sets.
Hope that makes sense.
It resolved the index out of bounds error on my end.
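The rule of thumb above (p at most half of each user's item count, which is the commenter's reading rather than a documented constraint) can be applied before splitting. A minimal sketch, assuming the interactions are stored as a SciPy CSR user-by-item matrix with users as rows; filter_users_for_leave_p_out is a hypothetical helper, not part of the gist or of the implicit library:

```python
import numpy as np
import scipy.sparse as sp


def filter_users_for_leave_p_out(user_items, p):
    """Hypothetical helper: keep only users (rows) with at least 2 * p items.

    Assumes a user-by-item matrix where each stored non-zero is one interaction.
    Returns the filtered matrix and the original row indices that were kept.
    """
    # Copy so that cleaning up the matrix does not modify the caller's data.
    m = sp.csr_matrix(user_items, copy=True)
    m.sum_duplicates()
    m.eliminate_zeros()
    # In CSR format, indptr[i + 1] - indptr[i] is the number of items of user i.
    items_per_user = np.diff(m.indptr)
    kept_rows = np.flatnonzero(items_per_user >= 2 * p)
    # Selecting rows keeps every column, so the item dimension is unchanged.
    return m[kept_rows], kept_rows


if __name__ == "__main__":
    toy = sp.csr_matrix(np.array([[1, 1, 0, 0],
                                  [1, 1, 1, 1],
                                  [0, 1, 1, 1]]))
    filtered, kept = filter_users_for_leave_p_out(toy, p=2)
    print(filtered.shape, kept)  # (1, 4) [1]
```

Only rows are dropped, so the item dimension stays the same; kept maps the remaining rows back to the original user indices.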
If my dataset has mostly just 2 items per user, I assume LeavePOutByGroup is not the way to go? If I understand correctly, each split would then have mostly 1 item per user, and therefore the model has nothing to learn.
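To see what the 2 * p requirement costs on a dataset like this, a quick check of the per-user item counts helps. This sketch assumes you already have the number of items per user as a plain list or array, and it takes the p <= min / 2 rule above as given; leave_p_out_budget is a hypothetical helper:

```python
import numpy as np


def leave_p_out_budget(items_per_user, p):
    """Hypothetical helper for the p <= (min items per user) / 2 rule of thumb.

    Returns the largest p that keeps every user, the fraction of users that
    still qualify for the requested p, and the fewest training items any
    qualifying user keeps after p items are held out.
    """
    counts = np.asarray(items_per_user)
    max_p = int(counts.min() // 2)
    qualifies = counts >= 2 * p
    fraction = float(qualifies.mean())
    min_train = int(counts[qualifies].min() - p) if qualifies.any() else 0
    return max_p, fraction, min_train


if __name__ == "__main__":
    # Mostly 2 items per user, plus one heavier user.
    counts = [2, 2, 2, 2, 6]
    print(leave_p_out_budget(counts, p=1))  # (1, 1.0, 1)
    print(leave_p_out_budget(counts, p=2))  # (1, 0.2, 4)
```

With mostly 2 items per user, p = 1 is the only value that keeps everyone, and it leaves those users with a single training item each, which matches the concern above.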
@jbochi what is the license on this gist?
I also have problems with an index out of bounds error. Did you figure it out, @antonioalegria?
Do you know which versions of the libraries you ran this with, @jbochi? Running now with
My original dataset is of shape:
and the truth variable in ndcg_scorer transforms the test split to shape (20206, 4324), while the predictions variable is of shape (20210, 4310). So this is what's causing the index out of bounds error.
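One way such a mismatch can arise, assuming the splits are turned into sparse matrices from coordinate lists (the gist's exact code is not reproduced here): if no shape is passed, SciPy infers it from the largest row and column index present in that split, so users or items missing at the end are silently dropped. The sketch below shows the effect on toy data and adds a hypothetical assert_same_shape guard to call before scoring:

```python
import numpy as np
import scipy.sparse as sp


def assert_same_shape(truth, predictions):
    """Hypothetical guard: fail early with a readable message instead of an IndexError later."""
    if truth.shape != predictions.shape:
        raise ValueError(
            f"truth has shape {truth.shape} but predictions has shape "
            f"{predictions.shape}; both must use the same user and item index"
        )


if __name__ == "__main__":
    # Toy test split: only users 0-1 and items 0-1 appear, out of 3 users x 4 items.
    rows = np.array([0, 1])
    cols = np.array([0, 1])
    data = np.ones(len(rows))

    # Without shape=, SciPy infers (max row + 1, max col + 1) from the split itself.
    inferred = sp.csr_matrix((data, (rows, cols)))
    # With shape=, the matrix keeps the dimensions of the full dataset.
    explicit = sp.csr_matrix((data, (rows, cols)), shape=(3, 4))
    print(inferred.shape, explicit.shape)  # (2, 2) (3, 4)

    predictions = np.zeros((3, 4))
    assert_same_shape(explicit, predictions)  # passes
    try:
        assert_same_shape(inferred, predictions)
    except ValueError as err:
        print(err)
```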
Edit: By changing the p variable, I managed to correct the predictions shape, but I don't understand why the truth variable is of shape (20206, 4324). My guess is the same as yours, @antonioalegria: in LeavePOutByGroup, one of the splits contains users who haven't purchased some of the products, hence the full dimensions are not restored in truth.
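If that guess is right, one possible fix is to build the user and item index once from the full dataset and encode every split with it, so users or products absent from a split become empty rows or columns instead of disappearing. A minimal sketch with NumPy and SciPy, assuming ids are sortable values such as strings or integers; build_global_index and split_to_matrix are hypothetical helpers, not part of the gist or of implicit:

```python
import numpy as np
import scipy.sparse as sp


def build_global_index(user_ids, item_ids):
    """Sorted unique user and item ids of the FULL dataset, computed once before splitting."""
    return np.unique(user_ids), np.unique(item_ids)


def split_to_matrix(split_users, split_items, all_users, all_items):
    """Encode one split with the global index so every split has the same shape.

    Repeated (user, item) pairs are summed by SciPy.
    """
    split_users = np.asarray(split_users)
    split_items = np.asarray(split_items)
    # searchsorted only returns insertion points, so check the ids really exist.
    if not (np.isin(split_users, all_users).all() and np.isin(split_items, all_items).all()):
        raise KeyError("split contains ids that are not in the global index")
    rows = np.searchsorted(all_users, split_users)
    cols = np.searchsorted(all_items, split_items)
    data = np.ones(len(rows), dtype=np.float32)
    return sp.csr_matrix((data, (rows, cols)), shape=(len(all_users), len(all_items)))


if __name__ == "__main__":
    users = ["alice", "alice", "bob", "carol"]
    items = ["x", "y", "y", "z"]
    all_users, all_items = build_global_index(users, items)

    # A test split that only mentions one user and one item.
    test = split_to_matrix(["bob"], ["y"], all_users, all_items)
    print(test.shape)  # (3, 3)
    print(test.toarray())
```

Users with no items in a given test split then show up as all-zero rows, so a ranking metric may need to skip them explicitly.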
Okay, by filtering out products with fewer than x customers (trying out different values), I managed to get a correct truth shape, but now the predictions shape is off. Aah... :) Do you know of any heuristic, @jbochi?
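As a possible heuristic (not an answer from the gist author), the full matrix could be filtered once before splitting: repeatedly drop items with fewer than x users and users with fewer than 2 * p items until nothing changes, a k-core-style filter. A single pass is not enough, because dropping an item can push a user below the threshold and vice versa. This is a sketch assuming a SciPy CSR user-by-item matrix; iterative_filter, min_items_per_user and min_users_per_item are hypothetical names:

```python
import numpy as np
import scipy.sparse as sp


def iterative_filter(user_items, min_items_per_user, min_users_per_item, max_rounds=100):
    """Hypothetical helper: repeatedly drop sparse users and items until stable.

    Returns the filtered matrix plus the original user and item indices that survived.
    """
    m = sp.csr_matrix(user_items, copy=True)
    m.sum_duplicates()
    m.eliminate_zeros()
    user_idx = np.arange(m.shape[0])
    item_idx = np.arange(m.shape[1])
    for _ in range(max_rounds):
        shape_before = m.shape
        # Column counts: how many users interacted with each item.
        users_per_item = np.diff(m.tocsc().indptr)
        keep_items = np.flatnonzero(users_per_item >= min_users_per_item)
        m = m[:, keep_items]
        item_idx = item_idx[keep_items]
        # Row counts: how many of the remaining items each user has.
        items_per_user = np.diff(m.indptr)
        keep_users = np.flatnonzero(items_per_user >= min_items_per_user)
        m = m[keep_users]
        user_idx = user_idx[keep_users]
        # Stop once a full round removes nothing.
        if m.shape == shape_before:
            break
    return m, user_idx, item_idx


if __name__ == "__main__":
    toy = np.array([[1, 1, 0, 0],
                    [1, 1, 1, 0],
                    [1, 0, 1, 1],
                    [0, 0, 0, 1]])
    m, users, items = iterative_filter(toy, min_items_per_user=2, min_users_per_item=2)
    print(m.shape, users, items)  # (3, 3) [0 1 2] [0 1 2]
```

In the toy example, user 3 is dropped first, which leaves item 3 with a single user, so item 3 is dropped in the next round. Because truth and predictions would both be derived from the same filtered matrix, their shapes should agree; the thresholds (x and 2 * p) still have to be tuned by hand.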