jbochi/evaluation.ipynb

Last active May 3, 2025 05:43

Recommending GitHub repositories with Google BigQuery and the implicit library: https://medium.com/@jbochi/recommending-github-repositories-with-google-bigquery-and-the-implicit-library-e6cce666c77

evaluation.ipynb

recommendations.ipynb

antonioalegria commented Jun 14, 2018 (edited)

Hi @jbochi, I'm trying to use this code to evaluate on another dataset, but I'm getting a bunch of index-out-of-bounds errors because the factors in the train data are not the same as in the test data. This is probably because the users/items in train are different from the ones seen in test.

How would you adapt this code to face this challenge? Or is my theory incorrect? /cc @antonioalegria

joddm commented Nov 2, 2018 (edited)

I also have problems with index out of bounds. Did you figure it out, @antonioalegria?

Do you know what version of the libraries you ran this with, @jbochi? I'm running now with:

pandas: 0.23.4
numpy: 1.15.2
scipy: 1.1.0
implicit: 0.3.8
sklearn: 0.20.0

My original dataset is of shape:

<20210x4324 sparse matrix of type '<class 'numpy.float64'>'
	with 116992 stored elements in COOrdinate format>

and the truth variable in ndcg_scorer transforms the test split to shape (20206, 4324), while the predictions variable is of shape (20210, 4310).

So this is what's causing the index out of bounds error.

Edit: By changing the p variable, I managed to correct the predictions shape, but I don't understand why the truth variable is of shape (20206, 4324). My guess is the same as yours, @antonioalegria: in LeavePOutByGroup, one of the splits contains users who haven't purchased some products, so the full dimensions are not restored in truth.

Okay, by filtering out purchases with fewer customers than x (trying out different values), I managed to get a correct truth shape, but now the predictions shape is off. Aah... :) Do you know of any heuristic, @jbochi?
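One possible source of the mismatched shapes reported above is that SciPy infers the shape of a sparse matrix from the largest row and column index present when no shape is given. The sketch below only illustrates that behaviour on toy data; it assumes (without confirmation from the notebook) that the splits are rebuilt from index arrays, and every variable name in it is hypothetical.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Toy interactions: 5 users x 4 items (hypothetical data, not the gist's).
rows = np.array([0, 1, 2, 3, 4, 4])
cols = np.array([0, 1, 2, 3, 0, 3])
vals = np.ones(len(rows))
full = coo_matrix((vals, (rows, cols)), shape=(5, 4))

# A split that happens to miss the highest-indexed users and the last item.
mask = np.array([True, True, True, False, False, False])

# Without shape=, SciPy infers the shape from the largest index present.
inferred = coo_matrix((vals[mask], (rows[mask], cols[mask])))

# With shape=, the split keeps the dimensions of the full matrix.
aligned = coo_matrix((vals[mask], (rows[mask], cols[mask])), shape=full.shape)

print(inferred.shape)  # (3, 3)
print(aligned.shape)   # (5, 4)
```

If this is what happens in a given pipeline, passing the full matrix's shape when building each split keeps the truth and predictions arrays aligned.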

seb799 commented Jan 17, 2019

@joddm @antonioalegria
From my understanding, p in LeavePOutByGroup() should be <= (minimum number of items per user)/2.
For example, if your dataset has a user with activity for only 4 items, p should be <= 2.

Either you rebuild your dataset to include only users with activity for more products, or you filter out users with fewer than p*2 products from the test sets.

Hope that makes sense

It resolved the index out of bounds error on my end.
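A minimal sketch of the filtering rule described above, assuming the interactions are held in a SciPy sparse user-by-item matrix. The helper name `keep_users_with_min_items` is hypothetical and not part of the gist or of the implicit library.

```python
import numpy as np
from scipy.sparse import csr_matrix


def keep_users_with_min_items(user_items, p):
    """Keep only the rows (users) that have at least 2 * p stored items.

    Returns the filtered matrix and the original row indices that were kept,
    so results can be mapped back to the original user ids.
    """
    user_items = csr_matrix(user_items)
    # In CSR format, consecutive indptr differences give stored entries per row.
    items_per_user = np.diff(user_items.indptr)
    kept = np.flatnonzero(items_per_user >= 2 * p)
    return user_items[kept], kept


# Toy data: 4 users x 4 items, with 4, 1, 2 and 3 items respectively.
dense = np.array([
    [1, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
])
filtered, kept = keep_users_with_min_items(dense, p=1)
print(kept)            # [0 2 3]
print(filtered.shape)  # (3, 4)
```

Note that filtering renumbers the rows, so the returned `kept` array is needed to translate row positions back to the original users.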

DaStapo commented Aug 23, 2020

If my dataset has mostly just 2 items per user, I assume LeavePOutByGroup is not the way to go? If I understand correctly, each split would then have mostly 1 item per user, and therefore the model would have nothing to learn.

kylemcmearty commented Oct 9, 2020

@jbochi what is the license on this gist?
