@mrocklin
Created October 10, 2018 18:43
@iamkissg

Hi Matthew, what if you don't persist the data? I tried the same pre-processing on another dataset, which is too large to persist in memory. It raises ValueError: ('Arrays chunk sizes are unknown: %s'.... I have searched a lot for a solution but still don't know how to deal with this.

@mrocklin
Author

Unfortunately it has been a while since I've played with this, and I don't immediately have a good answer for you. I recommend that you ask the folks at https://github.com/dask/dask-ml/issues/new
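For anyone hitting the same error: it usually means the Dask array has nan (unknown) chunk sizes, which is common after converting from a Dask DataFrame, and the dask-ml preprocessors need concrete chunk sizes. A minimal sketch of two common workarounds, assuming the data is read from parquet into a Dask DataFrame (the path is hypothetical):

```python
import dask.dataframe as dd

# Hypothetical local parquet path -- substitute your own dataset.
df = dd.read_parquet("data/*.parquet")

# Option 1: when converting the DataFrame to a Dask array, ask Dask to
# compute the partition lengths so the chunk sizes are known up front.
X = df.to_dask_array(lengths=True)

# Option 2: if you already have an array with unknown (nan) chunk sizes,
# force Dask to compute them (available in newer Dask releases).
# X = X.compute_chunk_sizes()

print(X.chunks)  # chunk sizes are now concrete integers, not nan
```

Both options trigger a pass over the data to measure partition lengths, so they cost one extra computation but avoid persisting everything in memory.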

@Sandy4321

Has anybody tried to run this locally?
For example, with the data from
https://github.com/rambler-digital-solutions/criteo-1tb-benchmark
How do you read this kind of parquet files from a local disk?
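Not the original author, but as a sketch: Dask can read local parquet files directly with dask.dataframe.read_parquet, assuming pyarrow or fastparquet is installed. The path below is hypothetical; point it at wherever the parquet files were downloaded.

```python
import dask.dataframe as dd

# Read a directory of parquet files from the local filesystem.
# Globs and directory paths both work; the path here is made up.
df = dd.read_parquet("/data/criteo/day_*.parquet")

print(df.columns)       # column names read from the parquet metadata
print(df.npartitions)   # one partition per file / row group
```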
