@mrocklin
Created October 10, 2018 18:43
@iamkissg

Hi Matthew, what if you don't persist the data? I tried the same pre-processing on another dataset, which is too large to persist in memory. It raises ValueError: ('Arrays chunk sizes are unknown: %s'.... I have searched a lot for a solution but still don't know how to deal with this.

@mrocklin
Author

Unfortunately it has been a while since I've played with this, and I don't immediately have a good answer for you. I recommend that you ask the folks at https://github.com/dask/dask-ml/issues/new
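For anyone hitting the same error: it usually means the Dask array has nan (unknown) chunk sizes, which is common after converting from a Dask DataFrame, and the dask-ml preprocessors need concrete chunk sizes. A minimal sketch of two common workarounds, assuming the data is read from parquet into a Dask DataFrame (the path is hypothetical):

```python
import dask.dataframe as dd

# Hypothetical local parquet path -- substitute your own dataset.
df = dd.read_parquet("data/*.parquet")

# Option 1: when converting the DataFrame to a Dask array, ask Dask to
# compute the partition lengths so the chunk sizes are known up front.
X = df.to_dask_array(lengths=True)

# Option 2: if you already have an array with unknown (nan) chunk sizes,
# force Dask to compute them (available in newer Dask releases).
# X = X.compute_chunk_sizes()

print(X.chunks)  # chunk sizes are now concrete integers, not nan
```

Both options trigger a pass over the data to measure partition lengths, so they cost one extra computation but avoid persisting everything in memory.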

@Sandy4321

Has anybody tried to run this locally?
For example, with the data from
https://github.com/rambler-digital-solutions/criteo-1tb-benchmark
How do you read this kind of parquet files from a local disk?
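Not the original author, but as a sketch: Dask can read local parquet files directly with dask.dataframe.read_parquet, assuming pyarrow or fastparquet is installed. The path below is hypothetical; point it at wherever the parquet files were downloaded.

```python
import dask.dataframe as dd

# Read a directory of parquet files from the local filesystem.
# Globs and directory paths both work; the path here is made up.
df = dd.read_parquet("/data/criteo/day_*.parquet")

print(df.columns)       # column names read from the parquet metadata
print(df.npartitions)   # one partition per file / row group
```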
