Gist GenevieveBuckley/f9f8219de5c052c3deb234cc44ebc0a2, last active June 1, 2021 06:49
Dask task graph handling costs on the client
# Example from "Doing Nothing Poorly: Accelerating the Dask Scheduler" workshop
# Dask Summit 2021
# Task graph handling costs on the client
import pickle
import dask
from dask.datasets import timeseries
# Create dask task graph
%time ddf = timeseries().shuffle("id", shuffle="tasks").head(compute=False)
# Wall time: 4.01 s
# Optimize
%time ddf_opt, = dask.optimize(ddf)
# Wall time: 1.3 s
# Serialize
byte_total = 0
for k, v in ddf_opt.__dask_graph__().items():
    # Pickle each key and task separately and accumulate the total size
    byte_total += len(pickle.dumps(k)) + len(pickle.dumps(v))
# Wall time: 731 ms
# Send to the scheduler
dask.utils.format_bytes(byte_total)
# '15.88 MB' (Assume ~159 ms at 100 MB/s)
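
The four stages above can be tallied to see where the client-side time goes. The following is a minimal sketch in plain Python, not part of the original example: `estimate_transfer_seconds` is a hypothetical helper (not a Dask function), the wall times are copied from the comments above, and the 15.88 MB figure is read as decimal megabytes, which is an assumption. The send time is computed from the byte count and the assumed 100 MB/s bandwidth rather than hard-coded.

```python
def estimate_transfer_seconds(n_bytes, bandwidth_bytes_per_s=100e6):
    """Hypothetical helper (not a Dask API): estimated time to move
    n_bytes over a link with the given bandwidth in bytes per second."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return n_bytes / bandwidth_bytes_per_s


# Wall times recorded in the comments of the example, in seconds.
stage_seconds = {
    "create graph": 4.01,
    "optimize": 1.3,
    "serialize": 0.731,
}

# Assumption: '15.88 MB' is read as 15.88e6 bytes (decimal megabytes).
stage_seconds["send to scheduler"] = estimate_transfer_seconds(15.88e6)

total = sum(stage_seconds.values())

# Print each stage with its share of the total client-side cost.
for stage, seconds in stage_seconds.items():
    print(f"{stage:>18}: {seconds:6.3f} s ({seconds / total:5.1%})")
print(f"{'total':>18}: {total:6.3f} s")
```

Passing a different `bandwidth_bytes_per_s` shows how sensitive the send step is to the network, while the other three stages stay fixed because they run entirely on the client.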