@pablosjv
Created September 1, 2021 09:54
Large-Scale PyTorch Inference Pipeline: Spark vs Dask - Tables
| Exp. Name | Instance Type | Instance Count | Instance Memory (GiB) | Instance Cores | Machine Cost ($/h) | Spot Price ($/h, as of today) | Worker Memory | Worker Cores | Worker Count | Batch Size (Rows) | Total Rows | Job Time (min) | On-Demand Price | Spot Price | Price / 1000 Rows | On-Demand Delta vs. Prod | Spot Delta vs. Current Prod |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Prod Spark | c5d.4xlarge | 26 | 32 | 16 | $0.8880 | $0.3233 | 13 | 2 | 64 | 250 | 83957 | 29 | $11.1592 | $4.0632 | $0.1329 | - | - |
| Prod Dask | r5d.4xlarge | 10 | 128 | 16 | $1.3840 | $0.3254 | 16 | 2 | 80 | 150 | 83957 | 29 | $6.6893 | $1.5729 | $0.0797 | 40.00% | 61.00% |
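
The table does not say how its cost columns are derived. The sketch below is one reading that appears to reproduce them: job cost = instance count × hourly price × job time in hours, price per 1000 rows is taken from the on-demand job cost, and the delta is the relative saving against Prod Spark. These formulas, and the `Experiment` class and `saving` helper, are assumptions made for illustration and are not taken from the gist.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    """One table row, reduced to the fields needed for the cost columns."""
    name: str
    instance_count: int
    on_demand_hourly: float  # "Machine Cost (h)" column, $ per instance-hour
    spot_hourly: float       # "Spot Price (As Today)" column, $ per instance-hour
    job_minutes: float
    total_rows: int

    def job_cost(self, hourly_price: float) -> float:
        # Assumed formula: every instance is billed for the full job duration.
        return self.instance_count * hourly_price * self.job_minutes / 60

    def price_per_1000_rows(self, hourly_price: float) -> float:
        return self.job_cost(hourly_price) / self.total_rows * 1000


def saving(baseline_cost: float, candidate_cost: float) -> float:
    """Relative saving of the candidate against the baseline (0.4 means 40%)."""
    return 1 - candidate_cost / baseline_cost


# Values copied from the table above.
spark = Experiment("Prod Spark", 26, 0.8880, 0.3233, 29, 83957)
dask = Experiment("Prod Dask", 10, 1.3840, 0.3254, 29, 83957)

for exp in (spark, dask):
    print(
        f"{exp.name}: on-demand ${exp.job_cost(exp.on_demand_hourly):.4f}, "
        f"spot ${exp.job_cost(exp.spot_hourly):.4f}, "
        f"${exp.price_per_1000_rows(exp.on_demand_hourly):.4f} per 1000 rows"
    )

on_demand_saving = saving(spark.job_cost(spark.on_demand_hourly),
                          dask.job_cost(dask.on_demand_hourly))
spot_saving = saving(spark.job_cost(spark.spot_hourly),
                     dask.job_cost(dask.spot_hourly))
print(f"Dask vs Spark saving: on-demand {on_demand_saving:.2%}, spot {spot_saving:.2%}")
```

Under these assumptions the on-demand job costs match the table. The spot figures differ slightly in the last digits, presumably because the hourly spot prices shown are rounded, and the delta columns in the table look rounded to whole percentages.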