Created September 1, 2021 09:54
Large-Scale PyTorch Inference Pipeline: Spark vs Dask - Tables
Exp. Name | Instance Type | Instance Count | Instance Memory (GiB) | Instance Cores | Machine Cost ($/h) | Spot Price ($/h, as of today) | Worker Memory (GiB) | Worker Cores | Worker Count | Batch Size (Rows) | Total Rows | Job Time (min) | On-Demand Job Cost | Spot Job Cost | On-Demand Price / 1000 Rows | On-Demand Savings vs. Prod Spark | Spot Savings vs. Prod Spark |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Prod Spark | c5d.4xlarge | 26 | 32 | 16 | $0.8880 | $0.3233 | 13 | 2 | 64 | 250 | 83957 | 29 | $11.1592 | $4.0632 | $0.1329 | - | - |
Prod Dask | r5d.4xlarge | 10 | 128 | 16 | $1.3840 | $0.3254 | 16 | 2 | 80 | 150 | 83957 | 29 | $6.6893 | $1.5729 | $0.0797 | 40.00% | 61.00% |
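
The derived cost figures in the table appear to follow from the instance count, the hourly price, and the job time. The sketch below reproduces them under that assumption; the `Experiment` class and the `savings` helper are hypothetical names introduced here for illustration, not part of the original pipeline. The on-demand figures match the table to four decimals, while the spot figures differ slightly in the fourth decimal, presumably because the spot prices shown are rounded.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    """One row of the table (hypothetical helper, for illustration only)."""
    name: str
    instance_count: int
    on_demand_hourly: float  # $/h per instance
    spot_hourly: float       # $/h per instance
    job_minutes: float
    total_rows: int

    def job_cost(self, hourly_price: float) -> float:
        # Assumed formula: instances * hourly price * job duration in hours.
        return self.instance_count * hourly_price * self.job_minutes / 60

    def cost_per_1000_rows(self, hourly_price: float) -> float:
        return self.job_cost(hourly_price) / self.total_rows * 1000


def savings(baseline_cost: float, new_cost: float) -> float:
    """Fraction saved relative to the baseline (0.40 means 40% cheaper)."""
    return 1 - new_cost / baseline_cost


spark = Experiment("Prod Spark", 26, 0.8880, 0.3233, 29, 83957)
dask = Experiment("Prod Dask", 10, 1.3840, 0.3254, 29, 83957)

spark_on_demand = spark.job_cost(spark.on_demand_hourly)
dask_on_demand = dask.job_cost(dask.on_demand_hourly)
spark_spot = spark.job_cost(spark.spot_hourly)
dask_spot = dask.job_cost(dask.spot_hourly)

print(f"On-demand job cost: Spark ${spark_on_demand:.4f}, Dask ${dask_on_demand:.4f}")
print(f"On-demand savings with Dask: {savings(spark_on_demand, dask_on_demand):.0%}")
print(f"Spot savings with Dask: {savings(spark_spot, dask_spot):.0%}")
```

Because both jobs take the same 29 minutes over the same 83957 rows, the savings reduce to the ratio of total hourly cluster price (instance count times price per instance), which is why ten larger `r5d.4xlarge` machines come out cheaper than 26 `c5d.4xlarge` machines.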