from collections import namedtuple

from torch.utils.data import Dataset

Tokens = namedtuple("Tokens", ["input_ids", "attention_mask"])


class TokensDataset(Dataset):
    def __init__(self, iids, amask):
        self.input_ids = iids.to_numpy()
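The snippet above stops after the first assignment in `__init__`. A minimal sketch of how such a dataset might be completed is shown here. It assumes that `amask` is stored the same way as `iids`, and that each item is returned as a `Tokens` tuple; neither detail appears in the original. The `__len__` and `__getitem__` bodies are illustrative, and the `ImportError` fallback is only there so the sketch can run without PyTorch installed.

```python
from collections import namedtuple

try:
    from torch.utils.data import Dataset
except ImportError:
    # Assumption: fall back to a plain base class so the sketch runs without PyTorch.
    Dataset = object

Tokens = namedtuple("Tokens", ["input_ids", "attention_mask"])


class TokensDataset(Dataset):
    def __init__(self, iids, amask):
        # Both arguments are assumed to expose .to_numpy() (for example, pandas Series).
        self.input_ids = iids.to_numpy()
        # Assumed by symmetry with input_ids; not shown in the original snippet.
        self.attention_mask = amask.to_numpy()

    def __len__(self):
        # One item per row of token ids.
        return len(self.input_ids)

    def __getitem__(self, idx):
        # Return the pair for one row as a Tokens tuple (hypothetical choice).
        return Tokens(self.input_ids[idx], self.attention_mask[idx])
```

With pandas columns, this could be used as `TokensDataset(df["input_ids"], df["attention_mask"])`, where the column names are hypothetical.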
#!/bin/sh
spark-submit \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_CLIENT_CONFIG="hdfs:///user/hadoop/config.json" \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=${YOUR_DOCKER_IMAGE} \
  --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
  --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_CLIENT_CONFIG="hdfs:///user/hadoop/config.json" \
  --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=${YOUR_DOCKER_IMAGE} \
  s3://your-bucket/path/to/your/script.py
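The script above sets the same three Docker runtime variables twice, once for the executors and once for the application master. As a rough sketch, the same argument list could be generated from Python so the two sets cannot drift apart. The helper names `docker_conf_args` and `build_command` are hypothetical, and the image name in the usage lines is a placeholder.

```python
import shlex


def docker_conf_args(image, client_config="hdfs:///user/hadoop/config.json"):
    """Build the --conf pairs for both the executors and the application master."""
    settings = {
        "YARN_CONTAINER_RUNTIME_TYPE": "docker",
        "YARN_CONTAINER_RUNTIME_DOCKER_CLIENT_CONFIG": client_config,
        "YARN_CONTAINER_RUNTIME_DOCKER_IMAGE": image,
    }
    args = []
    # The same three variables are applied under both prefixes, as in the script.
    for prefix in ("spark.executorEnv", "spark.yarn.appMasterEnv"):
        for key, value in settings.items():
            args += ["--conf", f"{prefix}.{key}={value}"]
    return args


def build_command(image, script):
    """Return the full spark-submit argument list (hypothetical helper)."""
    return ["spark-submit", *docker_conf_args(image), script]


if __name__ == "__main__":
    # Placeholder image name; substitute your own.
    cmd = build_command("your-registry/your-image:tag",
                        "s3://your-bucket/path/to/your/script.py")
    print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where `spark-submit` is on the `PATH`.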
Exp. Name | Instance Type | Instance Count | Instance Memory (GiB) | Instance Cores | Machine Cost ($/h) | Spot Price ($/h, as of today) | Worker Memory | Worker Cores | Worker Count | Batch Size (Rows) | Total Rows | Job Time (min) | On-Demand Job Cost | Spot Job Cost | On-Demand Cost / 1000 Rows | On-Demand Delta vs. Current Prod | Spot Delta vs. Current Prod
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Prod Spark | c5d.4xlarge | 26 | 32 | 16 | $0.8880 | $0.3233 | 13 | 2 | 64 | 250 | 83957 | 29 | $11.1592 | $4.0632 | $0.1329 | - | -
Prod Dask | r5d.4xlarge | 10 | 128 | 16 | $1.3840 | $0.3254 | 16 | 2 | 80 | 150 | 83957 | 29 | $6.6893 | $1.5729 | $0.0797 | 40.00% | 61.00%
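
The job-level cost figures in the table appear to follow from the instance count, the hourly price, and the job time. The sketch below shows that arithmetic, assuming cost is billed as instance count × hourly price × job time in hours, and that the price per 1000 rows uses the on-demand cost. The function names are illustrative. Under those assumptions the results match the table to within rounding of the listed inputs.

```python
def job_cost(instance_count, hourly_price, job_minutes):
    """Cost of one job: every instance is billed for the full job duration."""
    return instance_count * hourly_price * job_minutes / 60


def cost_per_1000_rows(cost, total_rows):
    """Normalize a job cost to a per-1000-rows figure."""
    return cost / total_rows * 1000


def delta_vs_baseline(baseline_cost, candidate_cost):
    """Fractional saving of the candidate relative to the baseline."""
    return 1 - candidate_cost / baseline_cost


TOTAL_ROWS = 83957
JOB_MINUTES = 29

# Inputs taken from the two table rows.
spark_on_demand = job_cost(26, 0.8880, JOB_MINUTES)
spark_spot = job_cost(26, 0.3233, JOB_MINUTES)
dask_on_demand = job_cost(10, 1.3840, JOB_MINUTES)
dask_spot = job_cost(10, 0.3254, JOB_MINUTES)

if __name__ == "__main__":
    print(f"Spark on-demand: ${spark_on_demand:.4f}, spot: ${spark_spot:.4f}")
    print(f"Dask on-demand:  ${dask_on_demand:.4f}, spot: ${dask_spot:.4f}")
    print(f"Spark per 1000 rows: ${cost_per_1000_rows(spark_on_demand, TOTAL_ROWS):.4f}")
    print(f"Dask per 1000 rows:  ${cost_per_1000_rows(dask_on_demand, TOTAL_ROWS):.4f}")
    print(f"On-demand delta: {delta_vs_baseline(spark_on_demand, dask_on_demand):.2%}")
    print(f"Spot delta:      {delta_vs_baseline(spark_spot, dask_spot):.2%}")
```

Although the Dask instances cost more per hour, the job uses 10 of them instead of 26 for the same 29 minutes, which is where the lower job cost comes from.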