I trained the GPT-2 124M parameter model on three different GPU setups, following karpathy/llm.c#481. Here's how long each run took and how much it cost:
| | 1x A10G | 1x A100 40GB | 8x A100 40GB |
|---|---|---|---|
| Training time | 48h | 15h | 2h |
| Cost | $163 | $27 | $45 |
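For reference, here's a quick back-of-the-envelope check of the hourly rates implied by the table above (derived purely from these numbers; actual pricing varies by cloud provider):

```python
# Implied hourly rates from the runs above (provider pricing will differ).
runs = {
    "1x A10G":      {"hours": 48, "cost": 163, "gpus": 1},
    "1x A100 40GB": {"hours": 15, "cost": 27,  "gpus": 1},
    "8x A100 40GB": {"hours": 2,  "cost": 45,  "gpus": 8},
}

for name, r in runs.items():
    per_hour = r["cost"] / r["hours"]        # total $/hr for the instance
    per_gpu_hour = per_hour / r["gpus"]      # $/hr per GPU
    print(f"{name}: ${per_hour:.2f}/hr total, ${per_gpu_hour:.2f}/GPU-hr")
```

The takeaway: the single A100 was by far the cheapest run, while the 8x A100 node bought a 24x wall-clock speedup over the A10G for roughly a quarter of its cost.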
Most of the completions are fairly nonsensical, but here are some interesting ones: