My approach:

  • Let's use "cost" as a proxy for CO2 emissions, because within an order of magnitude I expect $1 of gas emits about as much CO2 as $1 of power for a server, and I believe the bulk of the ongoing cost of operating a server is its power budget.
  • Lifetime gas cost for a car: let's estimate 100,000 miles, 20 mpg, and $4/gal. That gives us 5,000 gallons and $20,000 in lifetime gas cost. These numbers all feel pessimistic, so we're probably on the high side.
  • A beefy server on AWS or GCP costs something like $1/hr, probably more for the really big ones. So to get to $20,000/run, we need 20,000 instance-hours. Let's ballpark a large training run at one day, so that's ~800 instances for a day. That feels within the realm of the numbers I've seen in papers.

Conclusion: Entirely plausible. Other conclusion: Lifetime cost of gasoline for a car is (in round numbers) the same as the cost of a new car. I hadn't realized they were so close. Other other conclusion: Now I understand why OpenAI needs billions of dollars.
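
For concreteness, here is that arithmetic as a small Python script. It is only a sketch: the $1/hr instance price, the one-day run length, and the car figures are the ballpark assumptions from the bullets above, not measured values.

# Back-of-envelope: lifetime gas cost of a car vs. cost of one large training run.
# All inputs are the rough assumptions stated above, not measured data.
CAR_MILES = 100_000      # assumed lifetime mileage
MPG = 20                 # assumed fuel economy
GAS_PRICE = 4.0          # assumed $/gal

lifetime_gallons = CAR_MILES / MPG                    # 5,000 gal
lifetime_gas_cost = lifetime_gallons * GAS_PRICE      # $20,000

INSTANCE_PRICE = 1.0     # assumed $/hr for a beefy cloud instance
RUN_HOURS = 24           # assumed one-day training run

instance_hours = lifetime_gas_cost / INSTANCE_PRICE   # 20,000 instance-hours
instances = instance_hours / RUN_HOURS                # ~833 instances

print(f"{lifetime_gallons:.0f} gal, ${lifetime_gas_cost:,.0f} lifetime gas")
print(f"{instance_hours:,.0f} instance-hours, ~{instances:.0f} instances for one day")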

│ intrinsics::move_val_init(&mut *dst, src) ▒
0.90 │ lea (%rsi,%rsi,4),%rcx ▒
0.05 │ shl $0x4,%rcx ▒
0.11 │ mov 0xb8(%rsp),%rdx ▒
0.80 │ mov %rdx,(%rax,%rcx,1) ▒
1.66 │ movaps 0xd0(%rsp),%xmm0 ▒
11.08 │ movups %xmm0,0x8(%rax,%rcx,1)
nelhage / Cargo.toml
Last active May 4, 2020 03:19 — forked from alex/Cargo.toml
vectorized contains4 implementation in rust
[package]
name = "f"
version = "0.1.0"
authors = ["Alex Gaynor <[email protected]>"]
edition = "2018"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
packed_simd = "0.3"
[nelhage@monolithique:~/code/alexbench]$ clang-11 -O1 -S -c -emit-llvm overflow_2.c
[nelhage@monolithique:~/code/alexbench]$ opt-11 --opt-bisect-limit=137 -O2 -S overflow_2.ll 2>/dev/null | opt-11 -O2 -S -o overflow_2_opt.ll
[nelhage@monolithique:~/code/alexbench]$ cat overflow_2_opt.ll
; ModuleID = '<stdin>'
source_filename = "overflow_2.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"
; Function Attrs: nofree nounwind uwtable
define dso_local { i8*, i64 } @f1_overflow() local_unnamed_addr #0 {
nelhage / .gitignore
Last active May 26, 2020 07:04
cuviper/probe bug report
/target
b1:
  %x1 = 1
  jmp b3
b2:
  %x2 = 2
  jmp b3
b3:
  %x3 = phi [b1, %x1], [b2, %x2]
# BoringSSL build times, Pixelbook
# `make -j5`, local compiler
real 2m40.180s
user 8m37.189s
sys 1m21.565s
# `ninja -j6`, local
real 2m35.987s
user 7m53.337s
sys 1m14.439s
nelhage / llama.json
Last active January 3, 2021 18:34
Llama CF template
{
  "Parameters": {
    "ObjectStoreBucket": {
      "Type": "String",
      "Description": "A pre-existing S3 bucket to use for llama's object store"
    },
    "ObjectStorePrefix": {
      "Type": "String",
      "Description": "A prefix in $ObjectStoreBucket under which to store objects",
      "Default": "/",
#!/usr/bin/env python
import os
import time
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
INTERVAL = 1
COMM_SIZE = (10,)
package main

var x interface{ f() }

type A struct {
	fp func()
}

func (a A) f() { a.fp() }