My approach:

  • Let's use "cost" as a proxy for CO2 emissions, because within an order of magnitude I expect $1 of gas emits about as much CO2 as $1 of power for a server, and I believe the bulk of the ongoing cost of operating a server is its power budget.
  • Lifetime gas cost for a car: let's estimate 100,000 miles, 20 mpg, and $4/gal. That gives us 5,000 gallons and $20,000 in lifetime gas cost. These numbers all feel pessimistic, so we're probably on the high side.
  • A beefy server on AWS or GCP costs something like $1/hr, probably more for the really big ones. So to get to $20,000/run, we need 20,000 instance-hours. Let's ballpark a large training run at one day, so that's ~800 instances for a day. That feels within the realm of the numbers I've seen in papers.

Conclusion: Entirely plausible. Other conclusion: Lifetime cost of gasoline for a car is (in round numbers) the same as the cost of a new car. I hadn't realized they were so close. Other other conclusion: Now I understand why OpenAI needs billions of dollars.
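
For concreteness, here is that arithmetic as a small Python script. It is only a sketch: the $1/hr instance price, the one-day run length, and the car figures are the ballpark assumptions from the bullets above, not measured values.

# Back-of-envelope: lifetime gas cost of a car vs. cost of one large training run.
# All inputs are the rough assumptions stated above, not measured data.
CAR_MILES = 100_000      # assumed lifetime mileage
MPG = 20                 # assumed fuel economy
GAS_PRICE = 4.0          # assumed $/gal

lifetime_gallons = CAR_MILES / MPG                    # 5,000 gal
lifetime_gas_cost = lifetime_gallons * GAS_PRICE      # $20,000

INSTANCE_PRICE = 1.0     # assumed $/hr for a beefy cloud instance
RUN_HOURS = 24           # assumed one-day training run

instance_hours = lifetime_gas_cost / INSTANCE_PRICE   # 20,000 instance-hours
instances = instance_hours / RUN_HOURS                # ~833 instances

print(f"{lifetime_gallons:.0f} gal, ${lifetime_gas_cost:,.0f} lifetime gas")
print(f"{instance_hours:,.0f} instance-hours, ~{instances:.0f} instances for one day")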

│ intrinsics::move_val_init(&mut *dst, src) ▒
0.90 │ lea (%rsi,%rsi,4),%rcx ▒
0.05 │ shl $0x4,%rcx ▒
0.11 │ mov 0xb8(%rsp),%rdx ▒
0.80 │ mov %rdx,(%rax,%rcx,1) ▒
1.66 │ movaps 0xd0(%rsp),%xmm0 ▒
11.08 │ movups %xmm0,0x8(%rax,%rcx,1)
nelhage / Cargo.toml
Last active May 4, 2020 03:19 — forked from alex/Cargo.toml
vectorized contains4 implementation in rust
[package]
name = "f"
version = "0.1.0"
authors = ["Alex Gaynor <[email protected]>"]
edition = "2018"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
packed_simd = "0.3"
[nelhage@monolithique:~/code/alexbench]$ clang-11 -O1 -S -c -emit-llvm overflow_2.c
[nelhage@monolithique:~/code/alexbench]$ opt-11 --opt-bisect-limit=137 -O2 -S overflow_2.ll 2>/dev/null | opt-11 -O2 -S -o overflow_2_opt.ll
[nelhage@monolithique:~/code/alexbench]$ cat overflow_2_opt.ll
; ModuleID = '<stdin>'
source_filename = "overflow_2.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"
; Function Attrs: nofree nounwind uwtable
define dso_local { i8*, i64 } @f1_overflow() local_unnamed_addr #0 {
nelhage / .gitignore
Last active May 26, 2020 07:04
cuviper/probe bug report
/target
b1:
  %x1 = 1
  jmp b3
b2:
  %x2 = 2
  jmp b3
b3:
  %x3 = phi [b1, %x1], [b2, %x2]
# BoringSSL build times, Pixelbook
# `make -j5`, local compiler
real 2m40.180s
user 8m37.189s
sys 1m21.565s
# `ninja -j6`, local
real 2m35.987s
user 7m53.337s
sys 1m14.439s
nelhage / llama.json
Last active January 3, 2021 18:34
Llama CF template
{
  "Parameters": {
    "ObjectStoreBucket": {
      "Type": "String",
      "Description": "A pre-existing S3 bucket to use for llama's object store"
    },
    "ObjectStorePrefix": {
      "Type": "String",
      "Description": "A prefix in $ObjectStoreBucket under which to store objects",
      "Default": "/",
#!/usr/bin/env python
import os
import time
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
INTERVAL = 1
COMM_SIZE = (10,)
package main

var x interface{ f() }

type A struct {
	fp func()
}

func (a A) f() { a.fp() }