import torch
import torch.nn.functional as F

def to_float8(x, dtype=torch.float8_e4m3fn):
    finfo = torch.finfo(dtype)
    # Calculate the scale as dtype max divided by absmax
    scale = finfo.max / x.abs().max().clamp(min=1e-12)
    # Scale and clamp the tensor to bring it into the representable
    # range of the float8 data type (as the default cast is unsaturated)
    x_scl_sat = (x * scale).clamp(min=finfo.min, max=finfo.max)
    # Return the quantized tensor and the inverse scale for dequantization
    return x_scl_sat.to(dtype), scale.float().reciprocal()
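A quick usage sketch (the variable names here are illustrative, not from the gist): quantize a tensor to float8 and recover an approximation by multiplying with the returned inverse scale.

# Hypothetical round-trip through float8
x = torch.randn(16, 16)
x_f8, inv_scale = to_float8(x)
x_restored = x_f8.to(torch.float32) * inv_scale  # approximate reconstruction
print((x - x_restored).abs().max())  # small quantization error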
@TimDettmers
TimDettmers / find_huffman_ratio.py
Last active September 30, 2021 15:45
Calculate Huffman compression ratio with bitsandbytes
import torch
import bitsandbytes as bnb
from heapq import heappush, heappop, heapify

a = torch.normal(0, 0.5, size=(1024, 1024), device='cuda')

def get_compression(x: torch.Tensor) -> float:
    """Return the compression rate of Huffman coding."""
    assert x.device.type == 'cuda'
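The preview is cut off here. As a hedged sketch of the underlying computation: the compression rate can be estimated as the fixed bits per symbol divided by the average Huffman code length over the quantized values. The helpers below are illustrative, not the gist's actual code; in particular, the bitsandbytes quantization step that produces the uint8 codes is omitted, since it is not shown in the preview.

from collections import Counter
from heapq import heapify, heappush, heappop

def huffman_bits_per_symbol(symbols) -> float:
    """Average Huffman code length, in bits per symbol."""
    freqs = Counter(symbols)
    if len(freqs) == 1:
        return 1.0  # degenerate case: a single symbol still takes one bit
    # Heap entries: [subtree_weight, [symbol, code_length], ...]
    heap = [[w, [s, 0]] for s, w in freqs.items()]
    heapify(heap)
    while len(heap) > 1:
        lo, hi = heappop(heap), heappop(heap)
        # Merging two subtrees makes every code inside them one bit longer
        for pair in lo[1:] + hi[1:]:
            pair[1] += 1
        heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    lengths = dict(tuple(pair) for pair in heap[0][1:])
    total = sum(freqs.values())
    return sum(freqs[s] * lengths[s] for s in freqs) / total

def compression_ratio(quantized_codes: torch.Tensor) -> float:
    # Ratio of the fixed 8-bit encoding to the Huffman encoding
    symbols = quantized_codes.flatten().cpu().tolist()
    return 8.0 / huffman_bits_per_symbol(symbols)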
@yoavg
yoavg / stochastic-critique.md
Last active November 9, 2023 04:32
A criticism of Stochastic Parrots

A criticism of "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?"

Yoav Goldberg, Jan 23, 2021.

The FAccT paper "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?" by Bender, Gebru, McMillan-Major and Shmitchell has been the center of a controversy recently. The final version is now out, and, owing a lot to this controversy, will undoubtedly become very widely read. I read an earlier draft of the paper, and I think that the new and updated final version is much improved in many ways: kudos to the authors for this upgrade. I also agree with and endorse most of the content. This is important stuff, you should read it.

However, I do find some aspects of the paper (and the resulting discourse around it and around technology) to be problematic. These weren't clear to me when initially reading the first draft several months ago, but they became very clear to me now. These points are for the most part

@persiyanov
persiyanov / masked_matmul.py
Created February 8, 2019 12:41
pytorch masked matmul with sparse mask
import torch
import torch.autograd

class MaskedSpMatmul(torch.autograd.Function):
    CHUNK_SIZE = 10000

    @staticmethod
    def forward(ctx, a, b, mask):
        """
@mjdietzx
mjdietzx / waya-dl-setup.sh
Last active October 14, 2024 12:19
Install CUDA Toolkit v8.0 and cuDNN v6.0 on Ubuntu 16.04
#!/bin/bash
# install CUDA Toolkit v8.0
# instructions from https://developer.nvidia.com/cuda-downloads (linux -> x86_64 -> Ubuntu -> 16.04 -> deb (network))
CUDA_REPO_PKG="cuda-repo-ubuntu1604_8.0.61-1_amd64.deb"
wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/${CUDA_REPO_PKG}
sudo dpkg -i ${CUDA_REPO_PKG}
sudo apt-get update
sudo apt-get -y install cuda
@vxgmichel
vxgmichel / udpproxy.py
Created February 2, 2017 10:05
UDP proxy server using asyncio
"""UDP proxy server."""
import asyncio
class ProxyDatagramProtocol(asyncio.DatagramProtocol):
def __init__(self, remote_address):
self.remote_address = remote_address
self.remotes = {}
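A sketch of how such a datagram protocol is typically wired up with asyncio (the gist's own startup code is cut off; the function name and addresses below are illustrative):

import asyncio

async def start_proxy(bind_addr, remote_addr):
    loop = asyncio.get_running_loop()
    # Bind the proxy protocol to a local UDP endpoint;
    # it relays datagrams to remote_addr
    transport, protocol = await loop.create_datagram_endpoint(
        lambda: ProxyDatagramProtocol(remote_addr), local_addr=bind_addr)
    return transport

# e.g. asyncio.run(start_proxy(('0.0.0.0', 8888), ('127.0.0.1', 9999)))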
@alexlee-gk
alexlee-gk / locally_connected2d.py
Last active October 17, 2016 11:25
Locally connected 2D layer using Lasagne and Theano
import numpy as np
import theano
import theano.tensor as T
import lasagne.layers as L

class LocallyConnected2DLayer(L.Conv2DLayer):
    """Similar to Conv2DLayer except that the filter weights are unshared.

    This implementation computes the output tensor by iterating over the filter
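Since the class subclasses Lasagne's Conv2DLayer, it can presumably be dropped into a network the same way. A hedged usage sketch follows; the constructor arguments are assumed to mirror Conv2DLayer's and are not taken from the gist.

from lasagne.layers import InputLayer

# Illustrative network: a locally connected layer in place of a convolution
net = InputLayer(shape=(None, 3, 32, 32))
net = LocallyConnected2DLayer(net, num_filters=16, filter_size=(3, 3),
                              pad='same')  # unshared weights per output location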