
@yoavg
yoavg / LLMs.md
Last active October 30, 2024 08:38

Some remarks on Large Language Models

Yoav Goldberg, January 2023

Audience: I assume you have heard of ChatGPT, maybe played with it a little, and were impressed by it (or tried very hard not to be). And that you have also heard that it is "a large language model". And maybe that it "solved natural language understanding". Here is a short personal perspective on my thoughts about these (and similar) models, and where we stand with respect to language understanding.

Intro

Around 2014-2017, right at the rise of neural-network-based methods for NLP, I was giving a semi-academic, semi-popsci lecture revolving around the story that achieving perfect language modeling is equivalent to being as intelligent as a human. Somewhere around the same time I was also asked on an academic panel "what would you do if you were given infinite compute and no need to worry about labour costs?", to which I cockily responded "I would train a really huge language model, just to show that it doesn't solve everything!". We

@inscapist
inscapist / flake-direnv.md
Last active August 9, 2024 17:24
Nix Flakes and Direnv on Mac OSX (Catalina)

Development environment with Nix Flakes and Direnv

This document is targeted at those who seek to build a reproducible dev environment across machines, operating systems, and time.

It may make it easier for remote teams to work together without each person spending hours setting up asdf/pyenv/rbenv, LSP servers, linters, and runtimes/libs. Nix is probably the closest thing to Docker in terms of development environments.

Flakes are used here because they provide hermetic builds, with absolutely no reliance on the system environment (be it Arch/Catalina/Mojave). They also freeze dependencies in flake.lock, so builds are reproducible.

This gist provides the setup to develop Java/Clojure/Python applications on Nix, but it can easily be adapted to Ruby, Node.js, or Haskell.
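
As a minimal sketch (assuming Nix with flakes enabled and direnv with nix-direnv installed; the package list is illustrative), a flake.nix exposing a dev shell could look like this:

# flake.nix - a minimal dev-shell sketch; pin nixpkgs and list your tools
{
  description = "Reproducible dev shell";

  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";
  inputs.flake-utils.url = "github:numtide/flake-utils";

  outputs = { self, nixpkgs, flake-utils }:
    flake-utils.lib.eachDefaultSystem (system:
      let pkgs = nixpkgs.legacyPackages.${system};
      in {
        devShells.default = pkgs.mkShell {
          # swap these for your stack (ruby, nodejs, ghc, ...)
          packages = [ pkgs.jdk pkgs.clojure pkgs.python3 ];
        };
      });
}

With a one-line .envrc containing "use flake", direnv loads this shell automatically whenever you enter the directory.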

@aw
aw / explain.md
Last active July 29, 2024 12:02
[SOLVED] Proxmox VE and cloud-init snippets etc

The Proxmox VE 6.x release includes a feature to add custom cloud-init configs. Unfortunately the documentation is poor, so I had to figure this out by piecing bits of information together.

The custom cloud-init files (user-data, meta-data, network-config)

The cloud-init files need to be stored as snippets, on a storage that allows the Snippets content type. This is not very well documented:

  1. Go to Storage View -> Storage -> Add -> Directory
  2. Give it an ID such as snippets, and specify any path on your host such as /snippets
  3. Under Content choose Snippets and de-select Disk image (optional)
  4. Upload (scp/rsync/whatever) your user-data, meta-data, network-config files to your proxmox server in /snippets/snippets/ (the directory should be there if you followed steps 1-3)
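
Once uploaded, the files can be attached to a VM with qm set's --cicustom option (the VM ID 100 and the storage ID "snippets" below are examples; paths are relative to the storage root):

qm set 100 --cicustom "user=snippets:snippets/user-data,network=snippets:snippets/network-config,meta=snippets:snippets/meta-data"

You can then verify what the VM will actually receive with: qm cloudinit dump 100 user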
@caadar
caadar / autologin-tty-nixos.org
Last active July 29, 2024 04:26
How to make a user autologin at the console on NixOS

Autologin a user at the console (NixOS)

Problem

NixOS provides the services.mingetty.autologinUser option, but it can't be scoped to a specific TTY.

Minimal working example

TTY1 and user “guest”:
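
One way to express this (a sketch of mine, overriding the getty systemd unit for tty1 only; it assumes a configuration.nix module with pkgs in scope):

# configuration.nix fragment: autologin user "guest" on tty1 only
systemd.services."getty@tty1".serviceConfig.ExecStart = [
  ""  # an empty first entry clears the upstream ExecStart before replacing it
  "@${pkgs.util-linux}/sbin/agetty agetty --autologin guest --noclear %I $TERM"
];

The leading "@" sets argv[0], mirroring the ExecStart line in systemd's own getty@.service.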

@t-vi
t-vi / __init__.pyi
Last active July 13, 2023 09:11
PyTorch Type Hints work in progress (put into python3.x/dist-packages/torch/ directory to try)
from typing import List, Tuple, Optional, Union, Any, ContextManager, Callable, overload
import builtins
import math
import pickle
class dtype: ...
_dtype = dtype
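
The stub goes on to declare typed signatures for Tensor and the rest of the API; a hypothetical fragment in the same style (the real file is generated and far longer) might read:

class Tensor:
    dtype: _dtype
    requires_grad: builtins.bool
    def add(self, other: 'Tensor') -> 'Tensor': ...
    @overload
    def to(self, dtype: _dtype) -> 'Tensor': ...
    @overload
    def to(self, device: str) -> 'Tensor': ...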
@zeyademam
zeyademam / Troubleshoot-dcnn.md
Last active January 22, 2024 05:54
Troubleshooting Convolutional Neural Nets

Troubleshooting Convolutional Neural Networks

Intro

This is a list of hacks, gathered primarily from prior experience as well as online sources (most notably Stanford's CS231n course notes), on how to troubleshoot the performance of a convolutional neural network. We will focus mainly on supervised learning using deep neural networks. While this guide assumes the user is coding in Python 3.6 using TensorFlow (TF), it can still be helpful as a language-agnostic guide.

Suppose we are given a convolutional neural network to train and evaluate, and assume the evaluation results are worse than expected. The following are steps to troubleshoot and potentially improve performance. The first section covers must-dos and generally good practices before you start troubleshooting. Every subsequent section header corresponds to a problem, and that section is devoted to solving it. The sections are ordered to reflect "more common" issues first and under each header the "most-eas
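
One concrete "must-do" of this kind (my sketch, not the gist's, and in PyTorch rather than TF since the check is framework-agnostic) is verifying that the network can overfit a single small batch before troubleshooting anything else:

import torch
import torch.nn as nn

# sanity check: a healthy model/loss/optimizer setup should drive the
# loss to ~0 on one memorized batch; if it cannot, the bug is in the
# setup, not the data. Shapes are illustrative (batch of 8 RGB 32x32).
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
)
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for step in range(200):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
print(f"final loss: {loss.item():.4f}")  # should be near zero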

@okiriza
okiriza / example_autoencoder.py
Last active December 1, 2020 06:58
Example convolutional autoencoder implementation using PyTorch
import random
import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
from torchvision import datasets, transforms
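
Beyond these imports, a minimal convolutional autoencoder in the same spirit (my sketch, not the gist's exact architecture; sizes assume 1x28x28 inputs, and modern PyTorch no longer needs Variable) could be:

class AutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        # encoder: 1x28x28 -> 8x7x7
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 8, 3, stride=2, padding=1), nn.ReLU(),
        )
        # decoder mirrors the encoder with transposed convolutions
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(8, 16, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

Training is the usual loop: reconstruct x, score it with nn.MSELoss() against x itself, and step an optimizer.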
@MInner
MInner / gpu_profile.py
Created September 12, 2017 16:11
A script to generate a per-line GPU memory usage trace. For more meaningful results, set `CUDA_LAUNCH_BLOCKING=1`.
import datetime
import linecache
import os
import pynvml3
import torch
print_tensor_sizes = True
last_tensor_sizes = set()
gpu_profile_fn = f'{datetime.datetime.now():%d-%b-%y-%H:%M:%S}-gpu_mem_prof.txt'
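
The core mechanism is a line-level trace function that snapshots GPU memory after each executed line. A stripped-down sketch of the idea (mine, using torch's built-in allocator counters rather than NVML) looks like this:

import sys
import torch

def gpu_trace(frame, event, arg):
    # called by the interpreter for each traced event; on every executed
    # line, print the source location and currently allocated GPU memory
    if event == "line":
        mem_mb = torch.cuda.memory_allocated() / 1024 ** 2
        print(f"{frame.f_code.co_filename}:{frame.f_lineno}  {mem_mb:.1f} MiB")
    return gpu_trace

sys.settrace(gpu_trace)  # trace everything from here on

Setting CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so each memory reading is attributable to the line that just ran.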
@VictorTaelin
VictorTaelin / promise_monad.md
Last active October 24, 2024 01:25
async/await is just the do-notation of the Promise monad

async/await is just the do-notation of the Promise monad

CertSimple just wrote a blog post arguing that ES2017's async/await was the best thing to happen to JavaScript. I wholeheartedly agree.

In short, one of the (few?) good things about JavaScript used to be how well it handled asynchronous requests. This was mostly thanks to its Scheme-inherited implementation of functions and closures. That, though, was also one of its worst faults, because it led to "callback hell", a seemingly unavoidable pattern that made highly asynchronous JS code almost unreadable. Many solutions attempted to fix that, but most failed. Promises almost did it, but failed too. Finally, async/await is here and, combined with Promises, it solves the problem for good. In this post, I'll explain why that is the case and trace a link between Promises, async/await, do-notation, and monads.

First, let's illustrate the 3 styles by implementing
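
As a stand-in illustration (a sketch of mine using a toy delayed-doubling function, not the post's original example), the three styles look like this:

// 1. Callback style: each dependent step nests one level deeper.
function doubleCb(x, cb) {
  setTimeout(() => cb(x * 2), 100);
}
doubleCb(2, (a) => doubleCb(a, (b) => console.log("callbacks:", b)));

// 2. Promise style: .then chains flatten the nesting.
const doubleP = (x) =>
  new Promise((resolve) => setTimeout(() => resolve(x * 2), 100));
doubleP(2).then(doubleP).then((b) => console.log("promises:", b));

// 3. async/await: the same chain reads like synchronous code,
// which is exactly do-notation over the Promise monad.
async function main() {
  const a = await doubleP(2);
  const b = await doubleP(a);
  console.log("async/await:", b);
}
main();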

A Tour of PyTorch Internals (Part I)

The fundamental unit in PyTorch is the Tensor. This post will serve as an overview of how Tensors are implemented in PyTorch, such that the user can interact with them from the Python shell. In particular, we want to answer four main questions:

  1. How does PyTorch extend the Python interpreter to define a Tensor type that can be manipulated from Python code?
  2. How does PyTorch wrap the C libraries that actually define the Tensor's properties and methods?
  3. How does PyTorch cwrap work to generate code for Tensor methods?
  4. How does PyTorch's build system take all of these components to compile and generate a workable application?

Extending the Python Interpreter

PyTorch defines a new package torch. In this post we will consider the ._C module. This module is known as an "extension module" - a Python module written in C. Such modules allow us to define new built-in object types (e.g. the Tensor) and to call C/C++ functions.
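
To make that concrete, here is a minimal sketch of such an extension module (a toy stand-in of mine, not PyTorch's actual _C sources) that defines one new built-in type:

/* toy module "_C": a minimal CPython extension defining a new type.
   Build (illustrative): gcc -shared -fPIC $(python3-config --includes) toy.c -o _C.so */
#include <Python.h>

typedef struct {
    PyObject_HEAD
    double value;  /* toy payload standing in for real tensor storage */
} ToyTensor;

static PyTypeObject ToyTensorType = {
    PyVarObject_HEAD_INIT(NULL, 0)
    .tp_name = "_C.ToyTensor",
    .tp_basicsize = sizeof(ToyTensor),
    .tp_flags = Py_TPFLAGS_DEFAULT,
    .tp_doc = "A toy Tensor-like built-in type",
    .tp_new = PyType_GenericNew,
};

static struct PyModuleDef cmodule = {
    PyModuleDef_HEAD_INIT,
    "_C",                      /* importable as: import _C */
    "Toy extension module",
    -1,
    NULL,
};

PyMODINIT_FUNC PyInit__C(void)
{
    if (PyType_Ready(&ToyTensorType) < 0)
        return NULL;
    PyObject *m = PyModule_Create(&cmodule);
    if (m == NULL)
        return NULL;
    Py_INCREF(&ToyTensorType);
    PyModule_AddObject(m, "ToyTensor", (PyObject *) &ToyTensorType);
    return m;
}

After building, Python code can run "import _C" and instantiate _C.ToyTensor() exactly as it would any built-in type; torch._C plays the same role at vastly larger scale.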