Xiaofan Hu bom-d-van

How large are large language models? (2025)

This aims to be factual information about the size of large language models. None of this document was written by AI. I do not include any information from leaks or rumors. The focus of this document is on base models (the raw text continuation engines, not 'helpful chatbot/assistants'). This is a view from a few years ago to today of one very tiny fraction of the larger LLM story that's happening.

History

GPT-2,-medium,-large,-xl (2019): 137M, 380M, 812M, 1.61B. Source: openai-community/gpt2. Trained on the unreleased WebText dataset, said to be 40GB of Internet text. I estimate that to be roughly 10B tokens. You can see a list of the websites that went into that data set here: domains.txt.
GPT-3 aka davinci, davinci-002 (2020): 175B parameters. There is a good breakdown of how those parameters are 'spent' here [How d
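
The two size figures above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is not from the original text. It assumes roughly 4 bytes of English text per token (a rule of thumb, not a published WebText figure) and uses the common 12 x n_layers x d_model^2 approximation for the non-embedding parameters of a decoder-only transformer, with the GPT-3 hyperparameters reported in its paper (96 layers, d_model = 12288). The function names are illustrative.

```python
def estimate_tokens(num_bytes: float, bytes_per_token: float = 4.0) -> float:
    """Rough token count for a text corpus.

    bytes_per_token = 4.0 is an assumed rule of thumb for English text.
    """
    return num_bytes / bytes_per_token


def approx_transformer_params(n_layers: int, d_model: int) -> int:
    """Approximate non-embedding parameters of a decoder-only transformer.

    Per layer: 4*d^2 for attention (Q, K, V, output projections) plus
    8*d^2 for an MLP with a 4x hidden size. Embeddings, biases and
    layer norms are ignored.
    """
    return 12 * n_layers * d_model ** 2


webtext_tokens = estimate_tokens(40e9)              # 40GB of text
gpt3_params = approx_transformer_params(96, 12288)  # GPT-3 175B config

print(f"WebText: ~{webtext_tokens / 1e9:.0f}B tokens")
print(f"GPT-3: ~{gpt3_params / 1e9:.0f}B non-embedding parameters")
```

Under these assumptions the corpus comes out at about 10B tokens, and the GPT-3 estimate lands near 174B, close to the quoted 175B once embeddings are counted.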

Network Device Management with container runtimes

This document provides a hands-on guide to understanding how runtimes interact with network devices and namespaces, focusing on the new "Network Devices" feature described in the OCI (Open Container Initiative) runtime specification. The feature is expected to be released in version 1.3.0 of the OCI specification.
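
As a rough illustration of what the feature looks like in a runtime config, the sketch below builds a config.json fragment in Python. It assumes the field lives under linux.netDevices, is keyed by the host interface name, and carries an optional name for the interface inside the container; check those names against the released specification. The interface names dummy0 and ctr_dummy0 and the helper add_net_device are hypothetical.

```python
import json


def add_net_device(config: dict, host_ifname: str, container_ifname: str) -> dict:
    """Request that a host network interface be moved into the container.

    Assumed layout: linux.netDevices is a map from the host interface
    name to an object whose "name" is the interface name inside the
    container's network namespace.
    """
    linux = config.setdefault("linux", {})
    net_devices = linux.setdefault("netDevices", {})
    net_devices[host_ifname] = {"name": container_ifname}
    return config


# "1.3.0" is the version the feature is expected to ship in (assumption).
config = {"ociVersion": "1.3.0"}
add_net_device(config, "dummy0", "ctr_dummy0")
print(json.dumps(config, indent=2))
```

The point of the design is that the runtime, not the orchestrator, moves the named device into the container's network namespace when the container is created.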

In high-level container orchestration systems like Kubernetes, the management of network namespaces and interfaces is handled by the [Container Runtime Interface

How to create a patch for a repository with submodules

Check out whatever repository you want.

Then you make all the changes that are required to make it all work in any submodule as needed. You don't commit in the submodules.

Make sure you have committed everything in your own repository, and that git diff shows changes only in your submodules while your "parent" repository is still clean.

git --no-pager diff --no-color --submodule=diff > patch.diff
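
If the patch needs to be produced from a script, the same command can be wrapped in Python. This is a minimal sketch, assuming git is on the PATH and that it is run against the parent repository; the helper name write_submodule_patch is illustrative, and the flags are exactly those of the command above.

```python
import subprocess
from pathlib import Path

# Same flags as the shell command above: no pager, no color codes,
# and submodule changes rendered as full inline diffs.
DIFF_COMMAND = ["git", "--no-pager", "diff", "--no-color", "--submodule=diff"]


def write_submodule_patch(repo_dir: str, patch_name: str = "patch.diff") -> Path:
    """Run the diff in repo_dir and save its output to repo_dir/patch_name."""
    result = subprocess.run(
        DIFF_COMMAND,
        cwd=repo_dir,
        check=True,           # raise if git exits with an error
        capture_output=True,  # collect stdout instead of printing it
    )
    patch_path = Path(repo_dir) / patch_name
    patch_path.write_bytes(result.stdout)
    return patch_path
```

Calling write_submodule_patch(".") from the parent repository writes patch.diff next to it, just like the shell redirect.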

Anti-hype LLM reading list

Goals: Add links that are reasonable and good explanations of how stuff works. No hype and no vendor content if possible. Practical first-hand accounts of models in prod eagerly sought.

Foundational Concepts

Pre-Transformer Models

"Don Knuth Plays with ChatGPT" but with ChatGPT-4

1. Tell me what Donald Knuth says to Stephen Wolfram about chatGPT.

I'm sorry for any confusion, but as an AI, I can't provide real-time updates or specific conversations between individuals beyond my last training cut-off in September 2021. I don't have access to the internet or any databases for real-time information or private communications. Therefore, I can't give you the details of a conversation between Donald Knuth and Stephen Wolfram about ChatGPT, if such a conversation has taken place after my training data.

However, it's worth noting that both Donald Knuth and Stephen Wolfram are significant figures in the fields of computer science and computational mathematics. Knuth is widely recognized for his multi-volume work "The Art of Computer Programming" and for creating the TeX computer typesetting system, while Wolfram is known for his work in developing Mathematica and Wolfram|Alpha, and for his contributions to cellular automata theory, outl

Training open-source LLMs on ChatGPT output is a really bad idea.

Everyone is now racing to create open-source alternatives to compete with GPT3.5/GPT4. A common shortcut used by some teams to bootstrap their effort is to fine-tune their model on ChatGPT output. I used to think it was a good idea and totally fair play to do this. Actually, I still think it’s fair play. OpenAI effectively distilled the entire web into its models. They say themselves that they use (mostly) publicly accessible information. So distilling their model is, in effect, distilling the public open web, and, small Terms of Service details aside, I don’t see major ethical problems with that. Right? Well, that’s not entirely true, and I now realize that, even ignoring the ethical considerations, using their output is a really bad idea.

First of all, from a purely technical point of view, as @yoavgo explains beautifully in his recent post, there is no way to align LLMs correctly without the RLHF component. I encourag

Purpose

Bootstrap knowledge of LLMs ASAP, with a bias toward GPT.

Avoid being a link dump. Try to provide only valuable, well-tuned information.

Prelude

Neural network links before starting with transformers.

	# DuckDB Vector Tiles with mapgl Example
	# This script demonstrates how to serve vector tiles from DuckDB using ST_AsMVT()
	# and display them in a mapgl map using httpuv

	library(mapgl)
	library(duckdb) # Requires the latest DuckDB version (>= 1.4.0)
	library(httpuv)
	library(sf)
	library(duckspatial)
	library(tigris)

	// defer.h
	// © 2025 Crash Override, Inc.
	// Licensed under the BSD 3-Clause license

	#pragma once
	#include <stdint.h>

	typedef struct n00b_defer_ll_t n00b_defer_ll_t;

	#!/usr/bin/env -S bash -c "docker run -p 8080:8080 -it --rm \$(docker build --progress plain -f \$0 . 2>&1 \| tee /dev/stderr \| grep -oP 'sha256:[0-9a-f]*')"

	# syntax = docker/dockerfile:1.4.0

	FROM node:20

	WORKDIR /root

	RUN npm install sqlite3