Martin Gubri Framartin

Anti-hype LLM reading list

Goals: Add links that are reasonable and good explanations of how stuff works. No hype and no vendor content if possible. Practical first-hand accounts of models in prod eagerly sought.

Foundational Concepts

Pre-Transformer Models

Preparation of ImageNet (ILSVRC2012)

The dataset can be downloaded from the official website if you are affiliated with a research organization. It is also available on Academic Torrents.

This script extracts all the images and groups them so that each folder contains the images that belong to one class.

Download ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar.
Download the script: wget https://gist.githubusercontent.com/antoinebrl/7d00d5cb6c95ef194c737392ef7e476a/raw/dc53ad5fcb69dcde2b3e0b9d6f8f99d000ead696/prepare.sh
Run it: ./prepare.sh
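
The grouping step for the training set can be sketched in Python. This is a minimal sketch, not the linked prepare.sh: it assumes the training archive is a tar whose members are themselves tars, one per class, each named after its class ID. The function name `extract_train_archive` and its parameters are hypothetical.

```python
import tarfile
from pathlib import Path


def extract_train_archive(archive_path, out_dir):
    """Unpack a tar of per-class tars into one folder per class.

    Assumption: every member of the outer archive is a tar named
    <class_id>.tar that holds the images of that class.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    class_dirs = []
    with tarfile.open(archive_path) as outer:
        for member in outer.getmembers():
            # Skip anything that is not a per-class tar.
            if not member.isfile() or not member.name.endswith(".tar"):
                continue
            class_dir = out_dir / Path(member.name).stem
            class_dir.mkdir(exist_ok=True)
            # Read the inner tar straight from the outer one,
            # without writing the inner tar to disk first.
            with tarfile.open(fileobj=outer.extractfile(member)) as inner:
                inner.extractall(class_dir)
            class_dirs.append(class_dir)
    return sorted(class_dirs)
```

Calling `extract_train_archive("ILSVRC2012_img_train.tar", "train")` would then leave one sub-folder per class under train/. The validation archive is not covered here, because sorting its images into class folders needs a separate file-to-class mapping.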

These links no longer work. Springer have pulled the free plug.

Graduate texts in mathematics

duplicates = multiple editions

A Classical Introduction to Modern Number Theory, Kenneth Ireland and Michael Rosen

	#!/bin/bash
	#
	# script to extract ImageNet dataset
	# ILSVRC2012_img_train.tar (about 138 GB)
	# ILSVRC2012_img_val.tar (about 6.3 GB)
	# make sure ILSVRC2012_img_train.tar & ILSVRC2012_img_val.tar are in your current directory
	#
	# https://github.com/facebook/fb.resnet.torch/blob/master/INSTALL.md
	#
	# train/

	REGIONS = {
	'Auvergne-Rhône-Alpes': ['01', '03', '07', '15', '26', '38', '42', '43', '63', '69', '73', '74'],
	'Bourgogne-Franche-Comté': ['21', '25', '39', '58', '70', '71', '89', '90'],
	'Bretagne': ['35', '22', '56', '29'],
	'Centre-Val de Loire': ['18', '28', '36', '37', '41', '45'],
	'Corse': ['2A', '2B'],
	'Grand Est': ['08', '10', '51', '52', '54', '55', '57', '67', '68', '88'],
	'Guadeloupe': ['971'],
	'Guyane': ['973'],
	'Hauts-de-France': ['02', '59', '60', '62', '80'],

	### DATA SECTION

	library(data.table)

	## Read data with 'data.table::fread'
	input <- fread("wu03ew_v1.csv", select = 1:3)
	setnames(input, 1:3, new = c("origin", "destination","total"))
	## Coordinates
	centroids <- fread("msoa_popweightedcentroids.csv")
	## 'Code' is the key to be used in the joins

	import theano
	import theano.tensor as T
	import numpy as np
	import cPickle
	import random
	import matplotlib.pyplot as plt

	class RNN(object):

	    def __init__(self, nin, n_hidden, nout):

	var request = require('request');
	var unzip = require('unzip');
	var csv2 = require('csv2');

	request.get('http://s3.amazonaws.com/alexa-static/top-1m.csv.zip')
	  .pipe(unzip.Parse())
	  .on('entry', function (entry) {
	    entry.pipe(csv2()).on('data', console.log);
	  });

	from PyInstaller.utils.hooks import collect_submodules, collect_data_files

	# This hooks the scrapy project 'cot' to import all submodules, change name to match scrapy project
	hiddenimports = (collect_submodules('cot'))

	CB_color_cycle = ['#377eb8', '#ff7f00', '#4daf4a',
	'#f781bf', '#a65628', '#984ea3',
	'#999999', '#e41a1c', '#dede00']