Garrett Mooney GarrettMooney

Anti-hype LLM reading list

Goals: Add links that are reasonable and good explanations of how stuff works. No hype and no vendor content if possible. Practical first-hand accounts of models in prod eagerly sought.

Foundational Concepts

Pre-Transformer Models

See rune2e.sh for info on how to run the experiment.

Generating Synthetic Data for LLM Evaluation

Summary

Use your application extensively to build intuition about failure modes
Define 3-4 dimensions based on observed or anticipated failures
Create structured tuples covering your priority failure scenarios
Generate natural language queries from each tuple using a separate LLM call
Scale to more examples across your most important failure hypotheses (we suggest at least ~100)
Test and iterate on the most critical failure modes first, and generate more until you reach theoretical saturation
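The tuple-then-query steps above can be sketched in Python. This is a minimal illustration that assumes a hypothetical recipe-assistant app: the dimension names and values in `DIMENSIONS`, and the helpers `build_tuples` and `tuple_to_prompt`, are invented for the example. The separate LLM call is left as a placeholder rather than tied to any particular client library.

```python
import itertools
import random

# Hypothetical dimensions for a recipe-assistant app; replace them with the
# dimensions your own failure analysis surfaces.
DIMENSIONS = {
    "persona": ["novice cook", "professional chef", "parent with picky kids"],
    "dietary_constraint": ["vegan", "gluten-free", "none"],
    "query_complexity": ["single question", "multi-step request", "ambiguous request"],
}


def build_tuples(dimensions: dict[str, list[str]], n: int, seed: int = 0) -> list[dict[str, str]]:
    """Sample up to n distinct combinations (tuples) of dimension values."""
    keys = list(dimensions)
    combos = [dict(zip(keys, values)) for values in itertools.product(*dimensions.values())]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(combos, k=min(n, len(combos)))


def tuple_to_prompt(t: dict[str, str]) -> str:
    """Build the prompt for the separate LLM call that turns one tuple into a user query."""
    attrs = "\n".join(f"- {k}: {v}" for k, v in t.items())
    return (
        "Write one realistic message a user might send to a recipe assistant.\n"
        f"The user has these attributes:\n{attrs}\n"
        "Return only the message text."
    )


tuples = build_tuples(DIMENSIONS, n=5)
prompts = [tuple_to_prompt(t) for t in tuples]
# Each prompt would then be sent to the LLM client of your choice, e.g.
# queries = [call_llm(p) for p in prompts]  # call_llm is a placeholder, not a real API
print(prompts[0])
```

Random sampling is only a starting point: in practice you would handpick or weight the combinations that match your priority failure hypotheses, and add dimension values until the set reaches the roughly 100 examples suggested in the summary.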

	# Colab/Jupyter cell: `!` runs a shell command and returns its output as a list of lines (IPython only)
	gpu_info = !nvidia-smi
	gpu_info = '\n'.join(gpu_info)
	if 'failed' in gpu_info:
		print('Not connected to a GPU')
	else:
		print(gpu_info)
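The `!nvidia-smi` syntax only works inside IPython-based notebooks. Outside a notebook, a roughly equivalent check can be sketched with the standard library. It assumes `nvidia-smi` is on `PATH` whenever an NVIDIA GPU and driver are present; the function name `gpu_info` is our own.

```python
import shutil
import subprocess


def gpu_info() -> str:
    """Return nvidia-smi output, or a message if no GPU is reachable."""
    exe = shutil.which("nvidia-smi")  # None if the binary is not installed
    if exe is None:
        return "Not connected to a GPU"
    result = subprocess.run([exe], capture_output=True, text=True)
    output = result.stdout + result.stderr
    # Mirror the notebook check: nvidia-smi reports that it "has failed"
    # when it cannot communicate with the driver.
    if result.returncode != 0 or "failed" in output:
        return "Not connected to a GPU"
    return output


if __name__ == "__main__":
    print(gpu_info())
```

Checking the return code as well as the text makes the script slightly stricter than the notebook version, which only looks for the word "failed".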

	# make sure you have `tac` [1] (if on macOS) and `atuin` [2] installed, then add the following to your ~/.zshrc
	#
	# [1]: https://unix.stackexchange.com/questions/114041/how-can-i-get-the-tac-command-on-os-x
	# [2]: https://github.com/ellie/atuin

	atuin-setup() {
		! hash atuin && return
		bindkey '^E' _atuin_search_widget

		export ATUIN_NOBIND="true"