Goals: Add links that give reasonable, good explanations of how stuff works. No hype, and no vendor content if possible. Practical first-hand accounts of running models in prod are eagerly sought.
```python
from transformers import (
    AutoConfig,
    AutoTokenizer,
    BitsAndBytesConfig,
    GenerationConfig,
    AutoModelForCausalLM,
    LlamaTokenizerFast,
    PreTrainedModel,
    TextIteratorStreamer,
    StoppingCriteria,
)
```
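A minimal sketch of how these pieces typically fit together: load a causal LM in 4-bit via `BitsAndBytesConfig` and stream tokens with `TextIteratorStreamer`. The model id, prompt, and generation settings are my own placeholder assumptions, not taken from the original snippet.

```python
# Hedged sketch: load a causal LM in 4-bit and stream generated tokens.
# The model id, prompt, and max_new_tokens are placeholder assumptions.
from threading import Thread

from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TextIteratorStreamer,
)

model_id = "meta-llama/Llama-2-13b-chat-hf"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

prompt = "Explain attention in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# TextIteratorStreamer yields decoded text chunks as generate() produces them,
# so generation runs in a background thread while we print the stream.
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)
thread = Thread(target=model.generate, kwargs=dict(**inputs, streamer=streamer, max_new_tokens=200))
thread.start()
for chunk in streamer:
    print(chunk, end="", flush=True)
thread.join()
```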
https://github.com/jondurbin/airoboros
```bash
pip install --upgrade airoboros==2.0.13
```

```bash
# Clone llama.cpp
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp

# Build it
make clean
LLAMA_METAL=1 make

# Download model
export MODEL=llama-2-13b-chat.ggmlv3.q4_0.bin
```
This worked on 14/May/23. The instructions will probably require updating in the future.
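Once built, trying the model out looks roughly like this. The prompt and flag values are my own illustrative choices, not from the original notes; check `./main --help` in your build, since the CLI changes over time.

```bash
# Hedged sketch: run the downloaded model with the Metal build.
# -n caps the number of generated tokens; the prompt is just an example.
./main -m "$MODEL" -n 256 -p "Explain what a transformer layer does, briefly."
```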
LLaMA is a text-prediction model, similar to GPT-2 or to the base version of GPT-3 before any fine-tuning. It should also be possible to run fine-tuned versions (like Alpaca or Vicuna) with this; those versions are more focused on answering questions.
Note: I have been told that this does not support multiple GPUs. It can only use a single GPU.
It is now possible to run LLaMA 13B with a 6 GB graphics card (e.g. an RTX 2060), thanks to the amazing work on llama.cpp. The latest change adds CUDA/cuBLAS support, which lets you pick an arbitrary number of transformer layers to run on the GPU. This is perfect for low VRAM.
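For example, with a cuBLAS build you can offload only part of the model using the `-ngl` / `--n-gpu-layers` flag. A minimal sketch, assuming the same quantized 13B file as above; the layer count is my own guess for a 6 GB card and should be tuned per GPU:

```bash
# Hedged sketch: build with cuBLAS, then offload some transformer layers to the GPU.
# 20 layers is an illustrative guess for ~6 GB of VRAM.
make clean
LLAMA_CUBLAS=1 make
./main -m "$MODEL" -ngl 20 -n 128 -p "Hello"
```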
08737ef720f0510c7ec2aa84d7f70c691073c35d.

```python
import requests
import time
import os
import sys
import openai
import tiktoken
from termcolor import colored

openai.api_key = open(os.path.expanduser('~/.openai')).read().strip()
```
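A minimal sketch of how these imports are commonly used together: count prompt tokens with tiktoken, then call the (old-style, pre-1.0) openai chat API. The model name and message are illustrative assumptions.

```python
# Hedged sketch: count tokens with tiktoken, then call the (pre-1.0) openai chat API.
# The model name and message are illustrative assumptions.
import os

import openai
import tiktoken
from termcolor import colored

openai.api_key = open(os.path.expanduser('~/.openai')).read().strip()

model = "gpt-3.5-turbo"
prompt = "Summarise the transformer architecture in two sentences."

enc = tiktoken.encoding_for_model(model)
print(colored(f"prompt tokens: {len(enc.encode(prompt))}", "yellow"))

resp = openai.ChatCompletion.create(
    model=model,
    messages=[{"role": "user", "content": prompt}],
)
print(resp["choices"][0]["message"]["content"])
```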
```python
from io import StringIO
import sys
from typing import Dict, Optional

from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents.tools import Tool
from langchain.llms import OpenAI
```
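These imports match the classic early-2023 LangChain agent pattern. A hedged sketch of the usual wiring; the tool list and question are my own placeholders:

```python
# Hedged sketch of the early-2023 LangChain agent pattern.
# The tool list and question are placeholder assumptions.
from langchain.agents import initialize_agent, load_tools
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)
tools = load_tools(["llm-math"], llm=llm)  # calculator tool backed by the LLM

agent = initialize_agent(
    tools,
    llm,
    agent="zero-shot-react-description",  # ReAct-style tool use
    verbose=True,
)
agent.run("What is 13 raised to the 0.5 power?")
```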
| """ | |
| stable diffusion dreaming | |
| creates hypnotic moving videos by smoothly walking randomly through the sample space | |
| example way to run this script: | |
| $ python stablediffusionwalk.py --prompt "blueberry spaghetti" --name blueberry | |
| to stitch together the images, e.g.: | |
| $ ffmpeg -r 10 -f image2 -s 512x512 -i blueberry/frame%06d.jpg -vcodec libx264 -crf 10 -pix_fmt yuv420p blueberry.mp4 |
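The core idea behind that script is to fix a prompt, interpolate between random latents, and decode each intermediate latent into a frame. A rough sketch with diffusers; the model id, frame count, and the plain linear interpolation are my own simplifications for illustration, not the original script:

```python
# Hedged sketch: walk between two random latents and decode each step to a frame.
# Model id, frame count, and linear interpolation are illustrative simplifications.
import os

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "blueberry spaghetti"
shape = (1, pipe.unet.config.in_channels, 64, 64)  # latent shape for 512x512 output
z0 = torch.randn(shape, device="cuda", dtype=torch.float16)
z1 = torch.randn(shape, device="cuda", dtype=torch.float16)

os.makedirs("blueberry", exist_ok=True)
num_frames = 30
for i in range(num_frames):
    t = i / (num_frames - 1)
    latents = (1 - t) * z0 + t * z1  # crude linear walk between the two latents
    image = pipe(prompt, latents=latents).images[0]
    image.save(f"blueberry/frame{i:06d}.jpg")
```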
```python
# This code doesn't work, and isn't intended to.
# The goal of this code is to explain how attention mechanisms work, in code.
# It is deliberately not vectorized to make it clearer.
def attention(self, X_in: List[Tensor]):
    # For every token, transform the previous layer's output
    # into a query, key, and value vector.
    for i in range(self.sequence_length):
        query[i] = self.Q * X_in[i]
        key[i] = self.K * X_in[i]
        value[i] = self.V * X_in[i]
```
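For a version that actually runs, here is a small self-contained sketch of single-head scaled dot-product attention in NumPy. The dimensions and random inputs are arbitrary illustrative choices; this is the standard formulation, not a completion of the snippet above.

```python
# Hedged sketch: runnable single-head scaled dot-product attention in NumPy.
# Dimensions and inputs are arbitrary illustrative choices.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(X, W_q, W_k, W_v):
    # X: (seq_len, d_model); W_q/W_k/W_v: (d_model, d_head)
    Q = X @ W_q                                # one query per token
    K = X @ W_k                                # one key per token
    V = X @ W_v                                # one value per token
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # (seq_len, seq_len) similarities
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ V                         # weighted mix of values per token

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 16, 8
X = rng.standard_normal((seq_len, d_model))
W_q, W_k, W_v = (rng.standard_normal((d_model, d_head)) for _ in range(3))
print(attention(X, W_q, W_k, W_v).shape)  # (5, 8)
```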