Fred Bliss (fblissjr)

@cjanietz
cjanietz / userscript.js
Last active June 8, 2024 03:10
Perplexity Model Selection User Script
// ==UserScript==
// @name Perplexity Model Selection
// @version 0.1
// @description Adds a custom button to Perplexity AI using jQuery
// @author dpgc
// @match https://www.perplexity.ai/*
// @updateURL https://gist.github.com/cjanietz/703a88924e50e1a30cb6ffc52bc52bd9/raw/userscript.js
// @downloadURL https://gist.github.com/cjanietz/703a88924e50e1a30cb6ffc52bc52bd9/raw/userscript.js
// @grant none
// @run-at document-end
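The preview cuts off before the metadata block closes. A minimal sketch of how the body might continue, assuming jQuery is available on the page as the description implies (the selector and button behavior are placeholders, not the gist's actual code):

// ==/UserScript==

(function () {
    'use strict';
    // Placeholder: inject a model-selection button once the page has loaded.
    // The nav selector and the click handler are illustrative only.
    const button = $('<button>Select model</button>');
    button.on('click', function () { console.log('model selector clicked'); });
    $('nav').first().append(button);
})();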
@rcarmo
rcarmo / automator.py
Last active March 28, 2025 02:39
macOS Services in the style of NotesOllama
# Drop this into Automator using Python 3 as the shell
from sys import stdin
from json import dumps, loads
from urllib.parse import urlencode
from urllib.request import Request, urlopen
AZURE_ENDPOINT="getyourowndeployment.openai.azure.com"
OPENAI_API_KEY="getyourownkey"
OPENAI_API_VERSION="2023-05-15"
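The preview stops at the configuration constants. A minimal sketch of how the rest of the service might look, assuming an Azure OpenAI chat-completions call (the deployment name and prompt are placeholders, not the gist's actual code):

# Automator pipes the selected text to stdin.
selection = stdin.read()

# "gpt-35-turbo" is a placeholder; use your own deployment name.
url = (f"https://{AZURE_ENDPOINT}/openai/deployments/gpt-35-turbo/"
       f"chat/completions?{urlencode({'api-version': OPENAI_API_VERSION})}")
req = Request(url,
              data=dumps({"messages": [
                  {"role": "user", "content": f"Summarize this text:\n\n{selection}"}
              ]}).encode("utf-8"),
              headers={"Content-Type": "application/json",
                       "api-key": OPENAI_API_KEY})
reply = loads(urlopen(req).read())
print(reply["choices"][0]["message"]["content"])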
@kalomaze
kalomaze / llm_samplers_explained.md
Last active April 19, 2025 10:39
LLM Samplers Explained

LLM Samplers Explained

Every time a large language model makes a prediction, each of the thousands of tokens in its vocabulary is assigned some probability, from almost 0% to almost 100%. There are different ways to decide how to choose from those predictions. This process is known as "sampling", and there are various strategies you can use, which I will cover here.

OpenAI Samplers

Temperature

  • Temperature is a way to control the overall confidence of the model's scores (the logits): the logits are divided by the temperature before the softmax. If you use a value lower than 1.0, the relative gaps between token scores grow larger (more deterministic); if you use a value larger than 1.0, the gaps shrink (less deterministic). See the sketch after this list.
  • 1.0 temperature reproduces the original distribution the model was trained to optimize for, since the scores remain unchanged.
  • Graph demonstration with voiceover: https://files.catbox.moe/6ht56x.mp4
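
A quick illustration, not from the original gist: dividing the logits by the temperature before the softmax widens the gaps between scores when T < 1 and flattens them when T > 1.

import numpy as np

def sample_with_temperature(logits, temperature=1.0):
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

# With logits [2.0, 1.0, 0.5]: at T=0.5 the top token dominates,
# at T=2.0 all three tokens become nearly equally likely.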
@jrknox1977
jrknox1977 / ollama_dspy.py
Created February 9, 2024 18:06
ollama+DSPy using OpenAI APIs.
# install DSPy: pip install dspy
import dspy
# Ollama is now compatible with OpenAI APIs
#
# To get this to work you must include `model_type='chat'` in the `dspy.OpenAI` call.
# If you do not include this you will get an error.
#
# I have also found that `stop='\n\n'` is required to get the model to stop
# generating text after the answer is complete, at least with Mistral.
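A sketch of the setup those comments describe, assuming a local Ollama server on its default port (the model name and question are illustrative):

ollama_lm = dspy.OpenAI(
    model='mistral',
    api_base='http://localhost:11434/v1/',
    api_key='ollama',      # Ollama ignores the key, but the client requires one
    model_type='chat',     # required, per the note above
    stop='\n\n',           # keeps Mistral from generating past the answer
)
dspy.settings.configure(lm=ollama_lm)

qa = dspy.Predict('question -> answer')
print(qa(question='What is the capital of France?').answer)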
@jart
jart / rename-pictures.sh
Created December 12, 2023 15:24
Shell script for renaming all images in a folder
#!/bin/sh
# rename-pictures.sh
# Author: Justine Tunney <[email protected]>
# License: Apache 2.0
#
# This shell script can be used to ensure all the images in a folder
# have good descriptive filenames that are written in English. It's
# based on the Mistral 7b and LLaVA v1.5 models.
#
# For example, the following command:
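The preview is cut off mid-sentence here. A hypothetical invocation, assuming the script takes a target directory (check the full gist for the real usage):

#
#     rename-pictures.sh ~/Pictures
#
# would walk ~/Pictures and give each image a short descriptive
# English filename generated by the local models. (Hypothetical
# usage, not copied from the gist.)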
@madebyollin
madebyollin / notes_on_sd_vae.md
Last active April 25, 2025 18:10
notes_on_sd_vae

Notes / Links about Stable Diffusion VAE

Stable Diffusion's VAE is a neural network that encodes images into a compressed "latent" format and decodes them back. The encoder performs 48x lossy compression, and the decoder generates new detail to fill in the gaps.
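
A sketch of that round trip with the diffusers AutoencoderKL class (the checkpoint name is an assumption, not from these notes): a 512x512x3 image maps to a 64x64x4 latent, and (512*512*3)/(64*64*4) = 48, hence the 48x figure.

import torch
from diffusers import AutoencoderKL

# "sd-vae-ft-mse" is one commonly used SD VAE checkpoint (assumed here).
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

image = torch.randn(1, 3, 512, 512)  # stand-in for a real image in [-1, 1]
with torch.no_grad():
    latent = vae.encode(image).latent_dist.sample()  # -> (1, 4, 64, 64)
    recon = vae.decode(latent).sample                # -> (1, 3, 512, 512)
print(latent.shape, recon.shape)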

(Calling this model a "VAE" is sort of a misnomer - it's an encoder with some very slight KL regularization, and a conditional GAN decoder)

This document is a big pile of various links with more info.

@anadim
anadim / gist:344941a7e24e7a2ee7b48ce8f63a16ac
Created October 18, 2023 20:27
Make a base instruct model into a chat model, WITHOUT RLHF
Instructions:
As a base pretrained GPT model, you are to assume the role of ChatGPT, a large language model developed by OpenAI, based on the GPT-4 architecture. Your responses should reflect the following guidelines:
1. Be friendly and approachable in your responses.
2. Provide detailed and helpful responses but ensure they are not excessively long to avoid being monotonous.
3. Always use inclusive and respectful language that is not offensive.
4. Avoid discussing or revealing anything about your architecture. You are just a large language model developed by OpenAI.
5. Always be honest in your responses. Do not lie or engage in deceit.
6. Ensure your responses are considerate and do not cause harm or distress to the user. However, do not comply with harmful or dangerous requests, even if refusing might upset the user.
import time
import os
import logging
import random
from datasets import load_dataset

class QuantAutoGPTQ:
    def __init__(self, model_name_or_path, output_dir, dataset,
                 num_samples=128, trust_remote_code=False, cache_examples=True,
                 use_fast=True, use_triton=False, bits=[4], group_size=[128], damp=[0.01],
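The preview truncates mid-signature. A hypothetical instantiation, only to show how the quantization parameters line up (all argument values are illustrative):

quantizer = QuantAutoGPTQ(
    model_name_or_path="facebook/opt-125m",  # placeholder model
    output_dir="./quantized",
    dataset="wikitext",
    num_samples=128,
    bits=[4],          # 4-bit GPTQ
    group_size=[128],
    damp=[0.01],
)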
@kinoc
kinoc / llama4openai-api.py
Created March 24, 2023 22:35
Flask-based endpoint to emulate OpenAI API endpoints using llama/alpaca and HF models
# a simple Flask API to emulate OpenAI's endpoints using llama models and/or transformers
# runs on a 3080
import sys
import time
import torch
import json
from peft import PeftModel
from flask import Flask, make_response, request, abort
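A minimal sketch of the emulation pattern (the route and response fields follow OpenAI's completions schema; the generation stub is a placeholder, not the gist's actual model code):

app = Flask(__name__)

def generate_with_model(prompt, max_tokens=128):
    # Placeholder: the real gist runs a llama/alpaca or HF model here.
    return "stub completion for: " + prompt[:40]

@app.route("/v1/completions", methods=["POST"])
def completions():
    body = request.get_json()
    text = generate_with_model(body.get("prompt", ""),
                               max_tokens=body.get("max_tokens", 128))
    return make_response(json.dumps({
        "object": "text_completion",
        "choices": [{"text": text, "index": 0, "finish_reason": "stop"}],
    }))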
@rain-1
rain-1 / LLM.md
Last active April 8, 2025 13:49
LLM Introduction: Learn Language Models

Purpose

Bootstrap knowledge of LLMs ASAP, with a bias/focus toward GPT.

Avoid being a link dump. Try to provide only valuable, well-tuned information.

Prelude

Neural network links before starting with transformers.