Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                      14x L1 cache
Mutex lock/unlock                           25   ns
Main memory reference                      100   ns                      20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy             3,000   ns        3 us
Send 1K bytes over 1 Gbps network       10,000   ns       10 us
Read 4K randomly from SSD*             150,000   ns      150 us          ~1GB/sec SSD
https://duckduckgo.com/?q=tmux+cheat+sheet&ia=cheatsheet&iax=1
# Taken from https://johanwind.github.io/2023/03/23/rwkv_details.html.
# I've added additional comments and restructured it a tiny bit, which makes it clearer for me.
import numpy as np
from torch import load as torch_load  # Only for loading the model weights
from tokenizers import Tokenizer

exp = np.exp
layer_norm = lambda x, w, b: (x - np.mean(x)) / np.std(x) * w + b
sigmoid = lambda x: 1 / (1 + exp(-x))
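
# For context, a minimal sketch of how these helpers come together in RWKV's
# channel-mixing block, roughly following the structure of the linked post;
# the argument names and exact formulation here are illustrative and may
# differ slightly from the original code.
def channel_mixing(x, last_x, mix_k, mix_r, Wk, Wr, Wv):
    # "Token shift": interpolate between the current and previous embedding.
    k = Wk @ (x * mix_k + last_x * (1 - mix_k))
    r = Wr @ (x * mix_r + last_x * (1 - mix_r))
    # Squared-ReLU feed-forward, gated by a sigmoid "receptance".
    vk = Wv @ np.maximum(k, 0) ** 2
    return sigmoid(r) * vk, x  # x is carried over as last_x for the next token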
Good question! I am collecting human data on how quantization affects outputs. See here for more information: ggml-org/llama.cpp#5962
In the meantime, use the largest quantization that fully fits in your GPU. If you can comfortably fit Q4_K_S, try using a model with more parameters.
See the wiki upstream: https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix
I am investigating how to use Bend (a parallel language) to accelerate Symbolic AI; in particular, Discrete Program Search. Basically, think of it as an alternative to LLMs/GPTs/NNs that is also capable of generating code, but by entirely different means. This kind of approach has never been scaled with mass compute before - it wasn't possible! - but Bend changes this. So, my idea was to do exactly that and see where it goes.
Now, while I was implementing some candidate algorithms in Bend, I realized that, rather than mass parallelism, I could use an entirely different mechanism to speed things up: SUP Nodes. It is a feature that Bend inherits from its underlying model ("Interaction Combinators") which, in simple terms, lets us combine multiple functions into a single superposed one and apply them all to an argument "at the same time". In short, it lets us call N functions at a fraction of the expected cost. Or, put simply: why parallelize when we can share?
import asyncio
import base64
import json
import os

import pyaudio
from websockets.asyncio.client import connect


class SimpleGeminiVoice:
    def __init__(self):
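        # Plausible setup sketch: the attribute names, environment variable,
        # and model id below are assumptions, not the original snippet's values.
        self.api_key = os.environ.get("GEMINI_API_KEY")
        self.model = "gemini-2.0-flash-exp"
        self.audio_queue = asyncio.Queue()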
# train_grpo.py
#
# See https://github.com/willccbb/verifiers for ongoing developments
#
import re

import torch
from datasets import load_dataset, Dataset
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer
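
# Below is a minimal, illustrative sketch of how these imports can be wired
# into a GRPO run (close to the TRL quickstart); the dataset, reward function,
# model name, and hyperparameters are assumptions for demonstration, not part
# of the original script.
def reward_num_unique_chars(completions, **kwargs):
    # Toy reward: more distinct characters in a completion -> higher reward.
    return [float(len(set(c))) for c in completions]

dataset = load_dataset("trl-lib/tldr", split="train")

training_args = GRPOConfig(output_dir="grpo-demo", logging_steps=10)
trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=reward_num_unique_chars,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()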