Each of these commands will run an ad hoc HTTP static server in your current (or specified) directory, available at http://localhost:8000. Use this power wisely.
$ python -m SimpleHTTPServer 8000
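On Python 3, SimpleHTTPServer was merged into the http.server module, so the equivalent command is:

$ python3 -m http.server 8000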
Latency Comparison Numbers (~2012)

| Operation                         | Latency             | Notes                        |
| --------------------------------- | ------------------- | ---------------------------- |
| L1 cache reference                | 0.5 ns              |                              |
| Branch mispredict                 | 5 ns                |                              |
| L2 cache reference                | 7 ns                | 14x L1 cache                 |
| Mutex lock/unlock                 | 25 ns               |                              |
| Main memory reference             | 100 ns              | 20x L2 cache, 200x L1 cache  |
| Compress 1K bytes with Zippy      | 3,000 ns (3 us)     |                              |
| Send 1K bytes over 1 Gbps network | 10,000 ns (10 us)   |                              |
| Read 4K randomly from SSD*        | 150,000 ns (150 us) | ~1 GB/sec SSD                |
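To make the ratios concrete, here is a small Python sketch (my own illustration, using the table's values as constants) that computes how many of one operation cost the same as another:

```python
# Back-of-the-envelope helper built from the ~2012 table above.
# Values are nanoseconds and are rough orders of magnitude, not
# measurements of any particular machine.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "branch mispredict": 5,
    "L2 cache reference": 7,
    "mutex lock/unlock": 25,
    "main memory reference": 100,
    "compress 1K bytes with Zippy": 3_000,
    "send 1K bytes over 1 Gbps network": 10_000,
    "read 4K randomly from SSD": 150_000,
}

def ratio(slow: str, fast: str) -> float:
    """How many `fast` operations cost the same as one `slow` operation."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]

# One random 4K SSD read costs as much as ~1,500 main memory references.
print(ratio("read 4K randomly from SSD", "main memory reference"))  # 1500.0
```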
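The next snippet is unrelated: the opening of a C++ header that declares a type-erased `any` container and its `any_cast` accessor (it looks like a single-header `std::any` backport; only the forward declarations are shown):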
```cpp
#pragma once

#include <exception>     // exception types for failed casts
#include <memory>        // smart pointers / allocation helpers
#include <typeinfo>      // typeid and std::type_info
#include <type_traits>   // traits used by the implementation

// Forward declarations of the type-erased container and its checked accessor.
class any;
template<class Type> Type any_cast(any&);
```
Good question! I am collecting human data on how quantization affects outputs. See here for more information: ggml-org/llama.cpp#5962
In the meantime, use the largest quantization that fully fits in your GPU's VRAM. If you can comfortably fit Q4_K_S, try a model with more parameters instead.
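As a rough way to judge "fully fits", you can estimate weight size from parameter count and bits per weight. The sketch below is my own illustration; the bits-per-weight figures are approximate community numbers, not official llama.cpp values, and real usage also needs headroom for the KV cache and compute buffers:

```python
# Rough fit check for a GGUF quant: params * bits-per-weight / 8.
# APPROX_BPW values are approximate (assumption), not exact format
# sizes; actual files also carry metadata.
APPROX_BPW = {
    "Q8_0": 8.5,
    "Q6_K": 6.6,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.8,
    "Q4_K_S": 4.6,
}

def weights_gib(params_billions: float, quant: str) -> float:
    """Approximate size of the quantized weights in GiB."""
    bits = params_billions * 1e9 * APPROX_BPW[quant]
    return bits / 8 / 2**30

# A 7B model at Q4_K_S is ~3.7 GiB of weights; leave VRAM headroom
# for the KV cache before concluding that it "fully fits".
print(f"{weights_gib(7, 'Q4_K_S'):.1f} GiB")
```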
See the wiki upstream: https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix