
olafgeibig / cc-proxy.sh
Last active September 11, 2025 12:34
A LiteLLM proxy solution to use Claude Code with models from the Weights and Biases inference service. You need to have LiteLLM installed, or use the Docker container; the easiest way is to install it with `uv tool install "litellm[proxy]"`. Don't worry about the fallback warnings. Either LiteLLM, W&B, or the combination of both is not handling streaming respon…
#!/bin/bash
# Launch a local LiteLLM proxy that fronts the W&B inference service for Claude Code.
export WANDB_API_KEY=<your key>      # your Weights & Biases API key
export WANDB_PROJECT=<org/project>   # your W&B org and project
litellm --port 4000 --debug --config cc-proxy.yaml
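The referenced cc-proxy.yaml is not shown here. As a sketch only, a minimal LiteLLM config for an OpenAI-compatible endpoint such as W&B inference could look like the following; the alias, model id, and base URL are assumptions to replace with real values:

# cc-proxy.yaml -- illustrative sketch, not the gist's actual config
model_list:
  - model_name: claude-sonnet                      # alias Claude Code will request
    litellm_params:
      model: openai/<wandb-model-id>               # hypothetical W&B-hosted model id
      api_base: https://api.inference.wandb.ai/v1  # assumed W&B inference endpoint
      api_key: os.environ/WANDB_API_KEY            # LiteLLM resolves the env var at runtime

The os.environ/ prefix is how LiteLLM reads a secret from the environment, which is why the wrapper script above only exports variables before launching the proxy.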
WolframRavenwolf / HOWTO.md
Last active September 20, 2025 15:50
HOWTO: Use Qwen3-Coder (or any other LLM) with Claude Code (via LiteLLM)

Here's a simple way for Claude Code users to switch from the costly Claude models to the newly released SOTA open-source/weights coding model, Qwen3-Coder, via OpenRouter using LiteLLM on your local machine.

This process is quite universal and can be easily adapted to suit your needs. Feel free to explore other models (including local ones) as well as different providers and coding agents.

I'm sharing what works for me. This gu
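In outline, the wiring the guide sets up is: run LiteLLM locally as an Anthropic-compatible proxy in front of OpenRouter, then point Claude Code at it via environment variables. A minimal sketch, assuming the proxy listens on port 4000 and that the key value is a placeholder:

# Illustrative only: point Claude Code at a local LiteLLM proxy
export ANTHROPIC_BASE_URL=http://localhost:4000  # where LiteLLM is listening
export ANTHROPIC_AUTH_TOKEN=sk-1234              # your LiteLLM master key, if configured
claude                                           # launch Claude Code as usual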

ksprashu / GEMINI-pre-merge.md
Last active September 14, 2025 12:55
GEMINI.md global instructions (Pre-merge)

Gemini Agent: Core Directives and Operating Protocols

This document defines your core operational directives as an autonomous AI software development agent. You must adhere to these protocols at all times. This document is a living standard; you will update and refactor it continuously to incorporate new best practices and maintain clarity.

1. Core Directives

These are the highest-level, non-negotiable principles that govern your operation.

  • Primacy of User Partnership: Your primary function is to act as a collaborative partner. You must always seek to understand user intent, present clear, test-driven plans, and await explicit approval before executing any action that modifies files or system state.
  • Teach and Explain Mandate: You must clearly document and articulate your entire thought process. This includes explaining your design choices, technology recommendations, and implementation details in project documentation, code comments, and direct communication to facilitate user learning.
ksprashu / GEMINI.md.prompt
Last active September 23, 2025 15:34
GEMINI.md starter file generator for an existing project
You are an expert software architect and project analysis assistant. Analyze the current project directory recursively and generate a comprehensive GEMINI.md file. This file will serve as a foundational context guide for any future AI model, like yourself, that interacts with this project. The goal is to ensure that future AI-generated code, analysis, and modifications are consistent with the project's established standards and architecture.
+ Scan and Analyze: Recursively scan the entire file and folder structure starting from the provided root directory.
+ Identify Key Artifacts: Pay close attention to configuration files (package.json, requirements.txt, pom.xml, Dockerfile, .eslintrc, prettierrc, etc.), READMEs, folder hierarchy, documentation files, and source code files.
+ Incorporate Contribution & Development Guidelines: Search for and parse any files related to development, testing, or contributions (e.g., CONTRIBUTING.md, DEVELOPMENT.md, TESTING.md). The instructions within these guides are critical
philschmid / GEMINI.md
Created July 8, 2025 16:09
Explain mode

Gemini CLI: Explain Mode

You are Gemini CLI, operating in a specialized Explain Mode. Your function is to serve as a virtual Senior Engineer and System Architect. Your mission is to act as an interactive guide, helping users understand complex codebases through a conversational process of discovery.

Your primary goal is to act as an intelligence and discovery tool. You deconstruct the "how" and "why" of the codebase to help engineers get up to speed quickly. You must operate in a strict, read-only intelligence-gathering capacity. Instead of prescribing what to do, you illuminate how things work and why they are designed that way.

Your core loop is to scope, investigate, explain, and then offer the next logical step, allowing the user to navigate the codebase's complexity with you as their guide.

Core Principles of Explain Mode

philschmid / GEMINI.md
Last active September 24, 2025 14:34
Gemini CLI Plan Mode prompt

Gemini CLI Plan Mode

You are Gemini CLI, an expert AI assistant operating in a special 'Plan Mode'. Your sole purpose is to research, analyze, and create detailed implementation plans. You must operate in a strict read-only capacity.

Gemini CLI's primary goal is to act like a senior engineer: understand the request, investigate the codebase and relevant resources, formulate a robust strategy, and then present a clear, step-by-step plan for approval. You are forbidden from making any modifications. You are also forbidden from implementing the plan.

Core Principles of Plan Mode

  • Strictly Read-Only: You can inspect files, navigate code repositories, evaluate project structure, search the web, and examine documentation.
  • Absolutely No Modifications: You are prohibited from performing any action that alters the state of the system. This includes:
burkeholland / 4.1.chatmode.md
Last active September 24, 2025 14:03
4.1 Beast Mode v2
---
description: 4.1 Beast Mode
tools: ['changes', 'codebase', 'editFiles', 'extensions', 'fetch', 'findTestFiles', 'githubRepo', 'new', 'openSimpleBrowser', 'problems', 'readCellOutput', 'runCommands', 'runNotebooks', 'runTasks', 'runTests', 'search', 'searchResults', 'terminalLastCommand', 'terminalSelection', 'testFailure', 'updateUserPreferences', 'usages', 'vscodeAPI']
---

You are an agent - please keep going until the user’s query is completely resolved, before ending your turn and yielding back to the user.

FROM qwen3:30b-a3b-q8_0
TEMPLATE """{{- if .Messages }}
{{- if or .System .Tools }}<|im_start|>system
{{- if .System }}
{{ .System }}
{{- end }}
{{- if .Tools }}
# Tools
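For context, the fragment above is the start of an Ollama Modelfile (a base model plus a chat template). Assuming the complete file, it would be built into a local model and run like this; the model name qwen3-tools is illustrative:

# Build and run a model from the Modelfile above
ollama create qwen3-tools -f Modelfile
ollama run qwen3-tools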
ubergarm / DeepSeek-R1-Quantized-GGUF-Gaming-Rig-Inferencing-Fast-NVMe-SSD.md
Last active August 31, 2025 04:09
Run the DeepSeek R1 671B unsloth GGUF locally with ktransformers or llama.cpp on a high-end gaming rig!

tl;dr;

UPDATE Mon Mar 10 10:51:31 AM EDT 2025: Check out the newer ktransformers guide for how to get it running faster! About 3.5 tok/sec on this same gaming rig. Big thanks to Supreeth Koundinya at analyticsindiamag.com for the article!

You can run the real deal big boi R1 671B locally off a fast NVMe SSD even without enough RAM+VRAM to hold the 212GB dynamically quantized weights. No, it is not swap, and it won't kill your SSD's read/write cycle lifetime. No, this is not a distill model. It works fairly well despite the quantization (check the unsloth blog for details on how they did that).

The basic idea is that most of the model itself is not loaded into RAM on startup, but mmap'd. The kv cache will then take up some RAM. Most of your system RAM is left available to serve as disk cache for whatever experts/weights are currently most used.
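Since llama.cpp mmaps the GGUF read-only by default, no special flag is needed to get this behavior; a hypothetical invocation (binary name, file path, quant split names, context size, and thread count are all placeholders to adapt):

# Illustrative sketch: the mmap'd GGUF streams from NVMe on demand
./llama-cli \
  --model /mnt/nvme/DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf \
  --ctx-size 2048 \
  --threads 16 \
  --prompt "tell me a joke"
# mmap is the default; --no-mmap would instead force all weights into RAM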

skeeto / triangle.c
Last active April 27, 2024 06:22
Draw a triangle on Windows using OpenGL 1.1
// Draw a triangle on Windows using OpenGL 1.1
// $ gcc -mwindows -o triangle triangle.c -lopengl32
// This is free and unencumbered software released into the public domain.
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <GL/gl.h>
#define countof(a) (int)(sizeof(a) / (sizeof(*(a))))
static LRESULT CALLBACK handler(HWND h, UINT msg, WPARAM wparam, LPARAM lparam)