apnea

driver / toolkit in prod

Driver 565.77 CUDA 12.7

(As of Aug 2025, running on 22.04: Linux pop-os 6.12.10-76061203-generic)

CUDA

https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#deprecated-architectures

CUDA Toolkit 12.9 Update 1 - Release Notes

Maxwell, Pascal, and Volta architectures are now feature-complete with no further enhancements planned. While CUDA Toolkit 12.x series will continue to support building applications for these architectures, offline compilation and library support will be removed in the next major CUDA Toolkit version release. Users should plan migration to newer architectures, as future toolkits will be unable to target Maxwell, Pascal, and Volta GPUs.
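A minimal Python sketch of how one might flag affected GPUs by compute capability. The capability numbers (Maxwell 5.x, Pascal 6.x, Volta 7.0 and 7.2) come from NVIDIA's public compute capability tables rather than from the release note quoted above, and the function name is hypothetical.

```python
from typing import Optional

# Major compute capability versions that belong entirely to one architecture.
FEATURE_COMPLETE_MAJORS = {5: "Maxwell", 6: "Pascal"}
# Volta shares major version 7 with Turing (7.5), so match exact pairs.
VOLTA_CAPABILITIES = {(7, 0), (7, 2)}


def feature_complete_arch(major: int, minor: int) -> Optional[str]:
    """Return the architecture name if (major, minor) is Maxwell, Pascal,
    or Volta; return None for anything else."""
    if (major, minor) in VOLTA_CAPABILITIES:
        return "Volta"
    return FEATURE_COMPLETE_MAJORS.get(major)


if __name__ == "__main__":
    # Illustrative capabilities only, not a survey of real hardware.
    for cap in [(5, 2), (6, 1), (7, 0), (7, 5), (8, 6)]:
        print(cap, feature_complete_arch(*cap))
```

On recent drivers the capability itself can usually be read with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`; older drivers may not expose that field.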

Caveman Token Cost Analysis

Date: 2026-04-18

Analysis of caveman — a skill/plugin instructing AI coding agents to respond in compressed prose, dropping articles, filler, and hedging while preserving technical accuracy.

Summary

Caveman claims: "~65-75% fewer tokens," measured via an eval harness that counts visible output tokens.
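For reference, a "percent fewer tokens" figure is usually a relative reduction against a baseline. The sketch below assumes that definition; the harness's actual formula is not shown here, and the counts are hypothetical.

```python
def token_reduction_pct(baseline_tokens: int, compressed_tokens: int) -> float:
    """Relative reduction in visible output tokens, as a percentage."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 100.0 * (baseline_tokens - compressed_tokens) / baseline_tokens


if __name__ == "__main__":
    # Hypothetical counts, not measurements from the caveman eval harness.
    print(token_reduction_pct(1000, 300))  # 70.0
```

Note that this only covers visible output tokens, which is the quantity the claim says the harness counts.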

Spontaneous Language Switching in LLMs

LLMs may spontaneously switch to Chinese mid-reasoning regardless of prompt language. This has been observed in both OpenAI's o1 and Chinese models (DeepSeek, Qwen, GLM).

The papers listed below propose three possible reasons for this: internal circuit competition, strategic reasoning advantages gained during training, and the influence of distributed training data.

1. Competition Between Internal Circuits

Mechanistic interpretability research suggests that multilingual LLMs possess two distinct internal subsystems that govern generation:

Z.AI Coding Plan — OpenCode Agent Mapping

Quota Cost per Model

Peak hours: 14:00–18:00 UTC+8. Off-peak: all other times. Monthly quota is equivalent to ~15–30× the subscription fee, converted at API pricing rates.

| Model | Quota (Peak: 14:00-18:00 UTC+8) | Quota (Off-Peak) | Temporary (thru June) |
| --- | --- | --- | --- |
| GLM-5.1 | 3× | 2× | 1× off-peak |
| GLM-5-Turbo | 3× | 2× | 1× off-peak |
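The peak/off-peak rule above can be sketched in Python as follows. This assumes the peak window is half-open (14:00 inclusive, 18:00 exclusive) and that the temporary 1× rate applies only off-peak; the function and parameter names are hypothetical and not part of any Z.AI API.

```python
from datetime import datetime, timedelta, timezone

UTC_PLUS_8 = timezone(timedelta(hours=8))
PEAK_START_HOUR = 14
PEAK_END_HOUR = 18  # assumption: 18:00 itself counts as off-peak


def is_peak(ts: datetime) -> bool:
    """True if the timezone-aware timestamp falls in 14:00-18:00 UTC+8."""
    if ts.tzinfo is None:
        raise ValueError("ts must be timezone-aware")
    local = ts.astimezone(UTC_PLUS_8)
    return PEAK_START_HOUR <= local.hour < PEAK_END_HOUR


def quota_multiplier(ts: datetime, temporary_promo: bool = False) -> int:
    """Quota multiplier for GLM-5.1 / GLM-5-Turbo per the table:
    3x at peak, 2x off-peak, 1x off-peak while the temporary rate applies."""
    if is_peak(ts):
        return 3
    return 1 if temporary_promo else 2


if __name__ == "__main__":
    # 06:00 UTC is 14:00 in UTC+8, the start of the peak window.
    print(quota_multiplier(datetime(2026, 5, 12, 6, 0, tzinfo=timezone.utc)))
```

Requiring timezone-aware timestamps avoids silently interpreting naive values in the machine's local zone.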

GLM-5/5.1 System Prompt Research & Design

Date: 2026-05-12

Purpose: Design an optimal system prompt for GLM-5.1 in opencode (a coding agent CLI), informed by the GLM-5 paper, Z.AI docs, and community findings.

Sources

| Source | Key Takeaway |

Vera vs opencode-codebase-index vs AFT: Feature & Metrics Comparison

Vera (v0.7.0) — Local-first semantic code search CLI in pure Rust. 65 languages, ONNX embeddings + cross-encoder reranker, fully offline.

opencode-codebase-index (v0.8.0) — Semantic codebase indexing plugin for OpenCode, also runs as standalone MCP server. Hybrid TypeScript + Rust, API-first embeddings, 17+ languages.

AFT (v0.26.4) — Agent File Toolkit. Tree-sitter powered code manipulation and analysis for AI agents. Rust binary + thin TS plugins. 17 languages. Semantic search + trigram grep + structural editing + call-graph navigation + LSP diagnostics.

Beyond Text: The Spectrum of Code Representation for LLM Coding Agents

An analysis of how code can be represented to LLMs — from raw text to architectural patterns — and where the research frontier currently sits.

The Spectrum

Code can be represented to LLMs at progressively richer levels of abstraction. Each level captures more structural and semantic information, but also requires more sophisticated tooling and domain knowledge to construct.

The Governance Spectrum Scaffold

A Framework for Comparing AI Governance Across Model Risk and Agent Risk


Version	2.0
Date	20 May 2026
Critique cycles	2 (3 independent reviewers per cycle)
Source documents	19 (5 regulations, 6 practitioner/academic papers, 5 macro-prudential and cross-sector frameworks, 2 US fair lending guidance, 1 government agentic AI framework, 1 EU implementing guidance)

	#!/bin/bash
	# Write index.html linking every regular file in the current directory, newest first.
	python3 -c "
	import os, urllib.parse
	files = [f for f in os.listdir('.') if os.path.isfile(f)]
	files.sort(key=lambda x: os.path.getmtime(x), reverse=True)
	with open('index.html', 'w') as out:
	    out.write('<html><body><h2>Files (newest first)</h2>')
	    for f in files:
	        url = urllib.parse.quote(f)  # percent-encode the name for use in href
	        out.write(f'<a href=\"{url}\">{f}</a><br>\n')
	    out.write('</body></html>')
	"

	# Lattice Plugin Release Workflow

	## Overview

	Publishing a new version of `@apnea/opencode-lattice` involves three steps:
	bump the version in package.json, push a git tag, and let CI handle the rest.

	The GitHub Action (`.github/workflows/publish-plugin.yml`) automatically:
	1. Publishes to npm when a `v*` tag is pushed
	2. Creates a GitHub release with auto-generated notes