@louspringer
louspringer / microsoft_contact_intake_fragmentation_evidence.md
Created March 31, 2026 15:55
Microsoft 365 Contact Intake Fragmentation: Evidence From contact@energration.com

Microsoft 365 Contact Intake Fragmentation: Evidence From contact@energration.com

Summary

This is a concrete example of Microsoft 365 product fragmentation, observed while trying to implement a simple business requirement:

  • One public inbound address: contact@energration.com
  • Delivery to a collaboration surface in Microsoft 365 / Teams
  • Usable by operators in Outlook and Microsoft 365
louspringer / llm_eudorus_context_prompt_run.md
Last active March 27, 2026 17:09
Spark LLM — GTS alliance (eudorus AGENTS.md + system; live run)

System

AGENTS.md (Energration eudorus repository root) — follow this document for how to use the eudorus repo (agents, tools, GitHub policy, Kiro/Codex workflow, steering, ontology, credentials). Verbatim copy below.

Discovered Agents & Tools

Hand-maintained. Full list of agent guidance artifacts and how they are generated or maintained: docs/agent_guidance_inventory.md. Run make agent-guidance-inventory to display it. Quick-start operator contract: docs/how_to_work_with_eudorus_codex.md (one-page execution playbook).

Workspace: /Volumes/lemon/gemini

louspringer / requirements.md
Created March 17, 2026 21:32
Kamizawa footgun spec: page cache and false OOM requirements

Requirements: Kamizawa footgun (page cache and false OOM)

Introduction

On unified-memory systems (e.g. DGX Spark, UMA), or when repeatedly swapping models (7B ↔ 120B), PyTorch, Ray, vLLM, and similar runtimes can count the Linux page cache as "used" memory. That causes a false OOM or a "free memory on device is less than desired" error on startup, even though the memory is reclaimable. This spec captures the mitigation as a first-class feature: drop the page cache before model launch in the deploy flow so the runtime sees accurate free memory.
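
The spec preview does not show how the drop is performed. As a minimal sketch, assuming the standard Linux `/proc/sys/vm/drop_caches` interface and a process with root privileges, the mitigation could look like the following. The function names `drop_page_cache` and `parse_meminfo` are hypothetical and are not taken from the eudorus repo.

```python
import os
from pathlib import Path

# Standard Linux control file; writing to it requires root.
DROP_CACHES = Path("/proc/sys/vm/drop_caches")


def drop_page_cache(path: Path = DROP_CACHES, mode: str = "3") -> None:
    """Flush dirty pages, then ask the kernel to drop clean caches.

    mode "1" drops the page cache, "2" drops dentries and inodes,
    "3" drops both. The path is a parameter only so the function can
    be exercised against a scratch file (an assumption for testing).
    """
    os.sync()  # write dirty pages out first so they become droppable
    path.write_text(mode + "\n")


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: integer value}."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields
```

Comparing the `Cached` and `MemFree` fields from `parse_meminfo(Path("/proc/meminfo").read_text())` before and after the drop is one way to confirm that the "used" memory was reclaimable cache.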

Requirements

Requirement 1: Drop page cache before model start in deploy flow

louspringer / BENCHMARK_120B.md
Created March 17, 2026 21:26
120B benchmark: tokens/s, interpretation, Prometheus correlation, charts (gx10 Qwen 122B)

120B benchmark: tokens/s and interpretation

Endpoint: gx10 120B (Qwen 122B) at http://gx10-83fb.tail3dac72.ts.net:8002
Script: benchmark_120b_tokens_per_second.py
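
The script body is not reproduced here. As a hedged sketch of the arithmetic involved, assuming the endpoint returns an OpenAI-compatible `usage.completion_tokens` field and that the caller times each request itself, tokens/s and a mean ± std summary could be computed as below. The helper names `tokens_per_second` and `summarize` are hypothetical.

```python
import statistics


def tokens_per_second(response: dict, elapsed_s: float) -> float:
    """Generated tokens divided by wall-clock seconds for one request.

    Assumes an OpenAI-compatible response body that includes
    usage.completion_tokens.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return response["usage"]["completion_tokens"] / elapsed_s


def summarize(rates: list) -> tuple:
    """Return (mean, sample standard deviation) over several runs."""
    mean = statistics.mean(rates)
    # Sample std is undefined for a single run; report 0.0 in that case.
    std = statistics.stdev(rates) if len(rates) > 1 else 0.0
    return mean, std
```

In use, `elapsed_s` would be measured around the POST to `/v1/chat/completions`, for example with `time.perf_counter()`, and one rate collected per run.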


Quick reference

| What we measure | How |

louspringer / benchmark_120b_tokens_per_second.py
Created March 17, 2026 21:25
120B benchmark script: tokens/s, TTFT, full-spread (gx10)
#!/usr/bin/env python3
"""
Benchmark tokens per second and TTFT for the gx10 120B LLM endpoint (Qwen 122B).
Calls POST /v1/chat/completions (non-streaming for throughput; optional streaming
for TTFT). Supports multiple runs (mean ± std) and optional concurrent requests.
Usage:
python3 scripts/benchmark_120b_tokens_per_second.py [BASE_URL]
python3 scripts/benchmark_120b_tokens_per_second.py --runs 5 --ttft --concurrent 2
louspringer / TELEMETRY.md
Created March 17, 2026 21:25
GX10-83FB Telemetry (Prometheus + Grafana)

GX10-83FB Telemetry (Prometheus + Grafana)

Telemetry from the GX10-83FB host is exported to Prometheus and visualized in Grafana. Prometheus and Grafana run in the eudorus observatory stack in Docker on Zane. Access from any machine on the network (GX10, vonnegut, etc.) must use Zane’s Tailscale hostname, not localhost.

Observatory on Zane (Docker + Nginx)

The observatory stack runs in Docker on Zane. Nginx is the router in front; deployment path on Zane:
/Users/lou/migration/rootfs/home/lou/observatory-deployment
Config: observatory/nginx/nginx.conf; compose: docker-compose.yml.

louspringer / GOOSE_LLM_GX10_ACCESS.md
Created March 17, 2026 21:25
Goose: accessing the LLM on gx10

Goose configuration: accessing the LLM on gx10

Interrogation date: 2026-03-13

How to check status and whether the model is coming up: See LLM_STATUS_AND_HEALTH.md.

Goose config locations

| File | Purpose |
louspringer / 120B_SERVE_RUNBOOK.md
Created March 17, 2026 21:25
120B model serve runbook (gx10-83fb)

120B model serve runbook (gx10-83fb)

Host: gx10-83fb (Tailscale: gx10-83fb.tail3dac72.ts.net)
Port for 120B: 8002 (single active 120B service at a time)
Recommended stack: Qwen3.5 122B A10B + llama.cpp (Option D)

Authoritative artifacts to read first: docs/GX10_PORT_ASSIGNMENT.md, docs/evidence/gx10_runtime_baseline.json, ontology/configuration_management.ttl.

Required first step: Refresh runtime evidence with python3 scripts/capture_gx10_runtime_baseline.py, then run ./scripts/gx10_config_guard.py [--live] before any change. If it fails, fix the reported issues first.
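
The required first step can be sketched as a small wrapper that runs both commands in order and stops at the first failure. This is an illustration only: it assumes both scripts signal failure through a non-zero exit code, which the runbook preview does not confirm, and the helper names `preflight_commands` and `run_preflight` are hypothetical.

```python
import subprocess


def preflight_commands(live: bool = False) -> list:
    """Return the two runbook commands in the order they must run."""
    guard = ["./scripts/gx10_config_guard.py"]
    if live:
        guard.append("--live")  # optional flag shown in the runbook
    return [["python3", "scripts/capture_gx10_runtime_baseline.py"], guard]


def run_preflight(live: bool = False, run=subprocess.run) -> bool:
    """Run each step; return False as soon as one step fails.

    Assumption: a non-zero return code means the step failed.
    The run parameter exists so the wrapper can be tested without
    the real scripts.
    """
    for argv in preflight_commands(live):
        if run(argv).returncode != 0:
            return False
    return True
```

A caller would proceed with a change only when `run_preflight()` returns `True`, and otherwise fix the reported issues first.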

louspringer / SMOKE_TEST_120B_2026-03-13.md
Created March 17, 2026 21:25
Smoke test: 120B endpoint on gx10-83fb (Qwen 122B)

Smoke test: 120B endpoint on gx10-83fb

Date: 2026-03-13
Endpoint: http://gx10-83fb.tail3dac72.ts.net:8002
Service: Qwen 122B A10B (llama.cpp) via systemd user unit qwen-122b


Deployment summary

louspringer / BENCHMARK_120B.md
Created March 17, 2026 21:18
120B benchmark: tokens/s, interpretation, Prometheus correlation, and charts (gx10 Qwen 122B)