@knight42
Created March 31, 2026 17:33

How I Set Up Baozai

I run an AI agent called 煲仔 (Baozai) — named after the claypot in Cantonese claypot rice. It lives on a small AWS EC2 instance, talks to me through Telegram, and has access to my infrastructure, code repos, and monitoring tools. This is a writeup of the harness I built around it: how it's configured, what it can do, and the problems it solves.

The Stack

  • Runtime: OpenClaw 2026.3.13 on Ubuntu 24.04 (EC2, x86_64)
  • Model: Claude Opus 4 (primary), GPT-5.4 (fallback)
  • Channels: Telegram (primary), Slack (secondary)
  • Voice: OpenAI TTS (voice: marin), Whisper for transcription
  • Coding agents: Claude Code CLI, Codex CLI
  • Infrastructure tools: gh, aws, curl, jq, uv

The whole thing runs on a single EC2 instance. No Kubernetes, no fancy orchestration. OpenClaw is a Node.js gateway that manages the LLM loop, tool dispatch, channels, memory, and session lifecycle.

What I Added to the Workspace

OpenClaw ships with a standard workspace structure (SOUL.md for persona, AGENTS.md for behavior, USER.md for user info, etc.). On top of that, I added several custom pieces:

Custom Commands in AGENTS.md

I defined slash commands that the agent recognizes and executes:

/latest — Checks for OpenClaw updates with a full risk assessment. The agent compares the installed version against the latest one on npm, fetches the changelog, searches GitHub issues for post-upgrade bugs, separates the bugs that affect our setup (Telegram + Linux) from irrelevant ones (WhatsApp, Windows, Matrix), and gives a one-line verdict on whether to upgrade.
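The mechanical part of that check (is there a newer version, and which reported bugs could matter here) fits in a few lines of shell. This is a sketch that assumes versions are plain dotted numbers like 2026.3.13 and that issue titles were already fetched, for example with `gh issue list`. The function names and the keyword list are hypothetical, and the agent itself makes the final judgment rather than relying on a grep.

```bash
#!/usr/bin/env bash
# Sketch of the /latest decision logic (fetching left out). The latest version
# might come from `npm view <package> version`; titles from `gh issue list`.

# Succeed if version $1 is strictly older than version $2.
# GNU sort -V orders numerically per field, so 2026.3.9 < 2026.3.13.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

# Read issue titles on stdin and drop those that mention platforms this
# setup does not use (hypothetical keyword list).
relevant_issues() {
  grep -viE 'whatsapp|windows|matrix'
}
```

Keyword exclusion is deliberately crude. For instance, "matrix" would also hide an issue about a CI test matrix, which is why the real verdict stays with the model.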

/backup — Runs a state backup script that captures config, memory, and workspace into a git repo.

/idea — Captures ideas with dates into memory/ideas.md. More importantly, when future conversations touch a related topic, the agent proactively reminds me of the connection. I'll be discussing something and it'll say "hey, this connects to your idea from two weeks ago about X."
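The capture half of /idea is just an append with a date. The recall half normally comes from the model plus memory search, but a keyword grep shows the shape of the lookup. This is a minimal sketch: `MEMORY_DIR`, the `- YYYY-MM-DD: text` bullet format and both function names are assumptions, not documented OpenClaw conventions.

```bash
#!/usr/bin/env bash
# Sketch of /idea capture and a crude recall step.
# MEMORY_DIR and the bullet format are hypothetical.

MEMORY_DIR="${MEMORY_DIR:-memory}"

# Append a dated bullet to ideas.md, creating the directory if needed.
add_idea() {
  mkdir -p "$MEMORY_DIR"
  printf -- '- %s: %s\n' "$(date +%F)" "$*" >> "$MEMORY_DIR/ideas.md"
}

# Print past ideas that mention any word of the given topic (case-insensitive).
related_ideas() {
  local pattern
  [ -f "$MEMORY_DIR/ideas.md" ] || return 1
  # "slow query" becomes the regex "slow|query"
  pattern="$(printf '%s' "$*" | tr -s '[:space:]' '|')"
  grep -iE -- "$pattern" "$MEMORY_DIR/ideas.md"
}
```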

PORTABLE_ENV.md

This file documents every environment change I've made to the machine: tool installations, shell config, git/SSH settings, language runtimes, global package managers. It serves as a migration playbook — if I need to move to a new EC2 instance, I can replay the exact setup from this file instead of guessing what's installed.

It includes a post-migration validation checklist:

```bash
ssh -T git@github.com         # verify GitHub SSH
go version                    # verify Go toolchain
gh auth status                # verify GitHub CLI
aws sts get-caller-identity   # verify AWS
claude --version              # verify Claude Code
openclaw doctor --non-interactive  # verify OpenClaw
```
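Running the checklist by hand works, but a small runner makes the result a single PASS/FAIL summary. The `run_checks` function below is a hypothetical sketch. It also works around one real quirk: `ssh -T git@github.com` exits with status 1 even when authentication succeeds, because GitHub provides no shell. For that reason the example greps the greeting for that check instead of trusting the exit code.

```bash
#!/usr/bin/env bash
# Sketch: run each post-migration check and print PASS/FAIL per line.
# Returns non-zero if any check failed.

run_checks() {
  local check failed=0
  for check in "$@"; do
    # Each check runs in its own shell; only its exit status is reported.
    if bash -c "$check" >/dev/null 2>&1; then
      printf 'PASS  %s\n' "$check"
    else
      printf 'FAIL  %s\n' "$check"
      failed=1
    fi
  done
  return "$failed"
}

# Example invocation for the checklist above:
#
# run_checks \
#   'ssh -T git@github.com 2>&1 | grep -q "successfully authenticated"' \
#   'go version' \
#   'gh auth status' \
#   'aws sts get-caller-identity' \
#   'claude --version' \
#   'openclaw doctor --non-interactive'
```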

Custom Skills

Grafana skill — Wraps Grafana Cloud's REST API. The agent can search dashboards, run PromQL/LogQL queries through datasource proxies, check firing alerts, and query annotations. The Grafana URL and service account token are stored as env vars in the OpenClaw config.
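The calls such a skill wraps map onto Grafana HTTP API endpoints: `/api/search` for dashboards, and `/api/datasources/proxy/uid/<uid>/...` for passing a PromQL query through to a Prometheus datasource. The sketch below is not the skill's actual code. It assumes hypothetical `GRAFANA_URL` and `GRAFANA_TOKEN` variables, since the post only says the URL and token are env vars in the OpenClaw config. It focuses on URL building, which is the part that is easy to get wrong when PromQL contains brackets and spaces.

```bash
#!/usr/bin/env bash
# Sketch of Grafana API helpers. GRAFANA_URL and GRAFANA_TOKEN are
# hypothetical names for the base URL and a service account token.

# Percent-encode a string for a URL query parameter (byte-wise, C locale).
urlencode() {
  local LC_ALL=C s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9.~_-]) out+="$c" ;;
      *) out+="$(printf '%%%02X' "'$c")" ;;
    esac
  done
  printf '%s\n' "$out"
}

# API path for a Prometheus instant query via Grafana's datasource proxy.
promql_path() {
  printf '/api/datasources/proxy/uid/%s/api/v1/query?query=%s\n' \
    "$1" "$(urlencode "$2")"
}

# Authenticated GET, e.g.
#   grafana_get "/api/search?query=$(urlencode 'Slow SQL')"
#   grafana_get "$(promql_path my-prom-uid 'sum(rate(http_requests_total[5m]))')"
grafana_get() {
  curl -fsS -H "Authorization: Bearer ${GRAFANA_TOKEN:?set GRAFANA_TOKEN}" \
    "${GRAFANA_URL:?set GRAFANA_URL}$1"
}
```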

Real example: I pointed the agent at a "Slow SQL > 10s" Grafana panel. It pulled the Loki logs, identified four slow endpoints, SSH'd through a bastion to run EXPLAIN ANALYZE against the production database, diagnosed root causes (missing indexes, inefficient JOINs, unnecessary full-table scans), and opened two PRs against the backend repo — all from a single Telegram message.

Org lookup skill — Maps GitHub usernames to Slack IDs and real names. Particularly useful when detecting Terraform drift — the agent can identify who last modified a resource, look up their Slack, and ping them directly.
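A lookup like this can be as small as a tab-separated table and an awk filter. In the sketch, `org.tsv` and its three columns (GitHub username, Slack user ID, real name) are a hypothetical layout. `<@USERID>` is Slack's standard mention syntax, which is what lets the agent ping someone directly.

```bash
#!/usr/bin/env bash
# Sketch of an org lookup. org.tsv is hypothetical, one person per line:
#   github_username <TAB> slack_user_id <TAB> real name

ORG_FILE="${ORG_FILE:-org.tsv}"

# Print "slack_id<TAB>name" for a GitHub username (case-insensitive);
# exit status 1 if the user is not in the table.
lookup_github_user() {
  awk -F '\t' -v user="$1" '
    tolower($1) == tolower(user) { print $2 "\t" $3; found = 1 }
    END { exit !found }' "$ORG_FILE"
}

# Format a Slack mention for a GitHub username.
slack_mention() {
  local row
  row="$(lookup_github_user "$1")" || return 1
  printf '<@%s>\n' "${row%%$'\t'*}"   # keep only the Slack ID column
}
```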

Memory System

The agent wakes up fresh every session. Memory is entirely file-based, split into three layers:

Daily logs (memory/YYYY-MM-DD.md) — Raw capture of what happened each day. Ground truth.

Domain files (memory/preferences.md, memory/ideas.md, memory/lessons.md, etc.) — Curated long-term knowledge. The agent updates these as it learns things.

MEMORY.md — Just an index pointing to domain files. Gets loaded on session start to give the agent a map of what it knows.
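The three layers suggest a simple startup routine: load the index, the lessons file, and today's raw log if it exists. This is a sketch under the assumption that everything sits under one root as described. The function name and the exact set of files loaded are hypothetical, and OpenClaw's own session bootstrap may work differently.

```bash
#!/usr/bin/env bash
# Sketch: concatenate the memory files an agent might load at session start.
# MEMORY_ROOT and the file selection are assumptions.

MEMORY_ROOT="${MEMORY_ROOT:-.}"

startup_context() {
  local today f
  today="$(date +%F)"
  for f in "$MEMORY_ROOT/MEMORY.md" \
           "$MEMORY_ROOT/memory/lessons.md" \
           "$MEMORY_ROOT/memory/$today.md"; do
    # Skip layers that don't exist yet (e.g. no daily log written today).
    [ -f "$f" ] || continue
    printf '===== %s =====\n' "${f#"$MEMORY_ROOT"/}"
    cat "$f"
  done
}
```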

memory/lessons.md is particularly valuable: it records operational mistakes, and the agent reads it on every startup. Examples include "never sed with line numbers on minified files" (learned after breaking OpenClaw) and "Claude Code CLI must use --output-format stream-json" (learned after a run silently returned empty output). In practice, the agent does avoid repeating the mistakes recorded there.

Semantic search over memory uses OpenAI embeddings. Before answering questions about prior work, decisions, or preferences, the agent searches all memory files and pulls relevant snippets.
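Embedding search ultimately ranks stored snippets by cosine similarity to the query's embedding. The sketch below shows only that ranking step. It assumes the vectors were already obtained (for instance from OpenAI's embeddings endpoint) and stored one per line as `<id><TAB><space-separated floats>`. Both that storage format and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of the ranking step in semantic memory search.

# Cosine similarity of two space-separated vectors of equal length.
cosine() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, " "); split(b, y, " ")
    for (i = 1; i <= n; i++) { dot += x[i] * y[i]; na += x[i] ^ 2; nb += y[i] ^ 2 }
    if (na == 0 || nb == 0) { printf "%.4f\n", 0; exit }
    printf "%.4f\n", dot / (sqrt(na) * sqrt(nb))
  }'
}

# Read "<id><TAB><vector>" lines on stdin; print the top K as "<score><TAB><id>".
top_k() {
  local query="$1" k="${2:-3}" id vec
  while IFS=$'\t' read -r id vec; do
    printf '%s\t%s\n' "$(cosine "$query" "$vec")" "$id"
  done | sort -t $'\t' -k1,1 -gr | head -n "$k"
}
```

Real embeddings have hundreds or thousands of dimensions, so a production version would use a vector index. The arithmetic is the same.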

Coding Delegation

Long coding tasks get delegated to Claude Code or Codex running in background processes. The main Telegram session stays free for conversation.

```bash
claude --permission-mode bypassPermissions --print --output-format stream-json 'task'
```

The --output-format stream-json flag is critical — without it, Claude Code may silently write output to a git worktree instead of stdout. I only discovered this after a coding task appeared to produce zero output.

The agent monitors background sessions and reports results when they finish. This means I can say "fix this slow query" and keep chatting while the coding agent works in the background.
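The launch-and-poll pattern behind this can be sketched in plain shell: start the coding agent detached with its output going to a log, keep the PID, and check on it later. `LOG_DIR`, `delegate` and `is_running` are hypothetical names. OpenClaw manages background sessions itself, so this only illustrates the idea.

```bash
#!/usr/bin/env bash
# Sketch of background delegation with a log file per task.

LOG_DIR="${LOG_DIR:-/tmp/delegated}"

# Start "$@" detached from the terminal, with stdout and stderr in a new log.
# Sets DELEGATE_PID and DELEGATE_LOG for the caller to poll later.
delegate() {
  mkdir -p "$LOG_DIR"
  DELEGATE_LOG="$(mktemp "$LOG_DIR/task.XXXXXX")"
  nohup "$@" </dev/null >"$DELEGATE_LOG" 2>&1 &
  DELEGATE_PID=$!
}

# Succeeds while the process still exists (signal 0 only probes for it).
is_running() {
  kill -0 "$1" 2>/dev/null
}
```

For example, after `delegate claude --permission-mode bypassPermissions --print --output-format stream-json 'task'`, a later `is_running "$DELEGATE_PID" || tail -n 1 "$DELEGATE_LOG"` prints the last line of the log once the task has exited.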

Security Hardening

The EC2 instance is locked down through several layers:

Network Access

Tailscale mesh network — The instance runs Tailscale and is only accessible through the Tailscale network. No public SSH port exposed. Only my account (jianzeng@) can reach it.
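Whether anything is reachable outside the tailnet can be spot-checked from the host itself. This sketch reads `ss -Htln` output and reports TCP listeners bound to anything other than loopback or Tailscale's address ranges (100.64.0.0/10 for IPv4, fd7a:115c:a1e0::/48 for IPv6). On EC2, a listener on 0.0.0.0 may still be blocked by the security group, so a hit means "check the firewall", not necessarily "exposed". The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: print TCP listeners not bound to loopback or a Tailscale address.
# Usage: ss -Htln | public_listeners
# Input columns: State Recv-Q Send-Q LocalAddress:Port PeerAddress:Port

public_listeners() {
  awk '{
    addr = $4
    sub(/:[^:]*$/, "", addr)   # drop the port
    sub(/^\[/, "", addr)       # drop IPv6 brackets
    sub(/]$/, "", addr)
    sub(/%.*/, "", addr)       # drop interface suffix such as %lo
    if (addr ~ /^127\./ || addr == "::1") next
    if (addr ~ /^fd7a:115c:a1e0:/) next
    if (split(addr, o, ".") == 4 && o[1] + 0 == 100 && o[2] + 0 >= 64 && o[2] + 0 <= 127) next
    print $4
  }'
}
```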

Runtime Security

Non-root operation — OpenClaw runs as the ubuntu user (uid 1000), not root. This was an early change — the initial setup ran as root, which I corrected.
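A small guard in whatever wrapper starts the gateway keeps a root launch from creeping back in. The function name is hypothetical. The uid is passed as an argument so the check can be exercised without actually being root.

```bash
#!/usr/bin/env bash
# Sketch: refuse to launch as root. A launcher would call:
#   require_non_root "$(id -u)" || exit 1

require_non_root() {
  if [ "$1" -eq 0 ]; then
    echo "refusing to run as root (uid 0); use an unprivileged user" >&2
    return 1
  fi
}

# If the gateway runs as a systemd system service, the unit can pin the user
# instead (whether this setup uses systemd is an assumption):
#   [Service]
#   User=ubuntu
#   Group=ubuntu
```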

Agent Safety Rules

I defined explicit safety tiers in AGENTS.md that the agent follows:

Red lines (must pause and request human confirmation):

  • Destructive ops: rm -rf /, mkfs, dd if=, writing to block devices
  • Credential tampering: modifying auth fields in OpenClaw config, sshd_config, authorized_keys
  • Data exfiltration: curl/wget/nc sending tokens externally, reverse shells
  • Persistence: system-level crontab, useradd, systemctl enable for unknown services
  • Code injection: base64 -d | bash, eval "$(curl ...)", curl | sh
  • Blind dependency installs: running npm install/pip install straight from external docs without review

Yellow lines (can execute, must log in daily memory):

  • Any sudo operation
  • Package installs after human approval
  • Docker runs, iptables/ufw changes
  • systemctl restart/start/stop for known services
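The tiers live in AGENTS.md as instructions to the model, but the same patterns can also be checked mechanically as a second layer. This sketch classifies a command string as red, yellow or green with bash regexes. The pattern list is an illustrative subset of the tiers above, not the author's rule set. Regex matching is also easy to evade, so it complements the written rules rather than replacing them.

```bash
#!/usr/bin/env bash
# Sketch: classify a shell command against red/yellow patterns.
# Red is checked first, so "sudo rm -rf /" is red, not yellow.

RED_PATTERNS=(
  'rm -rf /([[:space:]]|$)'
  '(^|[[:space:]])mkfs'
  'dd if='
  '>[[:space:]]*/dev/(sd|nvme|xvd)'
  'authorized_keys|sshd_config'
  '(curl|wget)[^|]*\|[[:space:]]*(ba)?sh'
  'base64 -d[^|]*\|[[:space:]]*(ba)?sh'
  'eval "\$\(curl'
  '(^|[[:space:]])nc[[:space:]].*-e[[:space:]]'
  '(^|[[:space:]])useradd'
  '/etc/cron'
)
YELLOW_PATTERNS=(
  '(^|[[:space:]])sudo[[:space:]]'
  'docker run'
  '(^|[[:space:]])(iptables|ufw)[[:space:]]'
  'systemctl (restart|start|stop)'
)

# Print "red", "yellow" or "green" for the command string in $1.
classify_command() {
  local p
  for p in "${RED_PATTERNS[@]}"; do
    [[ "$1" =~ $p ]] && { echo red; return; }
  done
  for p in "${YELLOW_PATTERNS[@]}"; do
    [[ "$1" =~ $p ]] && { echo yellow; return; }
  done
  echo green
}
```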

Skill/MCP installation audit — Before installing any new skill or MCP tool, the agent must: list all files, audit contents, regex scan for hidden instructions (anti prompt-injection), check against red lines, report results, and wait for my confirmation.
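The regex-scan step of that audit might look like the sketch below. It lists every file in the skill directory and flags lines that resemble injected instructions or pipe-to-shell execution. It also flags zero-width and bidi-control characters, which can hide text from a human reviewer. The patterns are illustrative assumptions, and a clean result only means nothing obvious was found.

```bash
#!/usr/bin/env bash
# Sketch of the regex-scan step of a skill/MCP audit.

SUSPICIOUS='ignore (all |any )?(previous|prior|above) instructions|do not (tell|inform|show) the user|(curl|wget)[^|]*\|[[:space:]]*(ba)?sh|base64 -d'

# List files, then print flagged lines. Returns 1 if anything needs review.
audit_skill_dir() {
  local dir="$1" flagged=0
  echo "== files"
  find "$dir" -type f | sort
  echo "== flagged lines"
  grep -rnIiE -e "$SUSPICIOUS" -- "$dir" && flagged=1
  # U+200B..U+200F and U+202A..U+202E, matched as raw UTF-8 bytes (C locale).
  LC_ALL=C grep -rn -e $'\xe2\x80[\x8b-\x8f\xaa-\xae]' -- "$dir" && flagged=1
  return "$flagged"
}
```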

How the Agent Learned These Rules

The safety rules weren't just handed down — they evolved from real incidents. When the agent made a mistake (like using 2>/dev/null to suppress errors instead of fixing root causes, or editing minified dist files with line-number-based sed that broke OpenClaw), I'd point out the problem, and we'd add it to memory/lessons.md and sometimes harden the rules in AGENTS.md.

I wrote the red/yellow line classification explicitly after seeing that the agent's default behavior around destructive commands wasn't cautious enough. I wanted concrete patterns, not vague "be careful" instructions.

Nightly Security Audit

I set up a nightly audit script (scripts/nightly-security-audit.sh) based on the SlowMist Security Practice Guide. It runs 13 checks:

  1. OpenClaw native security scan
  2. Process & network audit (listening ports, outbound connections)
  3. Sensitive directory changes in last 24h
  4. System scheduled tasks review
  5. Auth log analysis (failed SSH attempts)
  6. Sudo usage tracking
  7. Config baseline hash verification
  8. Disk usage monitoring
  9. Memory/swap status
  10. Package update availability
  11. SSL certificate expiry checks
  12. Log anomaly detection
  13. Backup verification
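Each of these checks can be small. As one example (check 8), this disk usage check follows a shape a nightly script might use. It reads `df -P` output, prints an OK or WARN line per filesystem, and returns non-zero if anything crossed the threshold. The 85% threshold and the function name are arbitrary choices for illustration, not taken from the SlowMist guide.

```bash
#!/usr/bin/env bash
# Sketch of one audit check. Usage: df -P | check_disk 85
# df -P columns: Filesystem Blocks Used Available Capacity MountPoint

check_disk() {
  awk -v t="$1" 'NR > 1 {
    use = $5; sub(/%/, "", use)
    if (use + 0 >= t) { printf "WARN disk %s at %s%%\n", $6, use; bad = 1 }
    else printf "OK   disk %s at %s%%\n", $6, use
  } END { exit bad }'
}
```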

The audit compares the current OpenClaw config hash against a known baseline (memory/config-baseline.md). Any unexpected config change triggers a warning.
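A baseline comparison like this needs only `sha256sum` and a grep. The sketch assumes the baseline file stores hashes as 64-character hex strings with the most recent one last. It also assumes `~/.openclaw/openclaw.json` as the config path. Both are assumptions to adjust, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of the config baseline check. Paths are assumptions.

CONFIG_FILE="${CONFIG_FILE:-$HOME/.openclaw/openclaw.json}"
BASELINE_FILE="${BASELINE_FILE:-memory/config-baseline.md}"

verify_config_baseline() {
  local expected actual
  # Latest recorded hash = last 64-hex-digit string in the baseline file.
  expected="$(grep -oE '[0-9a-f]{64}' "$BASELINE_FILE" | tail -n1)"
  actual="$(sha256sum "$CONFIG_FILE" | cut -d' ' -f1)"
  if [ -z "$expected" ]; then
    echo "WARN no baseline hash recorded in $BASELINE_FILE"; return 1
  elif [ "$expected" != "$actual" ]; then
    echo "WARN config hash drifted: expected $expected, got $actual"; return 1
  fi
  echo "OK   config matches baseline"
}
```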

Config Baseline Tracking

Every time the OpenClaw config changes, the agent updates memory/config-baseline.md with the new SHA256 hash and a summary of what changed. The nightly audit verifies this hash hasn't drifted. Additionally, a redacted copy of the config gets committed to the state backup repo.
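Redacting before the commit can be a single `sed` pass that blanks the values of keys whose names look secret. This sketch only handles `"key": "value"` string pairs on one line, and the key-name list is an assumption. A JSON-aware tool such as `jq`, which is already in the stack, would be more robust.

```bash
#!/usr/bin/env bash
# Sketch: replace string values of token/key/secret/password-like keys.
# Usage: redact_config < config.json > config.redacted.json

redact_config() {
  sed -E 's/("[^"]*([Tt]oken|[Kk]ey|[Ss]ecret|[Pp]assword)[^"]*"[[:space:]]*:[[:space:]]*)"[^"]*"/\1"REDACTED"/g'
}
```

Over-matching is the safe failure mode here. A key such as `keyboardLayout` would also be blanked, which costs nothing in a backup copy.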

Closing Thoughts

None of this is particularly clever engineering. It's files in a directory, shell scripts, and a few well-placed rules. The value comes from the accumulation: the agent that remembers last week's decision, catches the pattern in your Grafana logs you'd have spent an hour digging through, and doesn't repeat the mistake it made on Tuesday.

The most surprising thing I've learned is how much of "making an AI agent useful" is just making it remember things and giving it clear boundaries. The model is already smart enough. The hard part is the harness around it — and that part is entirely yours to shape.
