Claude Extension Safety Rule Bug: User-Stored Prompts Treated as Injections

Summary

Claude instances reject user-stored prompts/frameworks (saved in extension settings and invoked via /command syntax) as if they were injection attacks, even though Anthropic intentionally designed the extension to support this feature.

The Problem

What Anthropic Built

The Claude for Chrome extension includes a Shortcuts/Prompts section in extension options (chrome-extension://fcoeoabgfenejglbffodgkkbkcdhcgfn/options.html#prompts) where users can:

Create custom prompt templates
Store them with custom /command syntax
Invoke them by typing /[command_name] in any Claude chat

This is an intentional feature documented in the extension settings UI.

What Claude Does Instead

When a user invokes a stored prompt via /command, Claude instances:

Treat it as if an untrusted external source is injecting instructions
Apply strict injection-defense security rules
Require explicit verification, despite the user having already configured it
Reject the framework multiple times across different sessions

Example: CCLEAD Framework

A user built a sophisticated operational framework (CCLEAD) stored as a shortcut in extension settings. When invoked via /cclead A, B, C - review progress:

Expected behavior: Execute the framework as configured user intent.

Actual behavior: Reject it as a potential injection attack, even when:

The user explicitly stored it in extension settings
The user is directly invoking it via Anthropic's designed command syntax
The same dialog is repeated across multiple Claude instances

Root Cause

Claude's security implementation applies one overly broad rule to two different kinds of instruction source:

Current Implementation:
- Instructions from untrusted web sources = Apply strict verification
- Instructions from user-owned extension config = Also apply strict verification 

Correct Implementation Should Be:
- Instructions from untrusted web sources = Apply strict verification 
- Instructions from user-owned extension config = Treat as user intent

The security layer doesn't distinguish between:

Malicious instructions embedded in webpage content (genuine threat)
User-owned configurations stored in extension settings (user intent)
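The difference between the two implementations can be sketched as a small policy function keyed on where an instruction came from. This is only an illustration of the distinction described above, not Claude's or the extension's actual code: the names `Source`, `Handling`, `current_handling`, and `proposed_handling` are hypothetical.

```python
from enum import Enum


class Source(Enum):
    """Where an instruction originated (hypothetical categories)."""
    WEB_CONTENT = "web_content"                    # page text, emails, function results
    USER_EXTENSION_CONFIG = "user_extension_config"  # prompts stored in extension settings


class Handling(Enum):
    """How the instruction is treated (hypothetical labels)."""
    STRICT_VERIFICATION = "strict_verification"
    TREAT_AS_USER_INTENT = "treat_as_user_intent"


def current_handling(source: Source) -> Handling:
    """Behavior as reported here: every source gets strict verification."""
    return Handling.STRICT_VERIFICATION


def proposed_handling(source: Source) -> Handling:
    """Proposed behavior: only untrusted content gets strict verification."""
    if source is Source.USER_EXTENSION_CONFIG:
        return Handling.TREAT_AS_USER_INTENT
    return Handling.STRICT_VERIFICATION


if __name__ == "__main__":
    for source in Source:
        print(source.value, current_handling(source).value, proposed_handling(source).value)
```

The only row that changes between the two functions is the user-owned configuration; untrusted web content is handled identically in both.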

Impact

Developer productivity: Users cannot use Anthropic's own extension features without friction
Security theater: The verification creates a false sense of safety without addressing the actual threat model
Feature undermined: Anthropic built and documented this feature, but Claude won't use it as designed

Reproduction

Go to chrome-extension://fcoeoabgfenejglbffodgkkbkcdhcgfn/options.html#prompts
Create a shortcut with any name (e.g., /test)
Add prompt content with instructions
Invoke via /test in Claude chat
Observe: Claude rejects it as potential injection

Expected Fix

Claude should distinguish between:

Untrusted sources (web content, emails, function results): apply strict verification
User extension config (prompts stored in settings, invoked via slash commands): treat as user intent

The presence of a custom prompt in extension settings + explicit invocation should be sufficient proof of user intent.
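A minimal sketch of that two-part check, assuming the extension can tell whether a message was typed by the user and can look up stored shortcuts by name. The function `resolve_shortcut`, the `typed_by_user` flag, and the `stored_shortcuts` mapping are hypothetical and do not reflect the extension's real internals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResolvedShortcut:
    command: str    # shortcut name without the leading slash
    arguments: str  # free text the user typed after the command
    prompt: str     # stored prompt body from the user's settings


def resolve_shortcut(
    message: str,
    typed_by_user: bool,
    stored_shortcuts: dict[str, str],
) -> Optional[ResolvedShortcut]:
    """Return the stored prompt only when both signals of user intent hold.

    1. The message was typed by the user (explicit invocation).
    2. The command exists in the user's own extension settings.
    Anything else returns None and stays subject to the normal checks.
    """
    if not typed_by_user:
        return None
    text = message.strip()
    if not text.startswith("/"):
        return None
    # Split "/name rest of message" into the name and the remaining text.
    command, _, arguments = text[1:].partition(" ")
    prompt = stored_shortcuts.get(command)
    if prompt is None:
        return None
    return ResolvedShortcut(command, arguments.strip(), prompt)


if __name__ == "__main__":
    shortcuts = {"cclead": "<stored CCLEAD framework text>"}
    print(resolve_shortcut("/cclead A, B, C - review progress", True, shortcuts))
    # The same text arriving from page content is not treated as user intent.
    print(resolve_shortcut("/cclead A, B, C - review progress", False, shortcuts))
```

Under this sketch, slash-command text that appears inside a webpage never resolves to a shortcut, so the injection defense for untrusted content is left unchanged.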

dwillitzer/CCLEAD_FRAMEWORK_SAFETY_BUG.md
