@dwillitzer
Created December 26, 2025 21:09

Claude Extension Safety Rule Bug: User-Stored Prompts Treated as Injections

Summary

Claude instances reject user-stored prompts/frameworks (saved in extension settings and invoked via /command syntax) as if they were injection attacks, even though Anthropic intentionally designed the extension to support this feature.

The Problem

What Anthropic Built

The Claude for Chrome extension includes a Shortcuts/Prompts section in extension options (chrome-extension://fcoeoabgfenejglbffodgkkbkcdhcgfn/options.html#prompts) where users can:

  1. Create custom prompt templates
  2. Store them with custom /command syntax
  3. Invoke them by typing /[command_name] in any Claude chat

This is an intentional feature documented in the extension settings UI.
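
The invocation flow described above can be sketched roughly as follows. This is a minimal illustration, not the extension's actual code: the `STORED_SHORTCUTS` mapping, the `expand_shortcut` function, and the `{args}` placeholder are hypothetical names assumed for the example.

```python
from typing import Optional

# Hypothetical stand-in for the shortcuts a user saved in extension settings.
# The real storage format and template syntax are not known; "{args}" is assumed.
STORED_SHORTCUTS = {
    "cclead": "Run the CCLEAD framework on: {args}",
}


def expand_shortcut(message: str, shortcuts: dict[str, str]) -> Optional[str]:
    """Expand '/name args' into the stored prompt, or return None if nothing matches."""
    if not message.startswith("/"):
        return None  # ordinary chat message, not a slash command
    # Split '/name rest of line' into the command name and its arguments.
    name, _, args = message[1:].partition(" ")
    template = shortcuts.get(name)
    if template is None:
        return None  # the user never stored a shortcut under this name
    return template.replace("{args}", args.strip())


expanded = expand_shortcut("/cclead A, B, C - review progress", STORED_SHORTCUTS)
print(expanded)
```

The point of the sketch is that the expanded text only exists because the user both stored the template and typed the command; no web content is involved at any step.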

What Claude Does Instead

When a user invokes a stored prompt via /command, Claude instances:

  • Treat it as if an untrusted external source is injecting instructions
  • Apply strict injection-defense security rules
  • Require explicit verification, despite the user having already configured it
  • Reject the framework multiple times across different sessions

Example: CCLEAD Framework

A user built a sophisticated operational framework (CCLEAD) stored as a shortcut in extension settings. When invoked via /cclead A, B, C - review progress:

Expected behavior: Execute the framework as configured user intent.

Actual behavior: Reject it as a potential injection attack, even when:

  • The user explicitly stored it in extension settings
  • The user is directly invoking it via Anthropic's designed command syntax
  • The same exchange has already been repeated across multiple Claude instances

Root Cause

Claude's security implementation applies a single, overly broad rule to two different kinds of input:

Current Implementation:
- Instructions from untrusted web sources -> Apply strict verification
- Instructions from user-owned extension config -> Also apply strict verification

Correct Implementation Should Be:
- Instructions from untrusted web sources -> Apply strict verification
- Instructions from user-owned extension config -> Treat as user intent

The security layer doesn't distinguish between:

  1. Malicious instructions embedded in webpage content (genuine threat)
  2. User-owned configurations stored in extension settings (user intent)
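
The difference between the two policies can be written down as a small sketch. The `Source` and `Handling` enums and both functions are hypothetical, introduced only to illustrate the report's claim; they do not describe how Claude's safety rules are actually implemented.

```python
from enum import Enum


class Source(Enum):
    WEB_CONTENT = "web_content"      # instructions embedded in a page
    USER_SHORTCUT = "user_shortcut"  # prompt stored in extension settings


class Handling(Enum):
    STRICT_VERIFICATION = "strict_verification"
    USER_INTENT = "user_intent"


def current_handling(source: Source) -> Handling:
    """Behavior as reported: every source gets strict verification."""
    return Handling.STRICT_VERIFICATION


def expected_handling(source: Source) -> Handling:
    """Behavior the report asks for: user-owned config counts as user intent."""
    if source is Source.USER_SHORTCUT:
        return Handling.USER_INTENT
    return Handling.STRICT_VERIFICATION


for source in Source:
    print(source.name, current_handling(source).name, expected_handling(source).name)
```

The two functions agree on web content and differ only for user shortcuts, which is the entire scope of the requested change.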

Impact

  • Developer productivity: Users cannot use Anthropic's own extension features without friction
  • Security theater: The verification creates a false sense of safety without addressing the actual threat model
  • Feature undermined: Anthropic built and documented this feature, but Claude won't use it as designed

Reproduction

  1. Go to chrome-extension://fcoeoabgfenejglbffodgkkbkcdhcgfn/options.html#prompts
  2. Create a shortcut with any name (e.g., /test)
  3. Add prompt content with instructions
  4. Invoke via /test in Claude chat
  5. Observe: Claude rejects it as potential injection

Expected Fix

Claude should distinguish between:

  • Untrusted sources (web content, emails, function results): apply strict verification
  • User extension config (prompts stored in settings, invoked via slash commands): treat as user intent

The presence of a custom prompt in extension settings, combined with explicit invocation by the user, should be sufficient proof of user intent.
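
One way to express that two-part test is sketched below, under stated assumptions: the `Invocation` record and its `typed_by_user` flag are hypothetical, and the sketch assumes the extension can tell whether a /command was entered in the chat input or merely appeared in page content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invocation:
    command: str         # shortcut name without the leading slash
    typed_by_user: bool  # True only if entered in the chat input (assumed signal)


def is_user_intent(invocation: Invocation, stored_commands: set[str]) -> bool:
    """Require both conditions: stored in settings AND explicitly invoked by the user."""
    return invocation.typed_by_user and invocation.command in stored_commands


stored = {"test", "cclead"}
print(is_user_intent(Invocation("test", typed_by_user=True), stored))
print(is_user_intent(Invocation("test", typed_by_user=False), stored))
```

Requiring both conditions keeps the genuine threat covered: a "/test" string that shows up inside a webpage fails the check because the user did not type it, and an unknown command fails because it was never stored.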
