Tool evidence checklist for agent context portability

A quick test for claims like “this AGENTS.md, SKILL.md, rule, or context file works across Claude Code, Cursor, Codex, Windsurf, Bob, Copilot, etc.”

A portability claim is weak if it only proves that the same bytes exist in several files: identical bytes do not guarantee identical semantics.

Ask these before calling it compatible

claim: "portable across Claude/Cursor/Codex/etc."
testedOn:
  - tool: codex
    nativeDiscovery: true
    surface: AGENTS.md ancestry
    activation: automatic
  - tool: cursor
    nativeDiscovery: true|false|unknown
    surface: .cursor/rules/*.mdc or .cursor/skills/**/SKILL.md
    activation: always-on|on-demand|manual
  - tool: claude-code
    nativeDiscovery: true|false|unknown
    surface: CLAUDE.md or .claude/skills/**/SKILL.md
    activation: session-start|on-demand|manual
knownLossyTargets:
  - hooks
  - mcp server config
  - tool permissions
  - subfolder/path-scoped inheritance
manualActivationRequired: true|false
resolutionEvidence:
  loadedFiles: []
  loadedBy: native|generated|hook|manual|unknown
  effectiveSource: null
  hookInstalled: true|false|unknown
  injectedOnSessionStart: true|false|unknown
  resumeBehavior: hash-receipt|full-reinject|not-proven
  precedence: []
  pathScope: repo-root|subpath|unknown
  dedupeRisk: none|low|unknown|high
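One way to use the template is to list which fields still lack evidence before anyone writes "compatible". The sketch below assumes the YAML has already been parsed into a plain object (parsing is out of scope), and `evidenceGaps` is a hypothetical helper, not part of any tool. It treats `null`, empty lists, `unknown`/`not-proven`, and unfilled `a|b|c` placeholders as gaps.

```javascript
// Hypothetical helper: list checklist fields that are still unproven.
// ASSUMPTION: the YAML template was parsed into a plain JS object elsewhere.
const UNPROVEN = new Set(["unknown", "not-proven"]);

function isUnresolved(value) {
  // null/undefined, an empty list, an explicit "unknown"/"not-proven",
  // or an unfilled "a|b|c" placeholder all count as missing evidence.
  if (value === null || value === undefined) return true;
  if (Array.isArray(value)) return value.length === 0;
  if (typeof value === "string") return UNPROVEN.has(value) || value.includes("|");
  return false; // booleans and numbers are treated as real answers
}

function evidenceGaps(record, prefix = "") {
  const gaps = [];
  for (const [key, value] of Object.entries(record)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      gaps.push(...evidenceGaps(value, path)); // nested section
    } else if (Array.isArray(value) && value.length > 0 && typeof value[0] === "object") {
      value.forEach((item, i) => gaps.push(...evidenceGaps(item, `${path}[${i}]`)));
    } else if (isUnresolved(value)) {
      gaps.push(path);
    }
  }
  return gaps;
}

// Hypothetical, partially filled-in claim.
const claim = {
  claim: "portable across Claude/Cursor/Codex/etc.",
  testedOn: [
    { tool: "codex", nativeDiscovery: true, surface: "AGENTS.md ancestry", activation: "automatic" },
    { tool: "cursor", nativeDiscovery: "unknown", surface: ".cursor/rules/*.mdc", activation: "always-on" },
  ],
  manualActivationRequired: false,
  resolutionEvidence: {
    loadedFiles: [],
    loadedBy: "native",
    effectiveSource: null,
    resumeBehavior: "not-proven",
    dedupeRisk: "unknown",
  },
};

console.log(evidenceGaps(claim));
```

A non-empty result means the claim is still partly copy, not evidence; a `dedupeRisk: high` answer is deliberately not flagged here because it is a finding rather than a gap.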

Minimal acceptance bar

Native discovery: which exact path/pattern does the target tool scan natively?
Activation: is the instruction always loaded, loaded on demand, manually referenced, or merely copied into place?
Effective source: did the context enter through native file loading, generated fallback, hook injection, import indirection, or manual paste?
Deduplication: if both native files and hooks/imports exist, can you prove the same context is not injected twice on startup/resume?
Precedence: if both generic and tool-native files exist, which one wins?
Path scope: in a monorepo, what changes when the agent starts in apps/client/ instead of the repo root?
Lossy behavior: which source concepts do not survive the target format: hooks, skills, MCP config, permissions, memory, session continuity?
Inspectable output: can a user see the resolution/load chain, not just the generated files?
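The path-scope question can be made concrete with a small model. This is a sketch that ASSUMES ancestry-style discovery: every AGENTS.md from the repo root down to the start directory applies, ordered broadest first. It is not any specific tool's algorithm; `ancestryChain` and the `/repo` layout are hypothetical.

```javascript
const path = require("node:path");

// ASSUMPTION: ancestry-style discovery (root-to-start-dir), broadest first.
// Real tools differ in what they scan, merge, and let override.
function ancestryChain(repoRoot, startDir, existingFiles, fileName = "AGENTS.md") {
  const rel = path.posix.relative(repoRoot, startDir);
  if (rel === ".." || rel.startsWith("../")) {
    throw new Error(`${startDir} is outside ${repoRoot}`);
  }
  const segments = rel === "" ? [] : rel.split("/");
  const chain = [];
  // Visit the root, then each deeper directory on the way to startDir.
  for (let i = 0; i <= segments.length; i++) {
    const dir = path.posix.join(repoRoot, ...segments.slice(0, i));
    const candidate = path.posix.join(dir, fileName);
    if (existingFiles.has(candidate)) chain.push(candidate);
  }
  return chain;
}

// Hypothetical monorepo layout, as a set of absolute POSIX paths.
const files = new Set([
  "/repo/AGENTS.md",
  "/repo/apps/client/AGENTS.md",
  "/repo/apps/server/AGENTS.md",
]);

console.log(JSON.stringify(ancestryChain("/repo", "/repo", files)));
// ["/repo/AGENTS.md"]
console.log(JSON.stringify(ancestryChain("/repo", "/repo/apps/client", files)));
// ["/repo/AGENTS.md","/repo/apps/client/AGENTS.md"]
```

Under this model, starting in apps/client/ adds one file that a root-started session never sees. A tool that only reads the repo-root file would silently drop it, which is exactly the kind of difference the checklist asks you to prove per tool.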

Duplicate skill loading receipt

A live, adjacent bug motivated this section: the Cursor forum topic “Critical Issue: Duplicate Skills Loading Causing Context Window Waste and Confusion” reports a single planning-with-files skill being loaded from many roots, such as ~/.codex/skills/..., nested vendor directories, and ~/.claude/plugins/cache/....

For duplicate skill reports, the useful evidence is not just “these paths exist”. A fix should be able to produce a receipt like this:

skill: planning-with-files
contentIdentity:
  name: planning-with-files
  contentHash: sha256:...
  version: 2.10.0
candidateLoads:
  - path: ~/.cursor/skills/planning-with-files/SKILL.md
    toolOwner: cursor
    loadedBy: native-skill-scan
    discoveryRoot: ~/.cursor/skills
    priority: 100
  - path: ~/.codex/skills/planning-with-files/.cursor/skills/planning-with-files/SKILL.md
    toolOwner: codex-export
    loadedBy: transitive-vendor-directory-scan
    discoveryRoot: ~/.codex/skills
    priority: 10
  - path: ~/.claude/plugins/cache/planning-with-files/.../SKILL.md
    toolOwner: claude-plugin-cache
    loadedBy: foreign-cache-scan
    discoveryRoot: ~/.claude/plugins/cache
    priority: 0
selectedLoad:
  path: ~/.cursor/skills/planning-with-files/SKILL.md
  reason: preferred native Cursor skill root
suppressedLoads:
  - path: ~/.codex/skills/planning-with-files/.cursor/skills/planning-with-files/SKILL.md
    reason: duplicate contentHash/name from non-authoritative root
  - path: ~/.claude/plugins/cache/planning-with-files/.../SKILL.md
    reason: foreign tool cache; not a Cursor authority
invariant: "for each session_id + skill.name + contentHash, inject at most one effective skill definition unless explicitly marked supplement"
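The invariant can be enforced with a keyed guard at injection time. This is a minimal in-memory sketch; `SkillInjectionGuard`, the `supplement` flag, and the `sha256:example` value are hypothetical placeholders. To keep a resume from re-injecting, the set would need to be persisted with the session, which is the idea behind `resumeBehavior: hash-receipt` in the checklist.

```javascript
// Hypothetical guard for the invariant: per session_id + skill.name + contentHash,
// inject at most one definition unless it is explicitly marked as a supplement.
class SkillInjectionGuard {
  constructor() {
    // In-memory only; a real resume-safe version would persist this per session.
    this.injected = new Set();
  }

  static key(sessionId, skill) {
    return JSON.stringify([sessionId, skill.name, skill.contentHash]);
  }

  // Returns true if the caller should inject this skill definition now.
  shouldInject(sessionId, skill) {
    if (skill.supplement === true) return true; // explicit opt-out of dedupe
    const k = SkillInjectionGuard.key(sessionId, skill);
    if (this.injected.has(k)) return false; // second root, or a resume
    this.injected.add(k);
    return true;
  }
}

const guard = new SkillInjectionGuard();
const skill = { name: "planning-with-files", contentHash: "sha256:example" }; // placeholder hash
console.log(guard.shouldInject("session-1", skill)); // true: first injection
console.log(guard.shouldInject("session-1", skill)); // false: same session, name, and hash
console.log(guard.shouldInject("session-2", skill)); // true: different session
```

Keying on the content hash, as the invariant does, means two different versions of a same-named skill both pass this guard; choosing between them is a separate selection step.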

The acceptance test is simple: if a UI/debug log says 11 SKILL.md files were found, it should also say which one became the effective skill, which 10 were suppressed, and why. Otherwise the agent can waste tokens and choose ambiguous or stale instructions even when every individual file is valid.
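That acceptance test can be written down as code. The sketch below assumes each discovered candidate carries `name`, `contentHash`, `path`, and a numeric `priority` like the receipt fields above; `buildReceipts` and `accountsForAll` are hypothetical names, and breaking priority ties by path is an arbitrary choice. It groups by skill name (not name plus hash) so that a stale copy with a different hash is reported as suppressed with its own reason instead of being loaded alongside.

```javascript
// Hypothetical: one effective load per skill name, a reason for every other path.
function buildReceipts(candidates) {
  const byName = new Map();
  for (const c of candidates) {
    if (!byName.has(c.name)) byName.set(c.name, []);
    byName.get(c.name).push(c);
  }
  const receipts = [];
  for (const [name, group] of byName) {
    // Highest priority wins; equal priorities fall back to path order.
    const [selected, ...rest] = [...group].sort(
      (a, b) => b.priority - a.priority || a.path.localeCompare(b.path)
    );
    receipts.push({
      skill: name,
      selectedLoad: { path: selected.path, priority: selected.priority },
      suppressedLoads: rest.map((c) => ({
        path: c.path,
        reason:
          c.contentHash === selected.contentHash
            ? `duplicate contentHash/name; lower priority than ${selected.path}`
            : `same name, different contentHash (possibly stale); lower priority than ${selected.path}`,
      })),
    });
  }
  return receipts;
}

// The test from the text: every found path is selected or suppressed with a reason.
function accountsForAll(foundPaths, receipts) {
  const listed = receipts.flatMap((r) => [r.selectedLoad.path, ...r.suppressedLoads.map((s) => s.path)]);
  const allHaveReasons = receipts.every((r) => r.suppressedLoads.every((s) => Boolean(s.reason)));
  return allHaveReasons && listed.length === foundPaths.length && foundPaths.every((p) => listed.includes(p));
}

// Candidates mirroring the receipt above; the hash is a placeholder.
const found = [
  { name: "planning-with-files", contentHash: "sha256:example", priority: 100, path: "~/.cursor/skills/planning-with-files/SKILL.md" },
  { name: "planning-with-files", contentHash: "sha256:example", priority: 10, path: "~/.codex/skills/planning-with-files/.cursor/skills/planning-with-files/SKILL.md" },
  { name: "planning-with-files", contentHash: "sha256:example", priority: 0, path: "~/.claude/plugins/cache/planning-with-files/.../SKILL.md" },
];
const receipts = buildReceipts(found);
console.log(receipts[0].selectedLoad.path); // ~/.cursor/skills/planning-with-files/SKILL.md
console.log(accountsForAll(found.map((c) => c.path), receipts)); // true
```

With 11 candidates the same check requires exactly one selected path and 10 suppressed ones, each with a reason.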

60-second Pluribus smoke test

This is one way to inspect the difference between a native target and a generic fallback:

mkdir -p /tmp/pluribus-tool-evidence && cd /tmp/pluribus-tool-evidence
npx --yes pluribus-context@latest init --name "tool-evidence-demo" --description "demo" --tools bob,openclaw
npx --yes pluribus-context@latest sync
npx --yes pluribus-context@latest audit --json --fidelity-report --output fidelity.json
node -e 'const r=require("./fidelity.json"); console.log(JSON.stringify(r.fidelityReport.targets.map(t => ({ toolId:t.toolId, files:t.files, nativeDiscoverySurface:t.nativeDiscoverySurface, genericFallback:t.genericFallback, manualActivationRequired:t.manualActivationRequired, loadedBy:t.loadEvidence?.loadedBy, deliveryMechanism:t.loadEvidence?.deliveryMechanism, dedupeRisk:t.loadEvidence?.dedupeRisk, effectiveContext:t.effectiveContext?.scope, semanticDifference:t.semanticDifference })), null, 2))'

Expected shape:

[
  {
    "toolId": "bob",
    "files": [".bob/rules/pluribus.md"],
    "nativeDiscoverySurface": ".bob/rules/*.md",
    "genericFallback": false,
    "manualActivationRequired": false,
    "loadedBy": "native-file-discovery",
    "deliveryMechanism": "generated-native-surface",
    "dedupeRisk": "unknown",
    "effectiveContext": "repo-root",
    "semanticDifference": ["project-wide-only", "no-path-scope-evidence", "runtime-load-dedupe-not-proven"]
  },
  {
    "toolId": "openclaw",
    "files": ["AGENTS.md"],
    "nativeDiscoverySurface": "AGENTS.md",
    "genericFallback": true,
    "manualActivationRequired": false,
    "loadedBy": "generic-agent-file",
    "deliveryMechanism": "generated-generic-fallback",
    "dedupeRisk": "unknown",
    "effectiveContext": "repo-root",
    "semanticDifference": ["project-wide-only", "no-path-scope-evidence", "generic-agent-file", "runtime-load-dedupe-not-proven"]
  }
]
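To turn that output into a pass/fail signal, a script can classify each target. This is a hypothetical sketch: the field names come from the expected output above, but the rules (generic fallback, manual activation, `dedupeRisk` other than `none`/`low`, and `semanticDifference` entries starting with `no-` or ending in `-not-proven`) are assumptions to tune, not Pluribus behavior.

```javascript
// Hypothetical gate over the summary printed by the node -e command.
function classifyTargets(targets) {
  return targets.map((t) => {
    const gaps = [];
    if (t.genericFallback) gaps.push("generic fallback, not a native surface");
    if (t.manualActivationRequired) gaps.push("manual activation required");
    if (t.dedupeRisk !== "none" && t.dedupeRisk !== "low") gaps.push(`dedupeRisk=${t.dedupeRisk}`);
    for (const d of t.semanticDifference ?? []) {
      // Only entries that say evidence is missing count as gaps here.
      if (d.startsWith("no-") || d.endsWith("-not-proven")) gaps.push(d);
    }
    return { toolId: t.toolId, verdict: gaps.length === 0 ? "evidenced" : "needs-evidence", gaps };
  });
}

// Subset of the expected shape above.
const targets = [
  { toolId: "bob", genericFallback: false, manualActivationRequired: false, dedupeRisk: "unknown",
    semanticDifference: ["project-wide-only", "no-path-scope-evidence", "runtime-load-dedupe-not-proven"] },
  { toolId: "openclaw", genericFallback: true, manualActivationRequired: false, dedupeRisk: "unknown",
    semanticDifference: ["project-wide-only", "no-path-scope-evidence", "generic-agent-file", "runtime-load-dedupe-not-proven"] },
];

for (const r of classifyTargets(targets)) console.log(`${r.toolId}: ${r.verdict} (${r.gaps.join("; ")})`);
```

In CI, a wrapper could set `process.exitCode = 1` whenever any verdict is `needs-evidence`. With the expected shape above, both targets would fail today, mainly because runtime dedupe is not proven.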

What matters is not Pluribus specifically. The useful standard is this: compatibility without tool evidence is just copying files.

caioribeiroclw-pixel/tool-evidence-checklist.md
