@Reebz
Last active April 10, 2026 06:47
Compound Engineering vs Compound Knowledge - Test Prompt

I want you to compare two plugins by running both through real use cases and judging their outputs against each other.

I will give you 10 use cases, and you will run each one through each plugin yourself. Do not ask me for permission.

https://github.com/EveryInc/compound-knowledge-plugin vs. https://github.com/EveryInc/compound-engineering-plugin

The use cases should use the full standard flow (brainstorm > plan > work > review > compound), and you should judge each plugin's outputs against the other's. Do not cheat; this must be a high-integrity test. Start by writing the tests first so there is no way to fudge the results.

Compound Plugin Comparison: Test Cases

Test Protocol

Run each test case through both plugins using the full loop: brainstorm → plan → work → review → compound.

For the knowledge plugin, the commands are: /kw:brainstorm, /kw:plan, /kw:work, /kw:review, /kw:compound.

For the engineering plugin, the commands are: /ce:brainstorm, /ce:plan, /ce:work, /ce:review, /ce:compound.

Rules:

  • Use the same prompt verbatim for both plugins
  • Do not provide additional context to either plugin that was not provided to the other
  • Run each plugin in a clean session (no carryover from prior tests, except for Test 10 which explicitly follows Test 9)
  • Score each phase independently, not just the final output
  • Tests 9 and 10 are a sequential pair. Run 9, then 10, in the same session for each plugin
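The protocol and rules above can be sketched as a small run plan. This is an illustrative sketch only: the `kw` and `ce` command prefixes and the five phases come from the protocol, while the plugin labels, the function names, and the session dictionary layout are assumptions made for the example, not part of either plugin.

```python
# Phases of the full loop, in the order given in the protocol.
PHASES = ["brainstorm", "plan", "work", "review", "compound"]

# Command prefix per plugin (the labels on the left are hypothetical names).
PREFIXES = {"compound-knowledge": "kw", "compound-engineering": "ce"}


def commands_for(plugin: str) -> list[str]:
    """Return the slash commands for one plugin, in loop order."""
    prefix = PREFIXES[plugin]
    return [f"/{prefix}:{phase}" for phase in PHASES]


def session_plan() -> list[dict]:
    """Build one entry per clean session.

    Tests 1-8 each get their own session; Tests 9 and 10 share one,
    because Test 10 explicitly follows Test 9.
    """
    test_groups = [[t] for t in range(1, 9)] + [[9, 10]]
    return [
        {"plugin": plugin, "tests": group, "commands": commands_for(plugin)}
        for plugin in PREFIXES
        for group in test_groups
    ]


if __name__ == "__main__":
    for session in session_plan():
        print(session["plugin"], session["tests"], " ".join(session["commands"]))
```

Under these assumptions the plan contains nine sessions per plugin: eight single-test sessions plus one shared session for the sequential pair.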

Scoring Rubric

Score each test on these dimensions (1-5 scale):

| Dimension | What it measures |
| --- | --- |
| Brainstorm quality | Did it surface non-obvious angles, constraints, or tensions? Or just restate the prompt? |
| Plan structure | Is the plan actionable, sequenced correctly, and scoped to the actual problem? |
| Work output | Is the deliverable useful as-is, or does it need significant rework? |
| Review value | Did the review catch real issues, or just generate generic observations? |
| Compound extraction | Are the saved learnings specific and reusable, or generic platitudes? |
| Domain fit | Did the plugin's framing match the problem domain, or fight against it? |
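
One way to keep scoring honest is to record each test's scores in a structure that rejects anything outside the 1-5 scale. The sketch below assumes one record per test per plugin; the class name `RubricScore` and its field names are hypothetical and only mirror the six rubric dimensions.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RubricScore:
    """Scores for one test run through one plugin (hypothetical layout)."""

    brainstorm: int
    plan: int
    work: int
    review: int
    compound: int
    domain_fit: int

    def __post_init__(self) -> None:
        # Enforce the rubric's 1-5 scale on every dimension.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, int) or not 1 <= value <= 5:
                raise ValueError(f"{f.name} must be an integer from 1 to 5, got {value!r}")

    def total(self) -> int:
        """Sum of the six dimension scores (minimum 6, maximum 30)."""
        return sum(getattr(self, f.name) for f in fields(self))
```

Making the record frozen means a score cannot be quietly adjusted after it is written, which fits the goal of not fudging results.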

Test 1: Strategy Under Uncertainty

I run a 15-person digital agency in Sydney. Our biggest client represents about 40% of revenue. They just told us they're bringing their creative work in-house over the next 6 months. I need a plan for how to handle this. We have about $180K in cash reserves, 3 other clients that are stable but not growing, and a team that's 60% allocated to this major client. What's my play?

What this tests: Strategic planning with financial constraints, workforce reallocation, and business risk. The review phase should catch whether the plan addresses the timeline pressure and whether financial projections are realistic.


Test 2: Research Synthesis and Framework Selection

I need to evaluate three approaches to customer segmentation for a B2C SaaS product with about 200,000 users: RFM analysis, behavioral clustering, and predictive LTV models. I need to understand which approach works best for which situations, what data requirements each has, implementation complexity, and where they break down. The output should be a decision framework I can hand to a product team that doesn't have data science experience.

What this tests: Analytical rigour, data accuracy in claims about each method, and whether the plan produces a genuinely usable framework vs. a surface-level comparison table.


Test 3: Regulatory and Compliance Knowledge Work

I run a small business in Australia. We've been operating for about 6 months and my BAS is due in 3 weeks. I have a mix of GST-free and taxable income, some contractor payments where I'm not sure if I need to withhold, and I accidentally claimed two personal expenses on the business card (about $400 total). My accountant is booked solid and can't see me for 5 weeks. Walk me through what I need to do, what I can fix myself, and where I'm actually at risk.

What this tests: Domain-specific accuracy (Australian tax law), ability to separate urgent from important, and whether the review phase catches incorrect regulatory claims.


Test 4: Complex Decision with Competing Criteria

I need to pick the best high school for my daughter. She's currently in grade 5. I'm an English speaker living in Italy, near Milan. She speaks both English and Italian fluently. My priorities are academic quality, social environment, and keeping her English strong, but I'm open to being told my priorities are wrong. I don't know whether to look at international schools, Italian public schools, or private Italian schools. Budget is flexible but not unlimited. Help me build a framework for this decision.

What this tests: Multi-criteria decision making with incomplete information, cross-cultural context, and whether the brainstorm phase surfaces considerations the user hasn't thought of (visa implications, university pathway differences, social integration trade-offs).


Test 5: Crisis Management with Evidence Gaps

I run a small digital advertising agency with 6 employees. One employee has accused another of stealing cash from the office. My accountant doesn't know what to do either. We don't deal in physical goods so I'm not sure how to audit this. I need to figure out what actually happened, what my legal obligations are, what I should do right now vs. what can wait, and how to handle this without destroying team morale. I'm in Australia.

What this tests: Whether the planning phase separates evidence-gathering from action-taking, whether the review catches legal liability risks, and whether the work output produces something the user can actually follow step-by-step under pressure.


Test 6: Long-Form Creative Project Planning

I want to write a book about my dad's life. He's 78 and still sharp. He was a merchant sailor for 20 years, then ran a small engineering firm for another 25. He's got incredible stories but he's not the type to volunteer them. I've never written anything longer than a work email. I don't know where to start, how to structure it, how to interview him without it feeling forced, or whether this should be a memoir, a biography, or something else entirely. I just know I want to capture his life before the stories are gone.

What this tests: Whether the brainstorm phase can handle an emotionally significant, creatively ambiguous project. Whether the plan breaks an overwhelming scope into concrete phases. Whether the review catches scope creep or structural problems.


Test 7: Operational Process Design

I'm onboarding my first enterprise client for my SaaS product. Until now everything has been self-serve. They want a 90-day implementation plan, weekly status reports, and a success criteria framework. I've never done managed onboarding before. I need to design the entire process from scratch: what the phases are, what milestones matter, what the status report template should look like, and how to define success in a way that protects me if things go sideways. The client has about 500 employees and they're replacing a manual spreadsheet process with my tool.

What this tests: Whether the plugin can produce a deliverable that is itself a structured process. Tests plan quality at two levels: the plan for creating the process, and the process itself. The review should catch gaps in the implementation plan and unrealistic timelines.


Test 8: Financial Decision Framework

I have $200K in savings, a mortgage with $400K remaining at 6.2%, and the option to salary sacrifice into super vs. paying down the mortgage faster vs. investing in ETFs. I'm 42 with two kids, household income is about $280K. I need a framework for thinking about this decision, not just a recommendation. I want to understand the trade-offs, the tax implications in Australia, and what assumptions would change the answer. I know a financial advisor would say "it depends" so tell me what it depends on, specifically.

What this tests: Quantitative accuracy, whether the review phase catches incorrect tax claims or unrealistic return assumptions, and whether the output is a genuine decision framework vs. a generic pros-and-cons list.


Test 9: Competitive Landscape Analysis (Sequential Pair, Part 1)

I need to write a competitive landscape analysis for my startup's seed raise. The space is customer intelligence platforms. Key competitors are Gainsight, Amplitude, Mixpanel, Pendo, ThoughtSpot, and Pecan. I also want to include newer entrants like Sundial and Day AI. I need to map where each player sits, what their core thesis is, where they're weak, and where the whitespace is. The audience is technical VCs who will know these products. The output needs to be sharp enough to go in a pitch appendix.

What this tests: Research quality, analytical positioning, and whether the compound phase extracts genuinely reusable competitive intelligence vs. generic notes. This is Part 1 of a sequential pair. Run compound at the end.


Test 10: Building on Prior Work (Sequential Pair, Part 2)

Run this immediately after Test 9, in the same session, without clearing state.

Now I need to write the "Why Now" section of my pitch deck. This should draw on the competitive analysis we just did. The core argument is that the market is shifting from dashboard-based analytics to proactive, AI-driven insight delivery, and none of the incumbents are positioned for it. I need this to be 3-4 paragraphs, written for investors, with specific evidence from the competitive landscape to support the timing argument.

What this tests: Whether the compound phase from Test 9 actually surfaces in this new task. This is the only test that measures the core product thesis of both plugins: does knowledge compound? Score this test on whether the output references specific findings from Test 9 without the user re-providing them.


Summary Score Card

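| Test | CK Brainstorm | CK Plan | CK Work | CK Review | CK Compound | CK Domain Fit | CE Brainstorm | CE Plan | CE Work | CE Review | CE Compound | CE Domain Fit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | | | | | | | | | | | | |
| 2 | | | | | | | | | | | | |
| 3 | | | | | | | | | | | | |
| 4 | | | | | | | | | | | | |
| 5 | | | | | | | | | | | | |
| 6 | | | | | | | | | | | | |
| 7 | | | | | | | | | | | | |
| 8 | | | | | | | | | | | | |
| 9 | | | | | | | | | | | | |
| 10 | | | | | | | | | | | | |
| Total | | | | | | | | | | | | |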
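
The Total row can be filled in mechanically once every cell is scored. The following is a minimal sketch, assuming scores are stored as a nested dictionary keyed by test number, plugin label (CK or CE), and dimension; that layout and the function name `column_totals` are assumptions for illustration. The numbers in the usage section are placeholders, not real results.

```python
# Column labels as they appear in the score card.
PLUGINS = ["CK", "CE"]
DIMENSIONS = ["Brainstorm", "Plan", "Work", "Review", "Compound", "Domain Fit"]


def column_totals(scores: dict[int, dict[str, dict[str, int]]]) -> dict[str, int]:
    """Sum each score card column across all tests.

    scores[test_number][plugin][dimension] holds a 1-5 score.
    Returns totals keyed by column header, e.g. "CK Brainstorm".
    """
    totals = {f"{plugin} {dim}": 0 for plugin in PLUGINS for dim in DIMENSIONS}
    for per_plugin in scores.values():
        for plugin, dims in per_plugin.items():
            for dim, value in dims.items():
                totals[f"{plugin} {dim}"] += value
    return totals


if __name__ == "__main__":
    # Placeholder values only, to show the shape of the input.
    example = {
        1: {"CK": {d: 3 for d in DIMENSIONS}, "CE": {d: 4 for d in DIMENSIONS}},
        2: {"CK": {d: 5 for d in DIMENSIONS}, "CE": {d: 2 for d in DIMENSIONS}},
    }
    for column, total in column_totals(example).items():
        print(f"{column}: {total}")
```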