Anthony ALCARAZ (AnthonyAlcaraz)

@AnthonyAlcaraz
AnthonyAlcaraz / linkedin-collective-harness-2026-04-12.md
Created April 12, 2026 10:14
The Collective Harness: Team OS, SkillClaw, and Why Agent Knowledge Must Compound Across People

title: "The Collective Harness: Team OS, SkillClaw, and Why Agent Knowledge Must Compound Across People"
date: 2026-04-12
type: linkedin
status: draft
featured-image: https://i.imgur.com/d9jZrc0.png
gist-url:
linkedin-publish-url:
character-count: 2842
hook-type: concrete-scene

@AnthonyAlcaraz
AnthonyAlcaraz / linkedin-post-clean.txt
Last active April 11, 2026 09:40
Enterprise Agent Harness: Connectivity Between Planning and Self-Improvement - LinkedIn Post
The compounding gap in enterprise AI agents is not model intelligence. It is not tool access. It is the missing wire between planning and self-improvement.
79% of organizations have adopted AI agents. Only 23% are actively scaling them. That gap traces to a connectivity failure inside the harness itself.
@Harrison Chase laid out the taxonomy. Agents learn at three layers: model weights, harness code, and runtime context. Model-layer learning risks catastrophic forgetting. Harness-layer learning is version-controlled and cannot forget. Context-layer learning is append-only. Most deployments pour resources into model selection while ignoring the two layers where practical improvement lives.
The consequence: an agent that plans, executes, and forgets what worked. Next invocation, same mistakes. Planning and improvement run in parallel, never touching.
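A minimal sketch of that missing wire (all file and function names invented for illustration): after each run, the self-improvement side appends distilled lessons to a version-controlled harness artifact, and planning reads that artifact on the next invocation instead of starting blank.

```python
import json
from pathlib import Path

# Hypothetical artifact: lessons live in a version-controlled harness file,
# not in model weights, so they cannot be catastrophically forgotten.
LESSONS_FILE = Path("harness_lessons.json")

def load_lessons():
    """Planning side: read lessons distilled from earlier executions."""
    if LESSONS_FILE.exists():
        return json.loads(LESSONS_FILE.read_text())
    return []

def record_lesson(summary: str, worked: bool):
    """Self-improvement side: write back what the execution trace showed."""
    lessons = load_lessons()
    lessons.append({"summary": summary, "worked": worked})
    LESSONS_FILE.write_text(json.dumps(lessons, indent=2))

def plan(task: str) -> str:
    """Next invocation starts from accumulated lessons, not a blank slate."""
    hints = [l["summary"] for l in load_lessons() if l["worked"]]
    return f"Task: {task} | apply lessons: {hints}"
```

The wire is the shared file: without `record_lesson` feeding `plan`, the two loops run in parallel and never touch.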
Stanford's Meta-Harness, led by Chelsea Finn and @Deedy Das, proved what this costs. A system reading execution traces and proposing harness improvements outpe
@AnthonyAlcaraz
AnthonyAlcaraz / linkedin-memory-gist.md
Created April 10, 2026 07:23
Agent Memory: Five Teams Converge on Forgetting - LinkedIn Post

Everyone is building agent memory that remembers more. The five teams shipping production agents are building memory that forgets.

@Andre Lindenberg built MemoryBear with a pipeline modeled on human sleep consolidation: mark episodes for relevance, merge related ones into semantic knowledge, decay the rest. Additive-only memory degrades agent performance because stale memories poison the context window. Every append-only store eventually drowns the agent in outdated signal.
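The mark-merge-decay pass can be sketched in a few lines (a simplified illustration, not MemoryBear's actual pipeline; field names and the relevance threshold are invented):

```python
def consolidate(episodes, relevance_threshold=0.5):
    """Sleep-style consolidation: mark relevant episodes, merge related
    ones into semantic knowledge, decay (drop) the rest."""
    marked = [e for e in episodes if e["relevance"] >= relevance_threshold]
    # Merge: collapse related episodes into one semantic entry per topic.
    by_topic = {}
    for e in marked:
        by_topic.setdefault(e["topic"], []).append(e["text"])
    merged = {topic: " ".join(texts) for topic, texts in by_topic.items()}
    # Decay: everything below threshold is dropped rather than appended forever.
    decayed = len(episodes) - len(marked)
    return merged, decayed
```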

AnimaWorks implements this with ChromaDB for episodes and NetworkX graphs for spreading activation retrieval. The graph enables principled decay. When a memory node loses all high-weight connections, the system knows it can be forgotten. A vector in a flat store has no structural signal for obsolescence.
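The structural-obsolescence signal is simple to state in code. A sketch (a plain weighted adjacency dict stands in for the NetworkX graph; the threshold is invented):

```python
def forgettable(adjacency, node, min_weight=0.5):
    """A memory node with no remaining high-weight connection has lost its
    structural reason to exist and can be forgotten. A vector in a flat
    store has no equivalent signal."""
    return all(w < min_weight for w in adjacency.get(node, {}).values())
```

A node that never had connections, or whose last strong edge decayed away, comes back `True`; a well-connected memory stays.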

Five teams converge on what @Harrison Chase calls the "dreaming" pattern: batch-process execution traces during idle periods, extract patterns, synthesize into persistent knowledge. SimpleMem does it with a three-stage pipeline. OpenCla

@AnthonyAlcaraz
AnthonyAlcaraz / linkedin-gist.md
Created April 10, 2026 07:13
Enterprise Agentic Harness: Biggest Modernization Challenge Since the Internet - LinkedIn Post

When enterprises deploy agents without the right infrastructure, governance collapses at five agents, 22% see unintended behaviors, and 18% produce decisions nobody can audit. @Phil Fersht surveyed 200+ Global 2000 leaders and captured it: "We treated agents like software. They behave like employees."

That infrastructure gap is the biggest modernization challenge since the internet. Five capabilities must converge simultaneously. No existing vendor stack provides them together.

Access and authority. @Karl McGuinness identified what enterprise IAM is missing: a formal Authority layer. Identity establishes who. Access determines whether. But Authority, the purpose-scoped right governing whether execution should still be running, has no enterprise equivalent. A CFO's agent pulling pre-IPO financials on an expired mandate passes every access check.
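What an Authority check adds on top of an access check can be sketched as follows (a hypothetical shape, not any vendor's API: the mandate carries a purpose scope and an expiry that access control never consults):

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Mandate:
    """Hypothetical Authority grant: purpose-scoped and time-bounded,
    layered on top of identity (who) and access (whether)."""
    purpose: str
    expires: datetime

def may_execute(has_access: bool, mandate: Mandate, purpose: str, now=None):
    now = now or datetime.now(timezone.utc)
    # Access alone is not enough: the purpose must match and the mandate
    # must still be live at execution time.
    return has_access and mandate.purpose == purpose and now < mandate.expires
```

The CFO-agent case falls out directly: `has_access` is `True`, but the expired mandate makes `may_execute` return `False`.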

Governance. @Wayne Butterfield's critique resonated with 398 people because it named what everyone feels: the market lumps builders, orchestrators, and governed wor

@AnthonyAlcaraz
AnthonyAlcaraz / linkedin-post.md
Created April 9, 2026 07:07
Team OS Needs Graph Ontology - LinkedIn Post

@Hannah Stulberg built a Claude Code repo at DoorDash that 20 people use daily. Root CLAUDE.md under 500 tokens. Nested indexes in every folder. Each function owns its context. Data scientists pull metrics without asking. Strategy leads grab call summaries. Engineers check analytics without filing tickets.

She calls it a Team OS. It is the most production-ready multi-user AI knowledge architecture I have seen documented publicly.

Here is the problem she will face within six months.

@Andrej Karpathy's LLM Wiki works beautifully for one person. You dump messy notes, the LLM compiles them into a wiki, every query makes the wiki better. The loop compounds because one mind, one vocabulary, one namespace. No ambiguity about what "revenue" means when you are the only person defining it.

@Siddharth Shah studied what happens when you scale this to teams. Four failure modes appeared in every enterprise knowledge graph project since the Semantic Web era. Ungoverned documents that make the graph a liability. Nobody o

@AnthonyAlcaraz
AnthonyAlcaraz / linkedin-post-clean.txt
Created April 8, 2026 18:33
The Negation Test: Why Enterprise Agents Fail - LinkedIn Post
Here is how an enterprise agent handles "which customers have NOT renewed": it searches for renewal records, finds none, and confidently reports zero. It cannot distinguish between "no renewal exists" and "the data source does not track renewals." The absence of evidence becomes evidence of absence.
This single failure mode reveals the actual bottleneck in enterprise AI. @Juan Sequeda calls it the "iron thread" problem: organizations built data catalogs that describe WHAT exists and WHERE it lives, but never encoded WHAT IT MEANS. A metadata catalog says "renewals table, 4.2M rows, updated daily." An ontology says "a renewal is a contractual event that extends an existing agreement, requires an active subscription as precondition, and its absence after contract_end_date constitutes churn."
The difference is not academic. It is the difference between an agent that retrieves data and one that reasons about it.
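The negation test reduces to one guard an ontology can license and a catalog cannot (an illustrative sketch; names invented): only answer "NOT renewed" when the source is known to track renewals, i.e. when a closed-world reading is justified.

```python
def not_renewed(customers, renewals, source_tracks_renewals: bool):
    """'Which customers have NOT renewed' is only answerable under a
    closed-world assumption the ontology must license."""
    if not source_tracks_renewals:
        return None  # absence of evidence, not evidence of absence
    renewed = {r["customer"] for r in renewals}
    return [c for c in customers if c not in renewed]
```

An agent without the `source_tracks_renewals` flag collapses both cases into "zero renewals found."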
@Lulit Tesfaye at Enterprise Knowledge draws the line precisely: a semantic layer defines stable me
@AnthonyAlcaraz
AnthonyAlcaraz / linkedin-gist.md
Created April 8, 2026 06:37
Enterprise Agents Need Semantic Infrastructure - LinkedIn Post

When an enterprise agent cannot answer "which customers have NOT renewed," nobody blames the semantic layer. They blame the model. They upgrade. The larger model still fails. Because vector similarity has no concept of negation.

This is the scalability constraint most AI strategies miss. Organizations pour resources into model capability while the real bottleneck sits one layer below: the semantic infrastructure that transforms raw data into machine-navigable meaning.

Three layers, each solving what the one below cannot.

Metadata catalogs describe what exists and where. But metadata tells an agent nothing about what data means. An agent with only a catalog knows there is a "customer" table and a "contract" table. It does not know contracts have renewal dates or that "active" means something different in EMEA.
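The limit is easy to make concrete (a toy catalog; entries invented): existence questions are answerable, meaning questions are not, because meaning was never encoded.

```python
# All a metadata catalog hands the agent: existence, location, freshness.
catalog = {
    "customer": {"rows": 1_200_000, "location": "warehouse.core", "updated": "daily"},
    "contract": {"rows": 300_000, "location": "warehouse.core", "updated": "daily"},
}

def catalog_answer(catalog, table, question):
    """Existence questions resolve; meaning questions have no encoding to
    resolve against (renewal dates? what 'active' means in EMEA?)."""
    if question == "exists":
        return table in catalog
    return None
```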

Ontologies define meaning: concepts, relationships, constraints, valid operations. @Tony Seale identified the gap when Gartner declared semantic layers "non-negotiable" for AI. 44% of organizations c

@AnthonyAlcaraz
AnthonyAlcaraz / linkedin-post-clean.txt
Created April 7, 2026 11:51
Why Graphs Are Key for Agentic Harness - LinkedIn Post 2026-04-07
The harness thesis is settled. Same model, same weights, different harness: from outside the top 30 to rank 5 on TerminalBench. @Akshay Pachaar documented 12 components that make the difference. @Raphael Mansuy measured a 6x gap from harness optimization alone. The question is no longer whether the harness matters. It is which harness architecture compounds.
The answer is the one backed by a graph.
@Andre Lindenberg mapped twelve Claude Code harness patterns to OS kernel primitives: page tables for tiered memory, garbage collection for dream consolidation, process isolation for subagents, ACLs for risk classification. Every pattern operates on a shared data structure. That structure is a graph.
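A minimal sketch of that shared structure (all node names and relations invented): components write to and read from one graph, so a memory can directly inform a permission decision instead of sitting in a separate flat file.

```python
# One graph shared by harness components: memories, permissions, and
# workflow state are nodes; relations are typed edges.
graph = {"nodes": {}, "edges": []}

def add_node(node_id, kind, **attrs):
    graph["nodes"][node_id] = {"kind": kind, **attrs}

def link(src, dst, relation):
    graph["edges"].append((src, dst, relation))

add_node("mem:prod-db-incident", "memory", text="agent dropped a prod table")
add_node("perm:write-prod-db", "permission", risk="high")
link("mem:prod-db-incident", "perm:write-prod-db", "raises-risk-of")

def risk_signals(permission_id):
    """Permissions can now see the memories that bear on them."""
    return [src for src, dst, rel in graph["edges"]
            if dst == permission_id and rel == "raises-risk-of"]
```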
Without one, harness components are blind to each other. Memory stores facts in flat files. Permissions live in JSON config. Workflow state sits in a process table. Each component works. None of them communicate. Your agent has memory that cannot inform its permissions and planning that cannot reference its evaluat
@AnthonyAlcaraz
AnthonyAlcaraz / ch2-revision-draft-2026-04-06.md
Last active April 6, 2026 19:07
O'Reilly Book Chapter Revisions - 2026-04-06 (Ch2 NEW + Ch8 Optimization)
title: Chapter 2 — Agentic Graph Architecture Foundations: Revision Draft
date: 2026-04-06
type: book-revision-draft
status: ready-for-google-doc
chapter: 2
sources: Lindenberg 12-pattern analysis of Claude Code open-source architecture
note: Single consolidated addition. Placement references Google Doc outline sections.
@AnthonyAlcaraz
AnthonyAlcaraz / ch2-revision-draft-2026-04-06.md
Last active April 6, 2026 19:07
O'Reilly Book Chapter Revisions - 2026-04-06 (Ch3, Ch5, Ch6, Ch7)
title: Chapter 2 — Agentic Graph Architecture Foundations: Revision Draft
date: 2026-04-06
type: book-revision-draft
status: ready-for-google-doc
chapter: 2
sources: Lindenberg 12-pattern analysis of Claude Code open-source architecture
note: Single consolidated addition. Placement references Google Doc outline sections.