Given examples of desired output, automatically discover the best system prompt for any LLM task.
Any codebase that uses LLMs has prompts: for text generation, data transformation, classification, summarization. These prompts are hand-written, manually iterated, and then frozen. When requirements shift or quality degrades, a developer tweaks the prompt by hand again. There is no systematic way for an agent to optimize its own prompts given examples of what good output looks like.
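The core loop this implies can be sketched in a few lines: propose candidate system prompts, run each against held-out input/output examples, and keep the highest-scoring one. The sketch below is a minimal, hypothetical illustration; `run_model` is a deterministic stand-in for a real LLM call, and the exact-match scorer would be replaced by a task-appropriate metric in practice.

```python
def run_model(system_prompt: str, user_input: str) -> str:
    """Hypothetical stand-in for an LLM call.

    A real implementation would send (system_prompt, user_input) to an
    LLM API. Here the "model" uppercases the input only when the prompt
    asks for uppercase, so the search loop is runnable and deterministic.
    """
    if "uppercase" in system_prompt.lower():
        return user_input.upper()
    return user_input


def score(system_prompt: str, examples: list[tuple[str, str]]) -> float:
    """Fraction of (input, desired_output) examples reproduced exactly."""
    hits = sum(run_model(system_prompt, x) == y for x, y in examples)
    return hits / len(examples)


def best_prompt(candidates: list[str], examples: list[tuple[str, str]]) -> str:
    """Return the candidate system prompt that scores highest on the examples."""
    return max(candidates, key=lambda p: score(p, examples))


examples = [("hello", "HELLO"), ("abc", "ABC")]
candidates = ["Echo the input.", "Rewrite the input in uppercase."]
print(best_prompt(candidates, examples))  # → Rewrite the input in uppercase.
```

In a real optimizer the candidate list itself would be generated and mutated by an LLM across iterations, rather than fixed up front; the scoring-and-selection step stays the same.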