@ianb
Last active July 20, 2025 00:19
Draft spec of a document format for LLM prompts

CTX_MARK Specification (Draft)

A minimalist, structured language for LLM-readable documents, with support for:

  • Modular prompt design
  • Structured memory
  • Instructional scaffolding
  • Clear provenance and update semantics

📣 Prompt Introduction for the LLM

You are reading a structured document using CTX_MARK, a lightweight XML-inspired markup.

This document includes two sets of conventions:

🧾 Reading Instructions (for understanding and use)

  • Content tags are written in ALL_CAPS; lowercase tags such as <instructions> are reserved for metadata and scaffolding.
  • IDs use the format «ID» (e.g., «NOTE-001»), and may appear in content or attributes.
  • All IDs are generated by the system. Do not invent new IDs unless explicitly instructed.
  • If you are asked to construct new content that references other new content, you may temporarily generate placeholder IDs like «NEW_ID_1», «NEW_ID_2». These will be rewritten by the system.
  • Attribute-specific instructions appear as instructions.ATTRIBUTE_NAME="...".
  • Content-specific instructions appear as <instructions>...</instructions> inside a tag.
  • Collapsible content may appear as COLLAPSED_WITH_CONTENT="..." or as a combination of <PREVIEW> and <FULL>.
  • SOURCE attributes reference the origin of content using structured IDs (e.g., «USER», «WEBPAGE-site-42»).
  • FORMAT attributes like FORMAT="markdown" or FORMAT="plaintext" indicate formatting expectations.

You are expected to interpret tags, structure, and source metadata. Do not output instruction elements or comments unless explicitly asked.

✍️ Writing Instructions (for when authoring CTX_MARK)

  • Use ID="«ID»" to uniquely mark elements for update or reference.
  • Use SOURCE="«ID»" to mark content origin.
  • Use <instructions>...</instructions> or instructions.ATTRIBUTE="..." to guide LLM output.
  • Use suffix conventions like *_DATE, *_TIMESTAMP, or CURRENCY_USD to indicate typed fields.
  • For collapsed sections, you may use either COLLAPSED_WITH_CONTENT or a <COLLAPSED> block with <PREVIEW> and <FULL> children.

IDs and instructions are integral to prompt interpretation and editing workflows.


🧱 Core Conventions

🔠 Tags

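  • Content tag names are in ALL_CAPS, e.g., <SUMMARY>, <NOTE>; lowercase tags such as <instructions>, <previous>, and <example> are reserved for system-level metadata and scaffolding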
  • Tags can include freeform text, nested children, or structured attributes

🆔 IDs and References

  • All referenceable entities must have an ID attribute
  • IDs follow the format: «ID», e.g., «USER», «NOTE-003»
  • These IDs may appear in content, attributes, or provenance fields
  • Do not generate new IDs unless instructed; placeholder IDs should use the format «NEW_ID_1»
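The ID rules above leave the host system with two mechanical tasks: finding «ID» references in text, and rewriting «NEW_ID_n» placeholders into real IDs. Below is a minimal Python sketch. The «PREFIX-n» numbering scheme and the function names are hypothetical, because the spec does not say how the system allocates IDs.

```python
import itertools
import re

# A CTX_MARK ID is any text between « and », e.g. «NOTE-003» or «WEBPAGE-nytimes.com-51».
ID_PATTERN = re.compile(r"«([^«»]+)»")
# Placeholders the LLM may emit for new content: «NEW_ID_1», «NEW_ID_2», ...
PLACEHOLDER_PATTERN = re.compile(r"«NEW_ID_\d+»")


def find_ids(text: str) -> list[str]:
    """Return every ID in text, in order of appearance, without the guillemets."""
    return ID_PATTERN.findall(text)


def rewrite_placeholders(text: str, prefix: str, start: int = 1) -> tuple[str, dict[str, str]]:
    """Replace each distinct placeholder with a new ID of the form «PREFIX-n».

    The «PREFIX-n» scheme is hypothetical. Repeated occurrences of one
    placeholder map to the same new ID, so cross-references stay intact.
    """
    counter = itertools.count(start)
    mapping: dict[str, str] = {}

    def replace(match: re.Match) -> str:
        placeholder = match.group(0)
        if placeholder not in mapping:
            mapping[placeholder] = f"«{prefix}-{next(counter)}»"
        return mapping[placeholder]

    return PLACEHOLDER_PATTERN.sub(replace, text), mapping
```

For example, `rewrite_placeholders("«NEW_ID_1» cites «NEW_ID_2»; see «NEW_ID_1»", "NOTE", start=100)` returns the text `«NOTE-100» cites «NOTE-101»; see «NOTE-100»` together with the placeholder mapping.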

✍️ Instructions

Representing Changes

To indicate previous values when confirming or logging edits:

  • Use previous.ATTRIBUTE="..." alongside the new ATTRIBUTE to show the old value
  • Use a <previous> element inside the changed element, next to the new content, to show the old content

Example:

<LINK
  HREF="https://newsite.com"
  previous.HREF="https://oldsite.com"/>
<NOTE ID="«NOTE-45»">
  <previous>This was the original note.</previous>
  This is the updated version.
</NOTE>
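A tool that logs edits could produce the change markup shown above with Python's standard `xml.etree.ElementTree`. Dotted names such as `previous.HREF` are legal XML names, so no special handling is needed. This is a sketch that assumes plain string attributes and text-only content. `with_previous_attrs` and `with_previous_content` are illustrative names and are not part of the spec.

```python
import xml.etree.ElementTree as ET


def with_previous_attrs(tag: str, old: dict[str, str], new: dict[str, str]) -> str:
    """Serialize an element with its new attributes, adding previous.ATTR for each changed value."""
    elem = ET.Element(tag)
    for name, value in new.items():
        elem.set(name, value)
        if name in old and old[name] != value:
            elem.set(f"previous.{name}", old[name])
    return ET.tostring(elem, encoding="unicode")


def with_previous_content(tag: str, elem_id: str, old_text: str, new_text: str) -> str:
    """Serialize an element whose <previous> child holds the old content, followed by the new content."""
    elem = ET.Element(tag, {"ID": elem_id})
    previous = ET.SubElement(elem, "previous")
    previous.text = old_text
    previous.tail = new_text  # text after </previous> is the element's current content
    return ET.tostring(elem, encoding="unicode")
```

With the values from the example, `with_previous_attrs("LINK", {"HREF": "https://oldsite.com"}, {"HREF": "https://newsite.com"})` yields `<LINK HREF="https://newsite.com" previous.HREF="https://oldsite.com" />`. Unchanged attributes get no `previous.` counterpart.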

Instructions for an element's content go in a lowercase <instructions> child:

<TITLE>
  <instructions>Summarize in one sentence.</instructions>
  A short summary here
</TITLE>

Instructions for a single attribute use the instructions.ATTRIBUTE prefix:

<LINK 
  HREF="https://example.com" 
  instructions.HREF="Must be absolute URL"/>
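Both kinds of instruction are plain XML: a child element and a dotted attribute name. A host can therefore collect them with a standard parser. The sketch below assumes the two examples are wrapped in a single hypothetical `<DOC>` root. It keys results by ID and falls back to the tag name, so two ID-less elements with the same tag would collide. The function name is illustrative.

```python
import xml.etree.ElementTree as ET


def collect_instructions(xml_text: str) -> dict[str, list[str]]:
    """Map each element's ID (or its tag name, if it has no ID) to its instructions.

    Content instructions come from a lowercase <instructions> child;
    attribute instructions come from instructions.ATTR="..." attributes.
    """
    root = ET.fromstring(xml_text)
    found: dict[str, list[str]] = {}
    for elem in root.iter():
        notes = []
        child = elem.find("instructions")  # direct children only
        if child is not None and child.text:
            notes.append(f"content: {child.text.strip()}")
        for name, value in elem.attrib.items():
            if name.startswith("instructions."):
                notes.append(f"{name.removeprefix('instructions.')}: {value}")
        if notes:
            found[elem.get("ID", elem.tag)] = notes
    return found
```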

📂 Format and Typing

  • Use FORMAT="markdown" or FORMAT="plaintext" to guide content formatting

  • Typed fields should use suffixes or explicit attributes:

    • *_TIMESTAMP, *_DATE, CURRENCY_USD, PERCENTAGE, etc.
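The spec names the type suffixes but not their value formats. The following sketch makes these assumptions:

- ISO 8601 for *_DATE and *_TIMESTAMP.
- A decimal string for CURRENCY_USD.
- A number with an optional trailing `%` for PERCENTAGE.
- CURRENCY_USD and PERCENTAGE may be either whole attribute names or suffixes.

None of these rules comes from the spec.

```python
from datetime import date, datetime
from decimal import Decimal


def coerce(name: str, raw: str):
    """Convert a raw attribute string to a typed value based on the attribute's name.

    Assumed formats (not defined by the spec): ISO 8601 for *_TIMESTAMP and
    *_DATE, a decimal string for CURRENCY_USD, and a number with an optional
    trailing % for PERCENTAGE.
    """
    if name.endswith("_TIMESTAMP"):
        return datetime.fromisoformat(raw)
    if name.endswith("_DATE"):
        return date.fromisoformat(raw)
    if name == "CURRENCY_USD" or name.endswith("_CURRENCY_USD"):
        return Decimal(raw)  # Decimal avoids binary floating-point rounding for money
    if name == "PERCENTAGE" or name.endswith("_PERCENTAGE"):
        return float(raw.rstrip("%"))
    return raw  # untyped fields stay strings
```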

🕵️ Provenance

  • Use SOURCE="«ID»" to mark origin of content

  • Sources can be:

    • «USER»
    • «LLM»
    • «WEBPAGE-nytimes.com-51»
  • Optional: declare full source metadata separately using <SOURCES>

<SOURCES>
  <SOURCE ID="«WEBPAGE-buzzfeed.com-41»" TYPE="external_web" URL="https://buzzfeed.com/article/41"/>
</SOURCES>
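Resolving a SOURCE attribute against a <SOURCES> block amounts to a dictionary lookup keyed by ID. Declaring sources is optional, so in this sketch an undeclared ID such as «USER» or «LLM» falls back to a record that holds only the ID. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def index_sources(sources_xml: str) -> dict[str, dict[str, str]]:
    """Index the <SOURCE> entries of a <SOURCES> block by their ID."""
    root = ET.fromstring(sources_xml)
    return {src.get("ID"): dict(src.attrib) for src in root.iter("SOURCE")}


def provenance(elem: ET.Element, sources: dict[str, dict[str, str]]) -> dict[str, str]:
    """Return metadata for elem's SOURCE attribute.

    Undeclared sources such as «USER» or «LLM» fall back to a record holding
    only the ID, since declaring sources in <SOURCES> is optional.
    """
    source_id = elem.get("SOURCE")
    if source_id is None:
        return {}
    return sources.get(source_id, {"ID": source_id})
```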

🧳 Collapsible Content

<COMPLETE_DESCRIPTION ID="«DESC-32»" COLLAPSED_WITH_CONTENT="30 words"/>

Or:

<COLLAPSED>
  <PREVIEW>A brief teaser...</PREVIEW>
  <FULL ID="«DESC-32»">The complete paragraph of content.</FULL>
</COLLAPSED>
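The spec does not define the value of COLLAPSED_WITH_CONTENT. The sketch below assumes it is a size hint, a word count for the content that was removed. The host keeps the full text and restores it on request, for example when the LLM calls `expand`. `collapse`, `expand`, and the `store` dictionary are illustrative. Only the element's text is preserved, not any nested markup.

```python
import xml.etree.ElementTree as ET


def collapse(elem: ET.Element, store: dict[str, str]) -> ET.Element:
    """Replace elem's content with a COLLAPSED_WITH_CONTENT size hint (assumed: a word count).

    The full text is saved in `store` under the element's ID so it can be
    restored later, e.g. when the LLM calls expand.
    """
    text = "".join(elem.itertext()).strip()
    store[elem.get("ID")] = text
    collapsed = ET.Element(elem.tag, dict(elem.attrib))
    collapsed.set("COLLAPSED_WITH_CONTENT", f"{len(text.split())} words")
    return collapsed


def expand(collapsed: ET.Element, store: dict[str, str]) -> ET.Element:
    """Rebuild the element from the stored text, dropping the size hint."""
    attrs = {k: v for k, v in collapsed.attrib.items() if k != "COLLAPSED_WITH_CONTENT"}
    full = ET.Element(collapsed.tag, attrs)
    full.text = store[collapsed.get("ID")]
    return full
```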

📘 Examples and Testing

Use the lowercase <example> tag to provide demonstration prompts, partials, or tests.

  • Add purpose="..." to describe the intent of the example.
  • Use nested <input> and <output> blocks if needed.

<example purpose="Test summarization with user sentiment">
  <input>
    <NOTE ID="«NOTE-77»" SOURCE="«USER»">
      <instructions>Summarize sentiment neutrally</instructions>
      This thing is the worst.
    </NOTE>
  </input>
  <output>
    <SUMMARY ID="«SUMMARY-77»" SOURCE="«LLM»">
      The user was strongly dissatisfied.
    </SUMMARY>
  </output>
</example>

Lowercase tags are reserved for system-level metadata, samples, or scaffolding.
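A test harness could pull demonstration pairs out of <example> blocks like the one above. For each example, this sketch returns its purpose and the top-level elements inside <input> and <output>. How those pairs are then run against a model is left open, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def load_examples(xml_text: str) -> list[dict]:
    """Collect purpose, input elements, and output elements from each <example>."""
    root = ET.fromstring(xml_text)
    examples = []
    for ex in root.iter("example"):  # includes root itself if it is an <example>
        inp = ex.find("input")
        out = ex.find("output")
        examples.append({
            "purpose": ex.get("purpose", ""),
            "input": list(inp) if inp is not None else [],
            "output": list(out) if out is not None else [],
        })
    return examples
```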


🔁 JSON Function Call Format for Tool Integration

While CTX_MARK uses XML-like markup, tools interacting with LLMs may use structured function calls in JSON. Each update action is a JSON object calling a named function. These updates apply to entire objects identified by ID. Paths and fine-grained selectors are not used.

General Form

update({
  "ID": "«SUMMARY-10»",
  "text": "Updated summary content"
})

Supported Functions

  • update({...}) — modifies fields of an existing object by ID
  • delete({ "ID": "..." }) — removes an object by ID
  • create({...}) — creates a new object with placeholder ID
  • append({ "ID": "«PARENT»", "child": { ... } }) — adds a child to an existing parent object
  • expand({ "ID": "«DESC-32»" }) — expands collapsed content
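Each call has the shape `name({...})` with a JSON object argument, so a host can parse a sequence of calls with Python's standard `json` module. `json.JSONDecoder.raw_decode` reads one JSON value starting at a given offset and reports where it stopped. Nested objects and `})` inside strings therefore need no special handling. This is a sketch that assumes calls are separated only by whitespace; `parse_calls` is a hypothetical name.

```python
import json
import re

FUNCTIONS = {"update", "delete", "create", "append", "expand"}
_CALL_START = re.compile(r"\s*([A-Za-z_]+)\(\s*")  # e.g. "update(" plus any whitespace
_CALL_END = re.compile(r"\s*\)")
_DECODER = json.JSONDecoder()


def parse_calls(text: str) -> list[tuple[str, dict]]:
    """Parse whitespace-separated name({...}) calls into (name, arguments) pairs."""
    calls = []
    pos = 0
    end_of_text = len(text.rstrip())
    while pos < end_of_text:
        start = _CALL_START.match(text, pos)
        if start is None or start.group(1) not in FUNCTIONS:
            raise ValueError(f"expected a CTX_MARK function call at offset {pos}")
        # raw_decode reads exactly one JSON value and returns the offset where it ended.
        args, pos = _DECODER.raw_decode(text, start.end())
        if not isinstance(args, dict):
            raise ValueError(f"{start.group(1)} expects a JSON object argument")
        close = _CALL_END.match(text, pos)
        if close is None:
            raise ValueError(f"expected ')' at offset {pos}")
        calls.append((start.group(1), args))
        pos = close.end()
    return calls
```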

Examples

update({
  "ID": "«SUMMARY-10»",
  "text": "The user found it acceptable but unremarkable."
})
delete({ "ID": "«NOTE-22»" })
create({
  "type": "NOTE",
  "ID": "«NEW_ID_1»",
  "text": "Newly created content"
})
append({
  "ID": "«COMMENTS-4»",
  "child": {
    "type": "NOTE",
    "ID": "«NEW_ID_2»",
    "text": "Appended child"
  }
})
expand({
  "ID": "«DESC-32»"
})

These function patterns are intended to be LLM-friendly and avoid the ambiguity of JSON Patch syntax.
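On the receiving side, the five functions map naturally onto an in-memory store keyed by ID. The sketch below is one possible interpretation, not the spec's definition:

- Each «NEW_ID_n» placeholder gets a fresh «OBJ-n» ID; the prefix is hypothetical.
- Reusing a placeholder within one batch refers to the same object.
- `delete` also unlinks the object from any parent.
- `expand` is only recorded as a request, since the spec does not say how expanded content reaches the model.

```python
class CtxStore:
    """A hypothetical in-memory store of CTX_MARK objects keyed by ID."""

    def __init__(self, prefix: str = "OBJ") -> None:
        self.objects: dict[str, dict] = {}
        self.expand_requests: list[str] = []
        self._prefix = prefix  # hypothetical ID prefix for newly created objects
        self._counter = 0

    def _resolve(self, obj_id: str, placeholders: dict[str, str]) -> str:
        """Map «NEW_ID_n» placeholders to fresh IDs; leave real IDs unchanged."""
        if not obj_id.startswith("«NEW_ID_"):
            return obj_id
        if obj_id not in placeholders:
            self._counter += 1
            placeholders[obj_id] = f"«{self._prefix}-{self._counter}»"
        return placeholders[obj_id]

    def _insert(self, obj: dict, obj_id: str) -> None:
        if obj_id in self.objects:
            raise ValueError(f"{obj_id} already exists")
        self.objects[obj_id] = {**obj, "ID": obj_id}

    def apply(self, calls: list[tuple[str, dict]]) -> dict[str, str]:
        """Apply calls in order and return the placeholder-to-ID mapping for this batch."""
        placeholders: dict[str, str] = {}
        for name, args in calls:
            obj_id = self._resolve(args["ID"], placeholders)
            if name == "create":
                self._insert(args, obj_id)
            elif name == "append":
                parent = self.objects[obj_id]  # KeyError if the parent does not exist
                child = args["child"]
                child_id = self._resolve(child["ID"], placeholders)
                self._insert(child, child_id)
                parent.setdefault("children", []).append(child_id)
            elif name == "update":
                # Only the supplied fields change; the ID itself is never rewritten.
                self.objects[obj_id].update({k: v for k, v in args.items() if k != "ID"})
            elif name == "delete":
                del self.objects[obj_id]
                for obj in self.objects.values():  # unlink from any parent
                    if obj_id in obj.get("children", []):
                        obj["children"].remove(obj_id)
            elif name == "expand":
                self.expand_requests.append(obj_id)  # honored when the next prompt is built
            else:
                raise ValueError(f"unknown function {name!r}")
        return placeholders
```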


🧪 Functional Testing with CTX_MARK

To support functional testing of prompt behavior, CTX_MARK includes several attribute-level and tag-level conventions for test assertions:

Attribute-level Matching:

  • criteria.ATTR="..." – natural language description of the expectation
  • expect.ATTR="..." – exact value must match
  • pattern.ATTR="..." – a match pattern (e.g., regex or partial match language)
  • type.ATTR="..." – expected data type (e.g., DATE, NUMBER)
  • any.ATTR="1" – must exist and have a value
  • ignore.ATTR="1" – value is irrelevant, and the attribute may be absent
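A checker for these attribute-level assertions can compare an expectation's dotted attributes against the attributes of the actual output. The sketch makes these choices, none of which the spec defines:

- pattern.ATTR is a Python regular expression that must match the whole value.
- type.ATTR supports only DATE (ISO 8601) and NUMBER.
- criteria.ATTR is skipped, because judging a natural-language criterion needs a model or a person.

The function names are hypothetical.

```python
import re
from datetime import date

CHECKED = {"expect", "pattern", "type", "any"}


def _has_type(value: str, expected: str) -> bool:
    """Assumed type names: NUMBER (anything float() accepts) and DATE (ISO 8601)."""
    parsers = {"NUMBER": float, "DATE": date.fromisoformat}
    if expected not in parsers:
        raise ValueError(f"unsupported type {expected!r}")
    try:
        parsers[expected](value)
    except ValueError:
        return False
    return True


def check_attributes(expected: dict[str, str], actual: dict[str, str]) -> list[str]:
    """Return a failure message for each unmet expect./pattern./type./any. assertion.

    ignore.ATTR="1" exempts ATTR from all checks; criteria.ATTR is skipped
    because judging a natural-language criterion needs a model or a person.
    """
    ignored = {name.partition(".")[2] for name, flag in expected.items()
               if name.startswith("ignore.") and flag == "1"}
    failures = []
    for name, rule in expected.items():
        kind, _, attr = name.partition(".")
        if kind not in CHECKED or attr in ignored:
            continue
        value = actual.get(attr)
        if value is None:
            failures.append(f"{attr}: missing")
        elif kind == "expect" and value != rule:
            failures.append(f"{attr}: expected {rule!r}, got {value!r}")
        elif kind == "pattern" and re.fullmatch(rule, value) is None:
            failures.append(f"{attr}: {value!r} does not match {rule!r}")
        elif kind == "type" and not _has_type(value, rule):
            failures.append(f"{attr}: {value!r} is not a {rule}")
        elif kind == "any" and value == "":
            failures.append(f"{attr}: empty")
    return failures
```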

Child Element Matching:

Use lowercase tags such as <criteria>, <expect>, <pattern>, <any>, and <ignore> to specify expectations for child content.

<NOTE ID="«NOTE-33»">
  <criteria>This should summarize the user’s attitude neutrally.</criteria>
</NOTE>
<LINK HREF="https://example.com" expect.HREF="https://example.com"/>

Tag-level Generalization:

  • <ignore.TAG /> – ignore all values within the named tag
  • <all.TAG ignore.TIMESTAMP="1" /> – ignore specific attributes in all matching tags

These conventions allow declarative test scaffolds to validate LLM output.
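One way to apply the tag-level directives is to strip the ignored parts from the actual output before comparing it with the expected output. This sketch makes two interpretive choices. It treats <ignore.TAG /> as clearing an element's text, attributes, and children. It treats <all.TAG ignore.ATTR="1" /> as deleting that attribute from every TAG element. The function name is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def apply_ignores(directives: list[ET.Element], output: ET.Element) -> ET.Element:
    """Return a copy of output with the parts named by ignore.TAG / all.TAG directives removed."""
    ignored_tags: set[str] = set()
    ignored_attrs: dict[str, set[str]] = {}
    for directive in directives:
        kind, _, tag = directive.tag.partition(".")
        if kind == "ignore":
            ignored_tags.add(tag)
        elif kind == "all":
            for name, flag in directive.attrib.items():
                prefix, _, attr = name.partition(".")
                if prefix == "ignore" and flag == "1":
                    ignored_attrs.setdefault(tag, set()).add(attr)

    result = copy.deepcopy(output)  # leave the caller's tree untouched
    for elem in list(result.iter()):  # snapshot, since elements are mutated while walking
        if elem.tag in ignored_tags:
            tail = elem.tail
            elem.clear()  # drops text, attributes, children, and tail
            elem.tail = tail  # the tail belongs to the parent's content, so restore it
        for attr in ignored_attrs.get(elem.tag, ()):
            elem.attrib.pop(attr, None)
    return result
```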

🧠 Design Goals

  • Visual clarity for both humans and LLMs
  • Explicit provenance and structure
  • Editable memory/reference objects
  • Zero-dependency XML superset—designed to be parsed loosely