CTX_MARK Specification (Draft)

A minimalist, structured language for LLM-readable documents, with support for:

Modular prompt design
Structured memory
Instructional scaffolding
Clear provenance and update semantics

📣 Prompt Introduction for the LLM

You are reading a structured document using CTX_MARK, a lightweight XML-inspired markup.

This document includes two sets of conventions:

🧾 Reading Instructions (for understanding and use)

Tags are written in ALL_CAPS.
IDs use the format «ID» (e.g., «NOTE-001»), and may appear in content or attributes.
All IDs are generated by the system. Do not invent new IDs unless explicitly instructed.
If you are asked to construct new content that references other new content, you may temporarily generate placeholder IDs like «NEW_ID_1», «NEW_ID_2». These will be rewritten by the system.
Attribute-specific instructions appear as instructions.ATTRIBUTE_NAME="...".
Content-specific instructions appear as <instructions>...</instructions> inside a tag.
Collapsible content may appear as COLLAPSED_WITH_CONTENT="..." or as a combination of <PREVIEW> and <FULL>.
SOURCE attributes reference the origin of content using structured IDs (e.g., «USER», «WEBPAGE-site-42»).
FORMAT attributes like FORMAT="markdown" or FORMAT="plaintext" indicate formatting expectations.

You are expected to interpret tags, structure, and source metadata. Do not output instruction elements or comments unless explicitly asked.

✍️ Writing Instructions (for when authoring CTX_MARK)

Use ID="«ID»" to uniquely mark elements for update or reference.
Use SOURCE="«ID»" to mark content origin.
Use <instructions>...</instructions> or instructions.ATTRIBUTE="..." to guide LLM output.
Use suffix conventions like *_DATE, *_TIMESTAMP, or CURRENCY_USD to indicate typed fields.
For collapsed sections, you may use either COLLAPSED_WITH_CONTENT or a <COLLAPSED> block with <PREVIEW> and <FULL> children.

IDs and instructions are integral to prompt interpretation and editing workflows.

🧱 Core Conventions

🔠 Tags

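All content tag names are in ALL_CAPS, e.g., <SUMMARY>, <NOTE>; lowercase tags such as <instructions> are reserved for system-level metadata and scaffolding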
Tags can include freeform text, nested children, or structured attributes

🆔 IDs and References

All referenceable entities must have an ID attribute
IDs follow the format: «ID», e.g., «USER», «NOTE-003»
These IDs may appear in content, attributes, or provenance fields
Do not generate new IDs unless instructed; placeholder IDs should use the format «NEW_ID_1»
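
The placeholder rule above implies a rewrite pass on the system side. The following is a minimal Python sketch of such a pass, not a reference implementation: the function name `rewrite_placeholders` and the `allocate` callback are hypothetical, and the spec does not say how real IDs are allocated.

```python
import re

# Matches placeholder IDs such as «NEW_ID_1» or «NEW_ID_27».
PLACEHOLDER = re.compile(r"«NEW_ID_(\d+)»")

def rewrite_placeholders(markup, allocate):
    """Replace each distinct placeholder with one system-allocated ID.

    `allocate` is a hypothetical zero-argument callback returning a fresh ID.
    Every occurrence of the same placeholder receives the same replacement,
    so cross-references between new elements stay intact.
    """
    assigned = {}

    def substitute(match):
        placeholder = match.group(0)
        if placeholder not in assigned:
            assigned[placeholder] = allocate()
        return assigned[placeholder]

    return PLACEHOLDER.sub(substitute, markup)

# Hypothetical allocator; a real system would draw from its own ID store.
counter = iter(range(101, 1000))
draft = (
    '<NOTE ID="«NEW_ID_1»">See «NEW_ID_2».</NOTE>'
    '<NOTE ID="«NEW_ID_2»">Refers back to «NEW_ID_1».</NOTE>'
)
print(rewrite_placeholders(draft, lambda: f"«NOTE-{next(counter)}»"))
```

Keeping a per-document mapping, rather than replacing each occurrence independently, is what lets two new elements reference each other before either has a real ID.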

✍️ Instructions

Representing Changes

To indicate previous values when confirming or logging edits:

Use previous.ATTRIBUTE="..." alongside a new ATTRIBUTE to show the old value
Use a <previous> child element, placed alongside the updated content, for nested content changes

Example:

<LINK
  HREF="https://newsite.com"
  previous.HREF="https://oldsite.com"/>

<NOTE ID="«NOTE-45»">
  <previous>This was the original note.</previous>
  This is the updated version.
</NOTE>
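
A tool that logs edits could read these change markers mechanically. The sketch below assumes the fragment happens to be well-formed XML, so Python's standard `xml.etree.ElementTree` can parse it; a looser parser may be needed in practice. The helper names `attribute_changes` and `content_change` are hypothetical.

```python
import xml.etree.ElementTree as ET

def attribute_changes(element_markup):
    """Map each changed attribute to an (old, new) pair from previous.* attributes."""
    element = ET.fromstring(element_markup)
    changes = {}
    for name, old_value in element.attrib.items():
        if name.startswith("previous."):
            target = name[len("previous."):]
            changes[target] = (old_value, element.attrib.get(target))
    return changes

def content_change(element_markup):
    """Return (old, new) text when a <previous> child is present, else None."""
    element = ET.fromstring(element_markup)
    previous = element.find("previous")
    if previous is None:
        return None
    # The updated text sits around the <previous> child: before it (element.text)
    # and after it (previous.tail).
    new_text = ((element.text or "") + (previous.tail or "")).strip()
    return ((previous.text or "").strip(), new_text)

link = '<LINK HREF="https://newsite.com" previous.HREF="https://oldsite.com"/>'
note = (
    '<NOTE ID="«NOTE-45»">'
    '<previous>This was the original note.</previous>'
    'This is the updated version.'
    '</NOTE>'
)
print(attribute_changes(link))
print(content_change(note))
```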

Instructions for content:

<TITLE>
  <instructions>Summarize in one sentence.</instructions>
  A short summary here
</TITLE>

Instructions for attributes:

<LINK 
  HREF="https://example.com" 
  instructions.HREF="Must be absolute URL"/>
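
Since instruction elements are not meant to appear in output unless requested, a tool might strip them as a post-processing step. This is a rough regex-based sketch in Python under two assumptions that the spec does not state: `<instructions>` elements are never nested, and attribute values are always double-quoted. The name `strip_instructions` is hypothetical.

```python
import re

# An <instructions> element, plus its indentation and trailing newline.
_INSTRUCTION_ELEMENT = re.compile(
    r"[ \t]*<instructions>.*?</instructions>[ \t]*\n?", re.DOTALL
)
# An instructions.NAME="..." attribute, plus the whitespace before it.
_INSTRUCTION_ATTRIBUTE = re.compile(r'\s+instructions\.\w+="[^"]*"')

def strip_instructions(markup):
    """Remove <instructions> elements and instructions.* attributes."""
    without_elements = _INSTRUCTION_ELEMENT.sub("", markup)
    return _INSTRUCTION_ATTRIBUTE.sub("", without_elements)

sample = (
    "<TITLE>\n"
    "  <instructions>Summarize in one sentence.</instructions>\n"
    "  A short summary here\n"
    "</TITLE>\n"
    '<LINK HREF="https://example.com" instructions.HREF="Must be absolute URL"/>'
)
print(strip_instructions(sample))
```

Regular expressions are used here instead of an XML parser because the format is meant to be parsed loosely; the trade-off is that unusual quoting or nesting would need extra handling.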

📂 Format and Typing

Use FORMAT="markdown" or FORMAT="plaintext" to guide content formatting
Typed fields should use suffixes or explicit attributes:
- *_TIMESTAMP, *_DATE, CURRENCY_USD, PERCENTAGE, etc.
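
As an illustration of how a consumer might act on these suffixes, here is a small Python sketch. The value formats are assumptions, since the spec names the suffixes but does not define their encodings: ISO 8601 strings for dates and timestamps, plain decimals for currency, and an optional trailing percent sign for percentages. The function name `parse_typed_attribute` is hypothetical.

```python
from datetime import date, datetime
from decimal import Decimal

def parse_typed_attribute(name, value):
    """Convert an attribute value based on the suffix of its name.

    All value formats here are assumptions, not part of the spec.
    """
    if name.endswith("_TIMESTAMP"):
        return datetime.fromisoformat(value)   # assumed ISO 8601 timestamp
    if name.endswith("_DATE"):
        return date.fromisoformat(value)       # assumed ISO 8601 date
    if name.endswith("CURRENCY_USD"):
        return Decimal(value)                  # avoid float rounding for money
    if name.endswith("PERCENTAGE"):
        return float(value.rstrip("%"))
    return value                               # untyped: leave as text

print(parse_typed_attribute("DUE_DATE", "2024-05-01"))
print(parse_typed_attribute("PRICE_CURRENCY_USD", "19.99"))
```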

🕵️ Provenance

Use SOURCE="«ID»" to mark origin of content
Sources can be:
- «USER»
- «LLM»
- «WEBPAGE-nytimes.com-51»
Optional: declare full source metadata separately using <SOURCES>

<SOURCES>
  <SOURCE ID="«WEBPAGE-buzzfeed.com-41»" TYPE="external_web" URL="https://buzzfeed.com/article/41"/>
</SOURCES>
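
One way a tool could connect a SOURCE attribute to its declared metadata is sketched below in Python. It assumes the fragments are well-formed enough for the standard `xml.etree.ElementTree` parser, and that built-in sources such as «USER» and «LLM» simply have no registry entry. The helper names and the sample NOTE are hypothetical.

```python
import xml.etree.ElementTree as ET

def load_sources(sources_markup):
    """Index the <SOURCE> entries of a <SOURCES> block by their ID."""
    root = ET.fromstring(sources_markup)
    return {source.get("ID"): dict(source.attrib) for source in root.findall("SOURCE")}

def resolve_source(element_markup, registry):
    """Return the metadata for an element's SOURCE, or None if undeclared."""
    element = ET.fromstring(element_markup)
    return registry.get(element.get("SOURCE"))

sources = (
    "<SOURCES>"
    '<SOURCE ID="«WEBPAGE-buzzfeed.com-41»" TYPE="external_web" '
    'URL="https://buzzfeed.com/article/41"/>'
    "</SOURCES>"
)
registry = load_sources(sources)
# Hypothetical element that cites the declared source.
note = '<NOTE ID="«NOTE-1»" SOURCE="«WEBPAGE-buzzfeed.com-41»">Quoted text</NOTE>'
print(resolve_source(note, registry))
```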

🧳 Collapsible Content

<COMPLETE_DESCRIPTION ID="«DESC-32»" COLLAPSED_WITH_CONTENT="30 words"/>

Or:

<COLLAPSED>
  <PREVIEW>A brief teaser...</PREVIEW>
  <FULL ID="«DESC-32»">The complete paragraph of content.</FULL>
</COLLAPSED>
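
To show how the <PREVIEW>/<FULL> form might be consumed, the Python sketch below renders a block as its preview unless the ID on <FULL> has been expanded. Tracking expanded IDs in a set is an assumption about the surrounding system, and `render_collapsed` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

def render_collapsed(block_markup, expanded_ids):
    """Return the full text if its ID was expanded, otherwise the preview."""
    block = ET.fromstring(block_markup)
    preview = block.find("PREVIEW")
    full = block.find("FULL")
    if full is not None and full.get("ID") in expanded_ids:
        return (full.text or "").strip()
    if preview is not None:
        return (preview.text or "").strip()
    return ""

block = (
    "<COLLAPSED>"
    "<PREVIEW>A brief teaser...</PREVIEW>"
    '<FULL ID="«DESC-32»">The complete paragraph of content.</FULL>'
    "</COLLAPSED>"
)
print(render_collapsed(block, set()))
print(render_collapsed(block, {"«DESC-32»"}))
```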

📘 Examples and Testing

Use the lowercase <example> tag to provide demonstration prompts, partials, or tests.

Add purpose="..." to describe the intent of the example.
Use nested <input> and <output> blocks if needed.

<example purpose="Test summarization with user sentiment">
  <input>
    <NOTE ID="«NOTE-77»" SOURCE="«USER»">
      <instructions>Summarize sentiment neutrally</instructions>
      This thing is the worst.
    </NOTE>
  </input>
  <output>
    <SUMMARY ID="«SUMMARY-77»" SOURCE="«LLM»">
      The user was strongly dissatisfied.
    </SUMMARY>
  </output>
</example>

Lowercase tags are reserved for system-level metadata, samples, or scaffolding.
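
A test harness could load such an <example> block into a simple structure before running it. The Python sketch below assumes each <input> and <output> wraps exactly one element and that the block parses as XML; `load_example` is a hypothetical helper, not part of CTX_MARK.

```python
import xml.etree.ElementTree as ET

def load_example(example_markup):
    """Extract the purpose plus the first element under <input> and <output>."""
    example = ET.fromstring(example_markup)

    def first_child(tag):
        container = example.find(tag)
        if container is None or len(container) == 0:
            return None
        return container[0]

    return {
        "purpose": example.get("purpose"),
        "input": first_child("input"),
        "output": first_child("output"),
    }

example_markup = """<example purpose="Test summarization with user sentiment">
  <input>
    <NOTE ID="«NOTE-77»" SOURCE="«USER»">
      <instructions>Summarize sentiment neutrally</instructions>
      This thing is the worst.
    </NOTE>
  </input>
  <output>
    <SUMMARY ID="«SUMMARY-77»" SOURCE="«LLM»">
      The user was strongly dissatisfied.
    </SUMMARY>
  </output>
</example>"""

case = load_example(example_markup)
print(case["purpose"], case["input"].tag, case["output"].tag)
```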

🔁 JSON Function Call Format for Tool Integration

While CTX_MARK uses XML-like markup, tools interacting with LLMs may use structured function calls in JSON. Each update action is a named function call that takes a single JSON object as its argument. These updates apply to entire objects identified by ID; paths and fine-grained selectors are not used.

General Form

update({
  "ID": "«SUMMARY-10»",
  "text": "Updated summary content"
})

Supported Functions

update({...}) – modifies fields of an existing object by ID
delete({ "ID": "..." }) – removes an object by ID
create({...}) – creates a new object with a placeholder ID
append({ "ID": "«PARENT»", "child": { ... } }) – adds a child to an existing parent object
expand({ "ID": "«DESC-32»" }) – expands collapsed content

Examples

update({
  "ID": "«SUMMARY-10»",
  "text": "The user found it acceptable but unremarkable."
})

delete({ "ID": "«NOTE-22»" })

create({
  "type": "NOTE",
  "ID": "«NEW_ID_1»",
  "text": "Newly created content"
})

append({
  "ID": "«COMMENTS-4»",
  "child": {
    "type": "NOTE",
    "ID": "«NEW_ID_1»",
    "text": "Appended child"
  }
})

expand({
  "ID": "«DESC-32»"
})

These function patterns are intended to be LLM-friendly and avoid the ambiguity of JSON Patch syntax.
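
On the receiving side, these calls could be parsed and applied to an in-memory store keyed by ID. The Python sketch below is one possible reading of the five functions, not a definitive one: the store layout, the `children` and `collapsed` field names, and the helper names `parse_call` and `apply_call` are all assumptions.

```python
import json
import re

# A call looks like: name({ ...JSON object... })
CALL = re.compile(r"^\s*(\w+)\(\s*(\{.*\})\s*\)\s*$", re.DOTALL)

def parse_call(text):
    """Split a function call into its name and its JSON object argument."""
    match = CALL.match(text)
    if match is None:
        raise ValueError(f"not a function call: {text!r}")
    return match.group(1), json.loads(match.group(2))

def apply_call(store, text):
    """Apply one call to `store`, a dict mapping ID to object dict."""
    name, args = parse_call(text)
    if name == "update":
        fields = {key: value for key, value in args.items() if key != "ID"}
        store[args["ID"]].update(fields)
    elif name == "delete":
        del store[args["ID"]]
    elif name == "create":
        # The placeholder ID is kept as-is; the system would rewrite it later.
        store[args["ID"]] = dict(args)
    elif name == "append":
        child = dict(args["child"])
        store[child["ID"]] = child
        store[args["ID"]].setdefault("children", []).append(child["ID"])
    elif name == "expand":
        store[args["ID"]]["collapsed"] = False
    else:
        raise ValueError(f"unsupported function: {name}")

store = {
    "«SUMMARY-10»": {"ID": "«SUMMARY-10»", "text": "Old summary"},
    "«COMMENTS-4»": {"ID": "«COMMENTS-4»"},
}
apply_call(store, 'update({"ID": "«SUMMARY-10»", "text": "Updated summary content"})')
apply_call(store, 'append({"ID": "«COMMENTS-4»", "child": '
                  '{"type": "NOTE", "ID": "«NEW_ID_1»", "text": "Appended child"}})')
print(store["«SUMMARY-10»"]["text"], store["«COMMENTS-4»"]["children"])
```

Because every call addresses a whole object by ID, the dispatcher needs no path resolution, which is the simplification over JSON Patch that the format aims for.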

🧪 Functional Testing with CTX_MARK

To support functional testing of prompt behavior, CTX_MARK includes several attribute-level and tag-level conventions for test assertions:

Attribute-level Matching:

criteria.ATTR="..." – natural language description of the expectation
expect.ATTR="..." – exact value must match
pattern.ATTR="..." – a match pattern (e.g., regex or partial match language)
type.ATTR="..." – expected data type (e.g., DATE, NUMBER)
any.ATTR="1" – must exist and have a value
ignore.ATTR="1" – value is irrelevant or optional (or may not be present)

Child Element Matching:

Use lowercase tags like <criteria>, <expect>, <pattern>, <any>, <ignore> as needed to specify expectations for child content.

<NOTE ID="«NOTE-33»">
  <criteria>This should summarize the user’s attitude neutrally.</criteria>
</NOTE>

<LINK HREF="https://example.com" expect.HREF="https://example.com"/>

Tag-level Generalization:

<ignore.TAG /> – ignore all values within the named tag
<all.TAG ignore.TIMESTAMP="1" /> – ignore specific attributes in all matching tags

These conventions allow declarative test scaffolds to validate LLM output.
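
A minimal checker for the attribute-level conventions might look like the Python sketch below. Several choices are assumptions rather than spec: the expectation element and the actual output are passed as separate fragments, `pattern.*` is treated as a regular expression, only the NUMBER and DATE types are handled (with ISO 8601 dates), and `criteria.*` is skipped because it needs a human or LLM judge. The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

def _matches_type(type_name, value):
    """Assumed type checks; only NUMBER and ISO 8601 DATE are sketched."""
    try:
        if type_name == "NUMBER":
            float(value)
        elif type_name == "DATE":
            date.fromisoformat(value)
        else:
            return False
        return True
    except (TypeError, ValueError):
        return False

def check_attributes(spec_markup, actual_markup):
    """Return a list of (attribute, assertion) pairs that failed."""
    spec = ET.fromstring(spec_markup)
    actual = ET.fromstring(actual_markup)
    failures = []
    for key, wanted in spec.attrib.items():
        prefix, _, attr = key.partition(".")
        value = actual.get(attr)
        if prefix == "expect":
            ok = value == wanted
        elif prefix == "pattern":
            ok = value is not None and re.search(wanted, value) is not None
        elif prefix == "type":
            ok = _matches_type(wanted, value)
        elif prefix == "any":
            ok = bool(value)
        else:
            # ignore.* always passes, criteria.* is left to a judge,
            # and unprefixed attributes are not assertions.
            ok = True
        if not ok:
            failures.append((attr, prefix))
    return failures

spec = ('<LINK expect.HREF="https://example.com" pattern.TITLE="^Ex" '
        'type.VISITS="NUMBER" any.REL="1" ignore.FETCHED_TIMESTAMP="1"/>')
good = '<LINK HREF="https://example.com" TITLE="Example" VISITS="12" REL="nofollow"/>'
bad = '<LINK HREF="https://other.example" TITLE="Example" VISITS="many"/>'
print(check_attributes(spec, good))
print(sorted(check_attributes(spec, bad)))
```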

🧠 Design Goals

Visual clarity for both humans and LLMs
Explicit provenance and structure
Editable memory/reference objects
Zero-dependency XML superset—designed to be parsed loosely

ianb/CTX_MARK.md