GitHub Issue: Enable Schema-Formatted Output for Tool-Using Chains
1. Problem:
Currently, the `llm` CLI's `--schema` option applies to the direct output of a single Language Model (LLM) call. When tools (via `--tool` or `--functions`) are used, the LLM engages in a multi-step chain (e.g., the ReAct pattern) in which intermediate outputs are tool call requests or textual reasoning. There is no direct way to specify that the final, user-visible result of such a multi-step, tool-using chain should conform to a user-defined schema, and the existing `--schema` option does not automatically apply to the culmination of this chain.
2. Alternatives Considered:
- A. New CLI Option: Introduce a distinct option (e.g., `--final-schema` or `--output-schema`) specifically for specifying the schema of the final output after a tool chain. This would keep the existing `--schema` behavior for direct, single-turn schema output and make the post-chain formatting explicit.
- B. Overload Existing `--schema` (Implicit Deferral): Modify the behavior of the existing `--schema` option. When tools are also specified, `--schema` would be implicitly understood to apply to the final output of the entire tool chain, requiring an additional LLM call at the end to format the accumulated information (the decision this implies is sketched just below). If no tools are present, `--schema` behaves as it does currently.
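For clarity, the semantics implied by Alternative B can be summarized in a few lines. This helper is hypothetical, introduced only to illustrate the intended behavior; it is not part of `llm`:

```python
from typing import Optional, Tuple

def resolve_schema_placement(
    schema: Optional[dict], tools: list
) -> Tuple[Optional[dict], Optional[dict]]:
    """Hypothetical helper illustrating Alternative B: return
    (schema_for_direct_call, deferred_schema)."""
    if schema and tools:
        # Tools are active: the schema is deferred to a final formatting call.
        return None, schema
    # No tools: the schema applies to the single LLM call, exactly as today.
    return schema, None
```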
3. Suggested Approach (from discussion):
The preferred approach discussed was Alternative B: overload the existing `--schema` option. The rationale is that, from a user's perspective, if they ask for tools and a schema, their intent is for the final output of the entire operation to be schema-formatted. This approach aims for a simpler user interface by reusing the existing `--schema` flag, with `llm` handling the "when" of schema application.
4. Edge Cases and Challenges (with the suggested approach):
- Clarity of Conditional Behavior: Users must understand that `--schema` behaves differently (becomes a post-processing step) when tools are active.
- Context for Final Formatting: Determining the optimal context (full history vs. last utterance vs. summary) to feed into the final schema-formatting LLM call is crucial for quality and token efficiency.
- Token Limits & Cost: The additional (+1) LLM call for formatting adds latency and cost, and the context for this call could be large (a crude truncation sketch follows this list).
- Streaming UX: While the tool chain might stream intermediate thoughts/actions, the final schema-formatted (especially JSON) output will likely arrive as a single block.
- Error Handling: Failures in the implicit final formatting step need to be clearly distinguishable from errors in the tool-using chain.
- Debugging: Users may need a way to inspect the context passed to the implicit final formatting call if the output is unsatisfactory.
- Prompt Engineering for Final Call: The internal prompt used by `llm` for the final formatting call is fixed; users cannot easily tune it.
- `chain_limit` Interaction: If the tool chain is cut short by `chain_limit`, the context for final formatting might be incomplete or suboptimal.
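To illustrate the token-limit concern, here is a crude, character-based truncation sketch. The function name and budget are invented for illustration; a real implementation would count tokens with the model's tokenizer rather than characters:

```python
def truncate_context(context: str, max_chars: int = 24_000) -> str:
    """Crude guard for the final formatting call: keep the start of the
    interaction (the original question) and its end (latest tool results and
    any final summary), dropping the middle if the context is too large."""
    if len(context) <= max_chars:
        return context
    head = context[: max_chars // 4]
    tail = context[-(3 * max_chars) // 4 :]
    return head + "\n[... earlier interaction truncated ...]\n" + tail
```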
5. Preferred Solution Attributes/Components (High-Level Design for Implicit Deferral):
This section outlines the conceptual components and areas of existing code that would need modification, using only existing `llm` class/method names where applicable.
- Deferred Schema Application Logic:
  - Core Change Location: The primary logic for deferring and then applying the schema would reside within `llm.models.ChainResponse` (and its asynchronous counterpart `llm.models.AsyncChainResponse`).
  - Detection: These classes would detect whether both tools are active (via `llm.models.Prompt.tools`) and a schema is provided (passed from `llm.cli.prompt` to the `llm.models.Model.chain` method, and then to `ChainResponse`).
- Chain Execution Modification:
  - Tool-Using Phase: The existing loop within `ChainResponse` that handles tool calls (iterating through `Response` objects, executing tools via `Response.execute_tool_calls`, and feeding results back) would run as usual, up to the `chain_limit` or until the model stops calling tools.
  - Context Aggregation: After the tool-using phase, `ChainResponse` would internally aggregate a comprehensive context string from all preceding `llm.models.Response` objects within the chain (including user prompts, system prompts, tool calls, tool results, and intermediate assistant textual outputs).
- Final Formatting Call:
  - Trigger: If a schema was deferred, `ChainResponse` initiates a new, final LLM call.
  - Prompt Construction: An internal, standardized prompt is constructed (e.g., "Based on the following interaction, format the final answer into the schema: {schema_description}. Interaction: {aggregated_context}").
  - Execution: This uses the existing `llm.models.Model.prompt` (or `llm.models.Conversation.prompt`) method of the same model instance used for the chain, but this time with the `schema` parameter set to the user's originally supplied (and deferred) schema. Tools are not enabled for this specific formatting call.
- Output Handling:
  - Streaming: If the original `llm` command requested streaming, the chunks from this final formatting call are yielded. If not, its complete text (which should be the schema-formatted string) is returned by `ChainResponse.text()`.
  - Intermediate Output: Consideration is needed for how/if intermediate thoughts/actions from the tool chain are streamed to the user when the final output is a single schema block. The current design suggests streaming intermediates only if the final output is also streamed.
- Logging Integration:
  - Additional `Response`: The final schema-formatting LLM call must be logged as an additional `llm.models.Response` object within the same `llm.models.Conversation` (sharing the `conversation_id`).
  - Distinction: The `llm.models.Prompt` for this logged formatting response should clearly reflect its purpose (e.g., its `prompt.prompt` text would be the internal formatting instruction).
- CLI Layer (`llm.cli.prompt` function):
  - Option Passing: The `llm.cli.prompt` function will continue to parse the `--schema` option. The resulting schema object is passed to `llm.models.Model.chain` (or `llm.models.Conversation.chain`); the decision to use it immediately or defer it happens within the model layer, based on whether tools are also active (see the end-to-end usage sketch after this list).
  - `chain_limit`: The existing `--chain-limit` applies to the tool-using phase. The final formatting call is an additional (+1) call, effectively outside this limit.
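For orientation, here is how the proposed behavior might look from the Python API. The model name and tool are examples only, and the `schema=` argument to `chain()` is the proposed addition described above, not something `llm` supports today:

```python
import llm

def city_population(city: str) -> str:
    "Example tool: look up the population of a city (hard-coded for this sketch)."
    return "2,102,650" if city == "Paris" else "unknown"

model = llm.get_model("gpt-4o-mini")  # any tool-capable model will do
chain = model.chain(
    "What is the population of Paris?",
    tools=[city_population],
    schema={  # proposed: deferred until after the tool-using phase completes
        "type": "object",
        "properties": {"city": {"type": "string"}, "population": {"type": "integer"}},
    },
)
print(chain.text())  # with the proposal, this prints the schema-formatted final answer
```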
Simon asked on Discord:
> Maybe could do it by having a bunch of earlier tool calls that just dump information into context and then a final reply that schemas it up?
Yes, Simon's suggestion is a very practical way to think about the flow and aligns well with the "implicit deferral" approach we've been discussing. Let's break down what that implies for the design and how it maps to the components.
Simon's Suggestion Deconstructed:
- "A bunch of earlier tool calls that just dump information into context":
  - This accurately describes the ReAct or general tool-using loop. The model reasons, decides it needs information, and calls a tool; `llm` executes the tool.
  - The tool's output (the "dumped information") is fed back to the model as an "Observation" or `ToolResult`. This becomes part of the ongoing conversation history (the "context").
  - This loop repeats, accumulating information in the conversation history.
- "and then a final reply that schemas it up":
  - This is the crucial "+1" step.
  - Once the model (or `chain_limit`) decides the information-gathering phase is complete, it would naturally produce a final textual reply based on the accumulated context.
  - The modification here is: instead of taking that final textual reply directly, if a `--schema` was specified alongside tools, `llm` would intercept. It would take the accumulated context (which includes all tool interactions and the model's intermediate thoughts/replies) and then issue a new, specific instruction to the LLM: "Now, take all this information and format it according to this schema." (A toy sketch of this two-phase flow follows this list.)
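To make the two-phase flow concrete, here is a toy, self-contained sketch. Every name in it (`run_tool`, `gather_then_format`) is a stand-in for illustration, not an `llm` API; in `llm` itself the loop is driven by `ChainResponse` and the final string would be sent via `Model.prompt(..., schema=...)`:

```python
import json
from typing import Dict, List, Tuple

def run_tool(name: str, args: Dict[str, str]) -> str:
    """Stand-in for real tool execution (Response.execute_tool_calls in llm)."""
    return f"(pretend output of {name} with {args})"

def gather_then_format(tool_calls: List[Tuple[str, Dict[str, str]]], schema: Dict) -> str:
    """Phase 1: 'dump information into context'. Phase 2: build the single
    schema-formatting instruction for the final, tool-free call."""
    context: List[str] = []
    for name, args in tool_calls:
        context.append(f"Assistant (Action): {name}({json.dumps(args)})")
        context.append(f"Tool Result ({name}): {run_tool(name, args)}")
    return (
        "Based on the interaction below, answer strictly in this JSON schema:\n"
        + json.dumps(schema)
        + "\n\nInteraction:\n"
        + "\n".join(context)
    )

print(gather_then_format(
    [("search_population", {"city": "Paris"})],
    {"type": "object", "properties": {"population": {"type": "integer"}}},
))
```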
How this aligns with the "Implicitly Deferred Schema" Design:
This is essentially the same core idea:
- Tool-Using Phase:
  - `llm.models.ChainResponse` manages the loop.
  - Each `llm.models.Response` within this phase logs its prompt, any tool calls it makes, and any textual output.
  - `llm.models.Prompt` objects for subsequent turns in the chain contain `tool_results` from previous steps.
  - This continues until `chain_limit` is reached or the model provides a textual answer without requesting more tools.
- Context Aggregation (Implicit):
  - The `llm.models.Conversation` object (or the history maintained within `ChainResponse`) inherently stores this sequence of prompts, tool interactions, and textual replies. This is the "context" Simon refers to.
  - The `_gather_context_for_final_formatting()` method we designed earlier would be responsible for serializing this conversation history into a string suitable for the final LLM call.
- Final Schema-Formatting Call (The "Final Reply that Schemas It Up"):
  - `llm.models.ChainResponse`, upon completion of the tool-using phase and if a `--schema` was initially provided (now acting as a `_deferred_schema`), triggers this.
  - It makes a new call to `llm.models.Model.prompt()`:
    - Prompt: "Based on the preceding conversation history: {aggregated_context}, provide the information structured according to the following schema: {schema_definition}."
    - Schema: The user's original `--schema` argument.
    - No Tools: Tools are not enabled for this specific formatting call. Its sole job is structuring.
Key Points Arising from Simon's Phrasing:
- "Dumping information into context": This highlights the importance of the `_gather_context_for_final_formatting()` method. It needs to be good at creating a useful textual representation of the multi-turn interaction for the final formatting model. This might involve:
  - Clear delineation of user queries, assistant thoughts/actions, tool calls, and tool results.
  - Potentially summarizing very long histories if they exceed token limits for the final call (though this adds another layer of complexity/cost).
- "Final reply that schemas it up": This reinforces that it is a distinct step. The model first completes its reasoning/tool-using process (potentially ending with a natural language summary), and then that output (or the whole context leading to it) is subject to a separate schema-enforcing instruction.
Refinement of `_gather_context_for_final_formatting()` in light of Simon's phrasing:
The method should aim to create a coherent narrative of the interaction. Instead of just a raw dump, it might look more like this pseudo-format:
User: Initial query...
Assistant (Thought): I need to do X.
Assistant (Action): Tool_A[param=value]
Tool Result (Tool_A): Output of Tool_A
Assistant (Thought): Based on Tool_A, I now know Y. I need to do Z.
Assistant (Action): Tool_B[param=value2]
Tool Result (Tool_B): Output of Tool_B
Assistant: The final answer based on my investigation is [natural language summary, if the model produced one before being intercepted for schema formatting].
Then the prompt for the final schema call would be: "Given the above interaction, extract and structure the pertinent information into the following JSON schema: {...schema...}. If the model provided a final natural language summary, prioritize information from that summary, using the preceding interaction for detail and context."
This way, if the tool-using chain did culminate in a good textual answer, that answer is heavily weighted in the formatting step, rather than the formatting model having to re-derive everything from scratch from the whole log.
No fundamental change to the previously outlined design, but Simon's phrasing helps clarify the purpose and input for that final (+1) schema-formatting call. The "context" is indeed the history of the tool interactions and model utterances, and the "final reply" is the output of instructing the model to process that context against the schema.
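A small sketch of how such a formatting instruction could be assembled. The function and its parameters are illustrative only, not `llm` internals:

```python
import json
from typing import Optional

def build_formatting_prompt(
    aggregated_context: str, schema: dict, final_summary: Optional[str] = None
) -> str:
    """Compose the instruction for the +1 formatting call, weighting the
    model's own final natural-language summary when one exists."""
    parts = [
        "Given the interaction below, extract and structure the pertinent "
        "information into the following JSON schema:",
        json.dumps(schema, indent=2),
    ]
    if final_summary:
        parts.append(
            "Prioritize information from the assistant's final summary, using the "
            "preceding interaction only for supporting detail:\n" + final_summary
        )
    parts.append("Interaction:\n" + aggregated_context)
    return "\n\n".join(parts)
```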
Okay, to provide Simon with the most focused and helpful starting point, I'll aim for minimal but illustrative code snippets. The goal is to show the intent and the key touchpoints within the existing `llm` structure, rather than a fully polished, production-ready implementation.
Here's a breakdown of the minimal code snippets, focusing on the core logic for the "implicitly deferred schema" approach:
Core Idea:
If `Model.chain()` (or `Conversation.chain()`) is called with tools enabled and a `schema` is provided, that schema is stored. After the regular tool-using chain finishes, an additional LLM call is made using this stored schema and the accumulated context.
I. `llm/models.py` - Modifications to `_BaseChainResponse`, `ChainResponse`, and `AsyncChainResponse`
This is where the bulk of the new logic will reside.
# llm/models.py
# ... (existing imports) ...
import asyncio  # needed below to resolve async responses when logging
import json  # used when serializing tool call arguments into the gathered context
# ... (Prompt, Response, Tool, etc. classes as they exist) ...
class _BaseChainResponse:
# ... (existing attributes like prompt, model, stream, conversation, _key, _responses) ...
_deferred_schema: Optional[Union[Dict, type[BaseModel]]] = None # NEW attribute
_final_formatting_response: Optional[Any] = None # NEW: To store the +1 response
def __init__(
self,
prompt: Prompt,
model: "_BaseModel",
stream: bool,
conversation: _BaseConversation,
# ... (existing parameters: key, chain_limit, before_call, after_call) ...
# ADD new parameter:
deferred_schema: Optional[Union[Dict, type[BaseModel]]] = None,
):
# ... (existing __init__ body) ...
self.prompt = prompt # Contains original tools and options, but initial schema might be None
self.model = model
self.stream = stream
self._key = key
self._responses = []
self._final_formatting_response = None
self.conversation = conversation
self.chain_limit = chain_limit
self.before_call = before_call
self.after_call = after_call
self._deferred_schema = deferred_schema # Store the schema intended for final output
def _gather_context_for_final_formatting(self) -> str:
"""
Minimal example: concatenates previous prompts and textual responses.
Needs refinement for clarity, tool calls/results, and token limits.
"""
context_parts = []
if not self._responses: # Should ideally not happen if we reached this stage
return "No previous interactions recorded for formatting."
for r_idx, response_obj in enumerate(self._responses):
if response_obj.prompt.system:
context_parts.append(f"System (Turn {r_idx+1}): {response_obj.prompt.system}")
if response_obj.prompt.prompt:
context_parts.append(f"User (Turn {r_idx+1}): {response_obj.prompt.prompt}")
# Represent tool calls and results concisely
if response_obj.prompt.tool_results: # Results fed INTO this turn
for tr_idx, tr in enumerate(response_obj.prompt.tool_results):
context_parts.append(f" Tool Result {tr_idx+1} (for call ID {tr.tool_call_id}): {tr.output}")
# What the assistant said OR did in this turn
assistant_actions = []
text_output = response_obj.text_or_raise()
tool_calls_made = response_obj.tool_calls_or_raise()
if text_output.strip() and not tool_calls_made: # Purely textual response
assistant_actions.append(text_output)
elif tool_calls_made:
for tc_idx, tc in enumerate(tool_calls_made):
assistant_actions.append(f" Tool Call {tc_idx+1}: {tc.name}({json.dumps(tc.arguments)}) -> ID {tc.tool_call_id}")
if assistant_actions:
context_parts.append(f"Assistant (Turn {r_idx+1}):\n" + "\n".join(assistant_actions))
return "\n---\n".join(context_parts)
def log_to_db(self, db): # Illustrative modification
# Original loop for tool-chain responses
for response_obj in self._responses:
# ... (simplified existing logging logic for response_obj) ...
# Ensure to use response_obj.text_or_raise() and .tool_calls_or_raise()
# to handle potentially un-iterated async responses before logging
actual_response_to_log = response_obj
if isinstance(response_obj, AsyncResponse) and not response_obj._done:
# This is tricky; log_to_db might be called before full async iteration.
# For simplicity in this snippet, we assume it's resolved or needs resolving.
# A robust solution would await/resolve it if needed.
actual_response_to_log = asyncio.run(response_obj.to_sync_response()) # Simplification
elif isinstance(response_obj, Response) and not response_obj._done:
response_obj._force() # Ensure sync response is iterated
actual_response_to_log.log_to_db(db) # Assuming Response.log_to_db exists
# Log the final formatting response if it exists
if self._final_formatting_response:
final_response_to_log = self._final_formatting_response
if isinstance(final_response_to_log, AsyncResponse) and not final_response_to_log._done:
final_response_to_log = asyncio.run(final_response_to_log.to_sync_response())
elif isinstance(final_response_to_log, Response) and not final_response_to_log._done:
final_response_to_log._force()
final_response_to_log.log_to_db(db)
class ChainResponse(_BaseChainResponse):
_responses: List["Response"]
_final_formatting_response: Optional["Response"] = None
def _execute_tool_chain_and_prepare_formatting(self) -> None:
"""
Internal helper to run the tool chain and store intermediate responses.
This method will fully iterate the tool chain.
"""
if self._responses: # Already executed
return
prompt = self.prompt
count = 0
# ... (Logic from existing ChainResponse.responses() generator to iterate tool calls) ...
# Simplified:
current_prompt_obj = self.prompt
while True:
if self.chain_limit and count >= self.chain_limit:
break
response_for_turn = Response(
current_prompt_obj, self.model, self.stream,
conversation=self.conversation, key=self._key
)
self._responses.append(response_for_turn)
# Force iteration to get tool calls and text
for _ in response_for_turn: pass
response_for_turn._force() # Ensure it's fully processed
count += 1
tool_results = response_for_turn.execute_tool_calls(
before_call=self.before_call, after_call=self.after_call
)
if tool_results:
current_prompt_obj = Prompt(
"", self.model, tools=self.prompt.tools, # Pass original tools
tool_results=tool_results, options=self.prompt.options
)
else:
break # No more tool calls, chain ends
def __iter__(self) -> Iterator[str]:
self._execute_tool_chain_and_prepare_formatting() # Ensure tool chain runs
if self._deferred_schema:
context = self._gather_context_for_final_formatting()
formatting_instruction = (
f"Based on the preceding interaction and gathered information:\n{context}\n\n"
f"Please provide the final, consolidated answer strictly according to the following schema."
)
# Use original prompt's options, but override schema and clear tools
final_prompt_options = self.prompt.options
final_formatting_prompt_obj = Prompt(
formatting_instruction,
self.model,
schema=self._deferred_schema,
options=final_prompt_options,
tools=[] # No tools for this final formatting step
)
self._final_formatting_response = Response(
final_formatting_prompt_obj,
self.model,
self.stream, # Use original stream setting
conversation=self.conversation, # For logging
key=self._key,
)
yield from self._final_formatting_response
elif self._responses: # No deferred schema, yield last textual part of tool chain
last_tool_chain_response = self._responses[-1]
# If already streamed by _execute_tool_chain_and_prepare_formatting (if it were a generator)
# this might double-yield. text() is safer.
# The __iter__ needs to be the single source of truth for yielding.
# For simplicity, assume text() gives the final output of the chain here.
# A more robust __iter__ would handle streaming from tool chain OR final step.
# THIS PART IS SIMPLIFIED FOR BREVITY - a real __iter__ is more complex.
if not last_tool_chain_response.tool_calls_or_raise():
yield last_tool_chain_response.text_or_raise()
def text(self) -> str:
# This will run __iter__ and collect all chunks
return "".join(self)
class AsyncChainResponse(_BaseChainResponse): # Illustrative async version
_responses: List["AsyncResponse"]
_final_formatting_response: Optional["AsyncResponse"] = None
async def _execute_tool_chain_and_prepare_formatting(self) -> None:
if self._responses: return
# ... (async equivalent of ChainResponse._execute_tool_chain_and_prepare_formatting) ...
# Iterate using `async for _ in response_for_turn: pass`
# and `await response_for_turn._force()`
# Simplified:
current_prompt_obj = self.prompt
count = 0
while True:
if self.chain_limit and count >= self.chain_limit:
break
response_for_turn = AsyncResponse(
current_prompt_obj, self.model, self.stream,
conversation=self.conversation, key=self._key
)
self._responses.append(response_for_turn)
async for _ in response_for_turn: pass
await response_for_turn._force()
count += 1
tool_results = await response_for_turn.execute_tool_calls(
before_call=self.before_call, after_call=self.after_call
)
if tool_results:
current_prompt_obj = Prompt(
"", self.model, tools=self.prompt.tools,
tool_results=tool_results, options=self.prompt.options
)
else:
break
async def __aiter__(self) -> AsyncIterator[str]:
await self._execute_tool_chain_and_prepare_formatting()
if self._deferred_schema:
context = self._gather_context_for_final_formatting() # Sync part
formatting_instruction = (
f"Based on the preceding interaction and gathered information:\n{context}\n\n"
f"Please provide the final, consolidated answer strictly according to the following schema."
)
final_prompt_options = self.prompt.options
final_formatting_prompt_obj = Prompt(
formatting_instruction,
self.model,
schema=self._deferred_schema,
options=final_prompt_options,
tools=[]
)
self._final_formatting_response = AsyncResponse(
final_formatting_prompt_obj,
self.model,
self.stream,
conversation=self.conversation,
key=self._key,
)
async for chunk in self._final_formatting_response:
yield chunk
elif self._responses:
last_tool_chain_response = self._responses[-1]
if not await last_tool_chain_response.tool_calls(): # tool_calls() is async
async for chunk in last_tool_chain_response: # stream its content
yield chunk
async def text(self) -> str:
chunks = []
async for chunk in self:
chunks.append(chunk)
return "".join(chunks)
# Modify Model.chain() and AsyncModel.chain() (and Conversation variants)
# to pass the `schema` as `deferred_schema` to ChainResponse if tools are active.
class Model(_BaseModel):  # or _BaseModel, if chain() is defined there
# ...
def chain(
self,
prompt: Optional[str] = None,
system: Optional[str] = None,
# ... other existing args ...
schema: Optional[Union[dict, type[BaseModel]]] = None,
tools: Optional[List[ToolDef]] = None, # Make sure `tools` is explicitly here
# ... other existing args ...
) -> ChainResponse: # Or AsyncChainResponse for AsyncModel
# Determine effective tools (passed arguments or conversation's tools)
# This logic might be slightly different for Model.chain vs Conversation.chain
effective_tools = _wrap_tools(tools or getattr(self.conversation if hasattr(self, 'conversation') else None, 'tools', []))
current_prompt_schema = None
deferred_schema_for_chain = None
if effective_tools and schema:
# Tools are active, so this schema is for the *final* output
deferred_schema_for_chain = schema
elif not effective_tools and schema:
# No tools, schema applies to this direct call
current_prompt_schema = schema
# The prompt object passed to ChainResponse should have its 'schema'
# field set to current_prompt_schema (which will be None if deferred).
# The 'tools' field should contain effective_tools.
# The deferred_schema_for_chain is passed separately to ChainResponse constructor.
prompt_obj_for_chain = Prompt(
prompt,
model=self, # self is the model instance
system=system,
schema=current_prompt_schema, # Schema for *this turn* if not deferred
tools=effective_tools, # All available tools
# ... other Prompt fields like attachments, fragments, options ...
options=self.Options(**(options or {})) # Assuming options is passed or available
)
return ChainResponse( # Or AsyncChainResponse
prompt_obj_for_chain, # This is the initial prompt for the chain
model=self,
# ... other existing ChainResponse args like stream, conversation, key ...
deferred_schema=deferred_schema_for_chain, # Pass the potentially deferred schema
)
# Similar for AsyncModel.chain, Conversation.chain, AsyncConversation.chain
II. `llm/cli.py` - How the `prompt` command passes `schema`
The `llm.cli.prompt` function already resolves `schema_input` to a `schema` dictionary. This `schema` dictionary just needs to be correctly passed to the `model.chain()` or `conversation.chain()` call.
# llm/cli.py
# In the `prompt` command function, where `kwargs_for_chain_or_prompt` is built:
# ... (schema is already resolved from schema_input) ...
# kwargs for the main model call (prompt or chain)
# This 'schema' will be interpreted by Model.chain to become either
# the direct schema for the first turn OR the deferred_schema for ChainResponse.
call_kwargs = {
"prompt": prompt_text_from_user_or_stdin, # The actual prompt string
"system": system,
"attachments": resolved_attachments,
"fragments": resolved_fragments,
"system_fragments": resolved_system_fragments,
"schema": schema, # <--- This is the key part
"tools": tool_implementations, # List of Tool objects/classes
# ... any other relevant parameters for .prompt() or .chain() ...
}
# If tool_implementations exist, we are in a chain scenario.
# Otherwise, it's a direct prompt.
if tool_implementations:
# Add chain-specific parameters if any, e.g., chain_limit
call_kwargs["chain_limit"] = chain_limit
if tools_debug: call_kwargs["after_call"] = _debug_tool_call
if tools_approve: call_kwargs["before_call"] = _approve_tool_call
# Use conversation.chain or model.chain
prompt_method_to_call = conversation.chain if conversation else model.chain
else:
# Use conversation.prompt or model.prompt
prompt_method_to_call = conversation.prompt if conversation else model.prompt
# The `key` and model-specific `options` are often passed as **kwargs
# to the prompt_method_to_call.
# `validated_options` is already prepared in cli.py
response = prompt_method_to_call(
**call_kwargs,
**validated_options, # Contains model-specific options like temperature
key=key_from_cli_or_env # The API key
)
# ...
Explanation of Snippets:
- `_BaseChainResponse`:
  - Gains `_deferred_schema` and `_final_formatting_response` attributes.
  - The constructor now accepts `deferred_schema`.
  - `_gather_context_for_final_formatting()` is a placeholder for the crucial logic of creating a good textual summary of the tool-chain interaction.
  - `log_to_db` is updated to also log `_final_formatting_response`.
- `ChainResponse` (and by extension `AsyncChainResponse`):
  - `_execute_tool_chain_and_prepare_formatting()`: this new internal method encapsulates running the original tool-use loop and populating `self._responses`.
  - `__iter__()` (and `__aiter__()`):
    - First ensures the tool-chain phase has run by calling `_execute_tool_chain_and_prepare_formatting()`.
    - Then checks `self._deferred_schema`.
    - If set, it calls `_gather_context_for_final_formatting()`, creates a new instructional `Prompt` with this context and the `_deferred_schema`, and creates and yields from `self._final_formatting_response` (a new `Response` or `AsyncResponse` instance).
    - If no `_deferred_schema`, it (simplified here) yields the text of the last response in `self._responses` if that response was purely textual. A more robust version would stream the chunks of the tool chain if `self.stream` is true and no deferred schema is present.
  - `text()`: simply joins the output of `__iter__()`.
- `Model.chain()` (and `AsyncModel.chain()`, etc.):
  - The `chain()` method (and its variants in `Conversation`) now decides: if `tools` are active and a `schema` is also provided, it passes the `schema` to `ChainResponse` as `deferred_schema`. Otherwise (no tools, but schema given), it sets `schema` on the initial `Prompt` object for direct schema output.
- `llm/cli.py`:
  - The `prompt` command function passes the `schema` (resolved from user input) directly to `model.chain()` or `conversation.chain()`. The model layer then handles whether it is a direct or deferred schema.
This structure tries to centralize the new "deferred schema" logic within the `ChainResponse` classes, keeping the `Model.chain()` and `cli.py` changes relatively minor (mostly about passing the schema parameter correctly). The `_gather_context...` method is the piece that would require the most careful implementation and iteration to ensure good results.