@steipete
Last active September 30, 2025 09:42

Let’s make the whole guide Swift‑native and wire it to gpt-5-codex as the default model. This version shows the Responses API tool‑calling loop, built‑in tools (web search), custom tools, structured outputs (text.format), and streaming.

Why gpt-5-codex? It’s a GPT‑5 variant optimized for agentic coding and is Responses‑only—built to drive coding agents (think Codex/CLI/IDE workflows). (OpenAI) Structured outputs + function/tool calling + previous_response_id are the core building blocks in Responses. (OpenAI Platform) Prompting tips for this model differ slightly from plain GPT‑5; the cookbook notes it’s Responses‑only and has a few behavior differences. (OpenAI Cookbook)


TL;DR setup

  • Use gpt-5-codex with Responses API. Put your key in OPENAI_API_KEY. (OpenAI Platform)

  • Tools come in two flavors:

    • Built‑ins like web_search (and others) you enable in the tools array; you can request sources with include. (OpenAI Platform)
    • Custom functions you define via JSON Schema; model returns function_call items with a call_id and arguments. You run the tool, then send back function_call_output with that call_id. (OpenAI Platform)
  • Multi‑turn/state: chain requests with previous_response_id so the model can continue reasoning with prior items. (OpenAI Platform)

  • Strict JSON: use Structured Outputs via text.format or strict function schemas. (OpenAI Platform)

  • Streaming: enable SSE to receive deltas and tool‑call events. (OpenAI Platform)


1) Swift package bootstrap

Package.swift

// swift-tools-version: 6.0
import PackageDescription

let package = Package(
  name: "SwiftCodexAgent",
  platforms: [.macOS(.v13)],
  products: [.executable(name: "codex-agent", targets: ["CodexAgent"])],
  targets: [.executableTarget(name: "CodexAgent", path: "Sources")]
)

Environment

  • export OPENAI_API_KEY=sk-...

2) Minimal Responses client (REST + SSE)

Sources/OpenAIResponsesClient.swift

import Foundation

struct SSEEvent { let event: String?; let data: String }

final class OpenAIResponsesClient {
    private let apiKey: String
    private let base = URL(string: "https://api.openai.com/v1/responses")!

    init(apiKey: String = ProcessInfo.processInfo.environment["OPENAI_API_KEY"] ?? "") {
        precondition(!apiKey.isEmpty, "Set OPENAI_API_KEY")
        self.apiKey = apiKey
    }

    // One-shot request
    func create(body: [String: Any]) async throws -> [String: Any] {
        var req = URLRequest(url: base)
        req.httpMethod = "POST"
        req.setValue("application/json", forHTTPHeaderField: "Content-Type")
        req.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
        req.httpBody = try JSONSerialization.data(withJSONObject: body, options: [])

        let (data, resp) = try await URLSession.shared.data(for: req)
        guard let http = resp as? HTTPURLResponse, (200..<300).contains(http.statusCode) else {
            throw NSError(domain: "OpenAI", code: (resp as? HTTPURLResponse)?.statusCode ?? -1,
                          userInfo: ["response": String(data: data, encoding: .utf8) ?? ""])
        }
        return (try JSONSerialization.jsonObject(with: data)) as? [String: Any] ?? [:]
    }

    // Streaming (SSE)
    func stream(body: [String: Any], onEvent: @escaping (SSEEvent) -> Void) async throws {
        var req = URLRequest(url: base)
        req.httpMethod = "POST"
        req.setValue("application/json", forHTTPHeaderField: "Content-Type")
        req.setValue("text/event-stream", forHTTPHeaderField: "Accept")
        req.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")

        var b = body; b["stream"] = true
        req.httpBody = try JSONSerialization.data(withJSONObject: b, options: [])

        let (bytes, resp) = try await URLSession.shared.bytes(for: req)
        guard let http = resp as? HTTPURLResponse, (200..<300).contains(http.statusCode) else {
            throw NSError(domain: "OpenAI-SSE", code: (resp as? HTTPURLResponse)?.statusCode ?? -1)
        }

        var event: String?; var data = ""
        for try await line in bytes.lines {
            if line.hasPrefix("event:") {
                event = line.dropFirst("event:".count).trimmingCharacters(in: .whitespaces)
            } else if line.hasPrefix("data:") {
                let d = line.dropFirst("data:".count).trimmingCharacters(in: .whitespaces)
                data += d + "\n"
            } else if line.isEmpty, !data.isEmpty {
                onEvent(SSEEvent(event: event, data: data))
                event = nil; data = ""
            }
        }
    }
}
  • Streaming uses SSE; you’ll see deltas and tool‑call events as they happen. (OpenAI Platform)
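Most of the useful text arrives as `response.output_text.delta` events, each carrying the new text chunk in a `delta` field. A minimal sketch of a delta filter (a free function invented here, not part of the client above):

```swift
import Foundation

// Sketch: pull the incremental text out of a streamed event.
// Assumes the Responses streaming event name `response.output_text.delta`,
// whose JSON payload carries the new text in a "delta" field.
func textDelta(eventName: String?, payload: String) -> String? {
    guard eventName == "response.output_text.delta",
          let data = payload.data(using: .utf8),
          let json = (try? JSONSerialization.jsonObject(with: data)) as? [String: Any]
    else { return nil }
    return json["delta"] as? String
}
```

Wired into the client: `client.stream(body: body) { ev in if let d = textDelta(eventName: ev.event, payload: ev.data) { print(d, terminator: "") } }`.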

3) Tool protocol + registry (your custom functions)

Sources/Tools.swift

import Foundation

protocol Tool {
    var name: String { get }
    var about: String { get }
    var parameters: [String: Any] { get } // JSON Schema
    func call(with args: [String: Any]) async throws -> Any
}

final class ToolRegistry {
    private var tools: [String: Tool] = [:]
    func register(_ t: Tool) { tools[t.name] = t }
    func get(_ name: String) -> Tool? { tools[name] }

    /// Compose the "tools" array for Responses:
    /// includeBuiltIns e.g. ["web_search"] (platform-hosted)
    func spec(includeBuiltIns: [String] = []) -> [[String: Any]] {
        var out: [[String: Any]] = includeBuiltIns.map { ["type": $0] }
        for t in tools.values {
            out.append([
                "type": "function",
                "name": t.name,
                "description": t.about,
                "parameters": t.parameters,
                "strict": true // schema-enforced arguments
            ])
        }
        return out
    }
}
  • Built‑ins (like web_search) are declared by type; custom tools are type: "function" with a JSON Schema and (optionally) strict: true. (OpenAI Platform)

4) A couple of custom tools for coding agents

Sources/CustomTools.swift

import Foundation

/// Very simple "save a file" tool to let the agent create code/test files.
struct SaveFileTool: Tool {
    let name = "save_file"
    let about = "Write a UTF-8 text file into a workspace directory."
    let workspace = FileManager.default.temporaryDirectory.appendingPathComponent("codex_workspace")

    var parameters: [String: Any] {
        ["type": "object",
         "properties": [
            "path": ["type": "string", "description": "Relative path under workspace, e.g. src/main.swift"],
            "content": ["type": "string", "description": "File content (UTF-8)"]
         ],
         "required": ["path", "content"],
         "additionalProperties": false]
    }

    func call(with args: [String: Any]) async throws -> Any {
        let rel = (args["path"] as? String) ?? "untitled.txt"
        let content = (args["content"] as? String) ?? ""
        try FileManager.default.createDirectory(at: workspace, withIntermediateDirectories: true)
        let dest = workspace.appendingPathComponent(rel)
        try FileManager.default.createDirectory(at: dest.deletingLastPathComponent(), withIntermediateDirectories: true)
        try content.data(using: .utf8)?.write(to: dest)
        return ["saved": dest.path]
    }
}

/// Dummy "run tests" tool: simulates a test run with results.
/// In a real agent, you'd invoke `swift test` (or your runner) in a sandbox.
struct RunTestsTool: Tool {
    let name = "run_tests"
    let about = "Execute unit tests in the workspace and return a summary."

    var parameters: [String: Any] {
        ["type": "object",
         "properties": [
           "command": ["type": "string", "description": "Ignored in demo; default swift test"]
         ],
         // strict mode requires every property to appear in "required"
         "required": ["command"],
         "additionalProperties": false]
    }

    func call(with args: [String: Any]) async throws -> Any {
        // Simulate a pass:
        return ["passed": true, "tests": 3, "failures": 0, "duration_ms": 1420]
    }
}
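The comment above suggests shelling out for real runs. A sketch using Foundation's `Process` (`runProcess` is a helper name invented here; there is no sandboxing, so only point it at workspaces you trust):

```swift
import Foundation

// Run an external command, capture combined stdout/stderr, return exit status.
func runProcess(_ launchPath: String, _ args: [String], cwd: URL? = nil) throws -> (status: Int32, output: String) {
    let p = Process()
    p.executableURL = URL(fileURLWithPath: launchPath)
    p.arguments = args
    if let cwd { p.currentDirectoryURL = cwd }
    let pipe = Pipe()
    p.standardOutput = pipe
    p.standardError = pipe
    try p.run()
    p.waitUntilExit()
    let data = pipe.fileHandleForReading.readDataToEndOfFile()
    return (p.terminationStatus, String(data: data, encoding: .utf8) ?? "")
}
```

In RunTestsTool.call you would then do something like `try runProcess("/usr/bin/swift", ["test"], cwd: workspace)` and fold the status and log into the returned dictionary.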

5) The agent loop (handles tool‑calls, batching outputs, continuation)

Sources/Agent.swift

import Foundation

struct FunctionCall { let callId: String; let name: String; let arguments: [String: Any] }

final class Agent {
    private let client: OpenAIResponsesClient
    private let model: String
    private let tools: ToolRegistry
    private let includeBuiltIns: [String]

    init(client: OpenAIResponsesClient, model: String, tools: ToolRegistry, includeBuiltIns: [String] = []) {
        self.client = client; self.model = model; self.tools = tools; self.includeBuiltIns = includeBuiltIns
    }

    @discardableResult
    func ask(_ prompt: String,
             forceToolChoice: String? = nil,
             structuredJSONSchema: [String: Any]? = nil,
             includeWebSources: Bool = false) async throws -> String {

        var previousId: String?
        var finalText = ""
        let toolsSpec = tools.spec(includeBuiltIns: includeBuiltIns)

        var toolChoice: Any = "auto"
        if let forced = forceToolChoice { toolChoice = ["type": "function", "name": forced] } // hosted tools are forced with ["type": "web_search"] instead

        var textOpts: [String: Any]?
        if let schema = structuredJSONSchema {
            // Responses puts name/schema/strict directly inside text.format:
            textOpts = ["format": [
                "type": "json_schema",
                "name": schema["name"] ?? "Result",
                "schema": schema["schema"] ?? [:],
                "strict": true
            ]]
        }

        var body: [String: Any] = [
            "model": model,
            "input": prompt,           // can also be an array of items
            "tools": toolsSpec,
            "tool_choice": toolChoice
        ]
        if let textOpts { body["text"] = textOpts }
        if includeWebSources {
            // Ask Responses to include sources for web_search calls:
            body["include"] = ["web_search_call.action.sources"]
        }

        var resp = try await client.create(body: body)
        previousId = resp["id"] as? String
        finalText += extractText(resp)

        // Loop while there are function calls
        var safety = 0
        while true {
            let calls = functionCalls(resp)
            if calls.isEmpty { break }

            let outputs: [[String: Any]] = try await withThrowingTaskGroup(of: [String: Any].self) { group in
                for c in calls {
                    group.addTask {
                        guard let tool = self.tools.get(c.name) else {
                            return ["type": "function_call_output", "call_id": c.callId, "output": "{\"error\":\"unknown tool\"}"]
                        }
                        let result = try await tool.call(with: c.arguments)
                        let data = try JSONSerialization.data(withJSONObject: result, options: [])
                        let json = String(data: data, encoding: .utf8) ?? "\"\""
                        return ["type": "function_call_output", "call_id": c.callId, "output": json]
                    }
                }
                var outs: [[String: Any]] = []
                for try await o in group { outs.append(o) }
                return outs
            }

            var cont: [String: Any] = [
                "model": model,
                "input": outputs,
                "tools": toolsSpec,
                "tool_choice": "auto"
            ]
            // JSONSerialization rejects wrapped Optionals, so only add the id when present.
            if let previousId { cont["previous_response_id"] = previousId }
            if let textOpts { cont["text"] = textOpts }
            if includeWebSources { cont["include"] = ["web_search_call.action.sources"] }

            resp = try await client.create(body: cont)
            previousId = resp["id"] as? String
            finalText += extractText(resp)

            safety += 1; if safety > 8 { break }
        }

        return finalText.trimmingCharacters(in: .whitespacesAndNewlines)
    }

    private func functionCalls(_ resp: [String: Any]) -> [FunctionCall] {
        guard let output = resp["output"] as? [[String: Any]] else { return [] }
        var out: [FunctionCall] = []
        for item in output {
            guard (item["type"] as? String) == "function_call" else { continue }
            let callId = (item["call_id"] as? String) ?? UUID().uuidString
            let name = (item["name"] as? String) ?? ""
            var args: [String: Any] = [:]
            if let argStr = item["arguments"] as? String,
               let data = argStr.data(using: .utf8),
               let json = try? JSONSerialization.jsonObject(with: data) as? [String: Any] {
                args = json
            }
            out.append(FunctionCall(callId: callId, name: name, arguments: args))
        }
        return out
    }

    private func extractText(_ resp: [String: Any]) -> String {
        guard let output = resp["output"] as? [[String: Any]] else { return "" }
        var text = ""
        for item in output {
            guard let t = item["type"] as? String else { continue }
            if t == "output_text" { text += (item["text"] as? String) ?? "" }
            if t == "message",
               let segments = item["content"] as? [[String: Any]] {
                for seg in segments where (seg["type"] as? String) == "output_text" {
                    text += (seg["text"] as? String) ?? ""
                }
            }
        }
        return text
    }
}
  • Tool calls arrive as function_call items; you must echo each call’s call_id when sending function_call_output in the next request, paired with previous_response_id. (OpenAI Platform)
  • input can be a string or an array of items; the code keeps it simple with a string on turn 1, then an array of outputs on continuation. (OpenAI Platform)
  • tool_choice lets you keep "auto" or force a specific tool for determinism. (OpenAI Platform)
  • If you enable web_search, you can ask Responses to include citations/sources with include: ["web_search_call.action.sources"]. (OpenAI Platform)

6) Run the agent (with gpt-5-codex)

Sources/Main.swift (not main.swift; a file named main.swift counts as top-level code, which conflicts with the @main attribute)

import Foundation

@main
struct CodexAgentMain {
    static func main() async throws {
        let client = OpenAIResponsesClient()

        let registry = ToolRegistry()
        registry.register(SaveFileTool())
        registry.register(RunTestsTool())

        // Add built-ins if you want the model to browse while coding.
        let agent = Agent(
            client: client,
            model: "gpt-5-codex",
            tools: registry,
            includeBuiltIns: ["web_search"] // optional; remove if you want offline-only
        )

        // A coding-style instruction tailored for GPT-5-Codex
        // (Model-specific prompting guidance lives in the cookbook.)
        // Ask it to create a Fibonacci implementation + tests and persist to files, then run tests.
        let answer = try await agent.ask(
            """
            Create a Swift module with:
            - src/Fibonacci.swift exporting fibonacci(n: Int) -> Int
            - Tests/FibonacciTests.swift with 3 unit tests
            Save files using the save_file tool, then run tests with run_tests.
            If web results help, cite sources.
            """,
            includeWebSources: true
        )

        print("\n=== FINAL ===\n\(answer)\n")
    }
}
  • gpt-5-codex is Responses‑only, tuned for coding agents (CLI/IDE/Codex). (OpenAI Platform)
  • For prompt style specific to this model (e.g., more explicit developer instructions), see the GPT‑5‑Codex prompting guide. (OpenAI Cookbook)

7) Structured outputs (two ways you’ll actually use)

  • Strict function args (we set "strict": true in tool specs). Great for deterministic tool invocation. (OpenAI Platform)
  • Final JSON via text.format (recommended when the answer should be parseable JSON).

Example: ask for a machine‑readable test report.

let testReportSchema: [String: Any] = [
  "name": "TestReport",
  "schema": [
    "type": "object",
    "properties": [
      "passed": ["type": "boolean"],
      "tests": ["type": "integer"],
      "failures": ["type": "integer"],
      "duration_ms": ["type": "integer"]
    ],
    // strict mode requires every property in "required" and no extra keys
    "required": ["passed", "tests", "failures", "duration_ms"],
    "additionalProperties": false
  ]
]

let structured = try await agent.ask(
  "Return the latest test run summary as JSON only.",
  structuredJSONSchema: testReportSchema
)
print(structured)
  • In Responses, structured text output is configured under text.format with type: "json_schema"; name, schema, and strict sit directly in the format object (this supersedes older response_format patterns). (OpenAI Platform)
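On the Swift side, the schema-conforming JSON can go straight into a Codable type. A sketch (`TestReport` mirrors the example schema above; `decodeReport` is a helper invented here):

```swift
import Foundation

struct TestReport: Codable {
    let passed: Bool
    let tests: Int
    let failures: Int
    let duration_ms: Int?  // optional in case a run omits timing
}

// Decode the model's structured answer into a typed value.
func decodeReport(_ json: String) throws -> TestReport {
    try JSONDecoder().decode(TestReport.self, from: Data(json.utf8))
}
```

Usage: `let report = try decodeReport(structured)`, then branch on `report.passed` instead of string-matching the model's prose.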

8) Streaming (optional, handy for IDE/CLI UX)

let body: [String: Any] = [
  "model": "gpt-5-codex",
  "input": "Draft a minimal README for the Fibonacci module.",
  "tools": [["type": "web_search"]],
  "tool_choice": "auto",
  "include": ["web_search_call.action.sources"]
]

try await OpenAIResponsesClient().stream(body: body) { ev in
    if let e = ev.event { print("EVENT:", e) }
    print(ev.data) // JSON envelope chunks (text deltas, tool calls, etc.)
}
  • SSE streaming delivers token/tool events as they happen—perfect for “live coding” UI. (OpenAI Platform)

9) Patterns you’ll reuse (and pitfalls)

  • Multi‑tool turn: model may emit multiple function_call items—execute them, then send one batch of function_call_output items in input + the previous_response_id. (OpenAI Platform)
  • Force determinism: set tool_choice to a specific function or built‑in when needed (e.g., always run_tests before answering). (OpenAI Platform)
  • State: previous_response_id chains turns; instructions from older turns aren’t auto‑reapplied unless you pass them again or depend on the model’s reasoning items from that ID. (OpenAI Platform)
  • Web search: if enabled, you can pull back sources via include for auditability. (OpenAI Platform)
  • Prompting for gpt-5-codex: favor concise developer directives and explicit tool usage steps; cookbook notes it’s not a drop‑in for GPT‑5 and that some UI params differ. (OpenAI Cookbook)
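Because instructions aren't re-applied automatically across previous_response_id chains, one way to keep behavior stable is a continuation-body builder that re-sends them each turn (a sketch; `continuationBody` is a helper invented here, and `instructions` is the Responses API's top-level system-prompt field):

```swift
import Foundation

// Build the request body for a follow-up turn in the chain.
func continuationBody(model: String,
                      previousId: String,
                      outputs: [[String: Any]],
                      instructions: String? = nil) -> [String: Any] {
    var body: [String: Any] = [
        "model": model,
        "previous_response_id": previousId,
        "input": outputs
    ]
    // Re-send instructions on every turn; they don't carry over automatically.
    if let instructions { body["instructions"] = instructions }
    return body
}
```

The Agent above could call this in its continuation step instead of building the dictionary inline.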

10) Minimal request shapes (cheat sheet)

  • Basic
{ "model": "gpt-5-codex", "input": "hello" }
  • Built‑in web search
{
  "model": "gpt-5-codex",
  "input": "What Swift version ships with Xcode 16.1? Cite sources.",
  "tools": [{ "type": "web_search" }],
  "tool_choice": "auto",
  "include": ["web_search_call.action.sources"]
}

(OpenAI Platform)

  • Custom function definition
{
  "type": "function",
  "name": "save_file",
  "description": "Write a UTF-8 text file.",
  "parameters": {
    "type": "object",
    "properties": { "path": { "type": "string" }, "content": { "type": "string" } },
    "required": ["path","content"],
    "additionalProperties": false
  },
  "strict": true
}

(OpenAI Platform)

  • Continue after tool calls
{
  "model": "gpt-5-codex",
  "previous_response_id": "<id from prior call>",
  "input": [
    { "type": "function_call_output", "call_id": "<call_id_1>", "output": "{\"saved\":\"/tmp/.../Fibonacci.swift\"}" },
    { "type": "function_call_output", "call_id": "<call_id_2>", "output": "{\"passed\":true,\"tests\":3}" }
  ]
}

(OpenAI Platform)

  • Structured JSON answer
{
  "model": "gpt-5-codex",
  "input": "Return a JSON test summary only.",
  "text": {
    "format": {
      "type": "json_schema",
      "name": "TestReport",
      "schema": { "type": "object", "properties": { "passed": { "type": "boolean" } }, "required": ["passed"], "additionalProperties": false },
      "strict": true
    }
  }
}

(OpenAI Platform)


References (official)

  • Responses API reference (request shape, parameters, streaming) (OpenAI Platform)
  • Function/Tool calling flow (including function_call_output + call_id) (OpenAI Platform)
  • Conversation state / previous_response_id (OpenAI Platform)
  • Built‑in web_search (tools guide + sources via include) (OpenAI Platform)
  • Structured Outputs (text.format + JSON Schema) & migration notes (OpenAI Platform)
  • gpt-5-codex model (what it is, availability) + system‑card addendum + launch post (OpenAI Platform)

Want me to spin this into a tiny SPM repo scaffold (swift run codex-agent) with a workspace folder and a real swift test runner tool so you can drive red‑green‑refactor from the model?
