@egorsmkv
Last active July 30, 2025 14:57
What does function calling look like in training data for LLMs?

For Large Language Models (LLMs) to learn function calling, the training data typically includes examples that demonstrate the pattern of calling a function. A common approach is a structured format resembling a dialogue or a sequence of instructions, in which the assistant invokes a function with specific arguments in response to a user request. The exact representation varies with the LLM and its intended application, but here is a general idea of how it might look:

Example Format for Training Data

  1. Text-based representation: In this format, function calls are represented as text that the model learns to predict or generate. For instance, if the task involves calling a function get_weather(city), the training data might include examples like:

    User: What is the weather like in Paris?
    Assistant: I need to call get_weather(city="Paris"). The weather in Paris is sunny.
    

    or more directly:

    get_weather(city="Paris")
    Result: The weather in Paris is sunny.
    
  2. JSON or structured data format: For models that are fine-tuned on specific tasks or APIs, the training data might be represented in a structured format like JSON. For example:

    {
      "function_name": "get_weather",
      "args": {"city": "Paris"},
      "result": "The weather in Paris is sunny."
    }

    This format is particularly useful for training models that need to understand and generate structured outputs related to function calls.

  3. Dialogue or conversational format: For conversational AI models, the training data is often formatted as a dialogue between a user and an assistant, where the assistant might invoke a function. For instance:

    User: Can you tell me the weather in New York?
    Assistant: <function_call>get_weather(city="New York")</function_call>
    System: <function_result>The weather in New York is rainy.</function_result>
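
The three formats above can be produced from the same underlying record. The following is a minimal sketch, assuming each training example is stored as a function name, a dictionary of keyword arguments, and a result string. The helper names (render_call, as_text, as_json, as_dialogue) are hypothetical and chosen only for illustration; the tag names and JSON keys are copied from the examples above and are not a standard that any particular model requires.

```python
import json


def render_call(name, args):
    """Render a call as name(key="value", ...), matching the text examples above."""
    # json.dumps gives double-quoted strings, e.g. "Paris" -> "\"Paris\"".
    rendered = ", ".join(f"{key}={json.dumps(value)}" for key, value in args.items())
    return f"{name}({rendered})"


def as_text(question, name, args, result):
    """Format 1: plain text in which the assistant states the call inline."""
    return (
        f"User: {question}\n"
        f"Assistant: I need to call {render_call(name, args)}. {result}"
    )


def as_json(name, args, result):
    """Format 2: a structured JSON record."""
    return json.dumps({"function_name": name, "args": args, "result": result})


def as_dialogue(question, name, args, result):
    """Format 3: a dialogue with tagged call and result turns."""
    return "\n".join([
        f"User: {question}",
        f"Assistant: <function_call>{render_call(name, args)}</function_call>",
        f"System: <function_result>{result}</function_result>",
    ])


if __name__ == "__main__":
    print(as_dialogue(
        "Can you tell me the weather in New York?",
        "get_weather",
        {"city": "New York"},
        "The weather in New York is rainy.",
    ))
```

Keeping one record per example and rendering it on demand makes it easier to generate the same data in whichever format a given fine-tuning setup expects.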
    

Key Elements for Function Calling in Training Data

  • Function name: Clearly indicates the name of the function to be called.
  • Arguments: Specifies the parameters or arguments required for the function call.
  • Result or output: Provides the expected result or output of the function call, which helps the model learn the outcome of invoking the function.
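
As an illustration of these three elements, the sketch below extracts them from a transcript written in the tagged dialogue format shown earlier. It assumes calls use keyword arguments with literal values only; FunctionCallExample and parse_example are hypothetical names, not part of any library.

```python
import ast
import re
from dataclasses import dataclass

# Tag names are taken from the dialogue example above; other datasets may differ.
CALL_RE = re.compile(r"<function_call>(.*?)</function_call>", re.DOTALL)
RESULT_RE = re.compile(r"<function_result>(.*?)</function_result>", re.DOTALL)


@dataclass
class FunctionCallExample:
    function_name: str
    arguments: dict
    result: str


def parse_example(transcript: str) -> FunctionCallExample:
    """Pull the function name, arguments, and result out of one transcript."""
    call_match = CALL_RE.search(transcript)
    result_match = RESULT_RE.search(transcript)
    if call_match is None or result_match is None:
        raise ValueError("transcript lacks a function call or a function result")

    # Parse the call text as a Python expression without executing it.
    node = ast.parse(call_match.group(1).strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError("expected a call of the form name(key=value, ...)")
    if node.args or any(kw.arg is None for kw in node.keywords):
        raise ValueError("only explicit keyword arguments are supported")

    # literal_eval accepts AST nodes and rejects anything that is not a literal.
    arguments = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return FunctionCallExample(node.func.id, arguments, result_match.group(1).strip())
```

Parsing with ast rather than eval means the call text is never executed, which matters when the transcripts come from model output or untrusted data.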

Trained on a variety of such examples, LLMs can learn to recognize when a function call is needed and to generate it with the correct name and arguments, which lets them interact with external systems or APIs.
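
To make the interaction with external systems concrete, here is a minimal sketch of what the surrounding application might do with a generated call. It assumes the model emits the JSON shape shown earlier (function_name and args); get_weather is a hypothetical stub standing in for a real API, and REGISTRY and execute_call are illustrative names.

```python
import json


def get_weather(city: str) -> str:
    # Hypothetical stub; a real implementation would query a weather service.
    return f"The weather in {city} is sunny."


# Only functions listed here can be invoked by the model.
REGISTRY = {"get_weather": get_weather}


def execute_call(model_output: str) -> str:
    """Decode a JSON function call and run the matching registered function."""
    call = json.loads(model_output)
    name = call["function_name"]
    if name not in REGISTRY:
        raise KeyError(f"unknown function: {name}")
    return REGISTRY[name](**call["args"])


if __name__ == "__main__":
    output = '{"function_name": "get_weather", "args": {"city": "Paris"}}'
    print(execute_call(output))
```

The returned string would then be fed back to the model as the function result, mirroring the result field in the training examples.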
