@crmne
Last active April 3, 2025 16:26
Proposed multi-provider capabilities and pricing API. Blog post: https://paolino.me/standard-api-llm-capabilities-pricing/
# This is a YAML file so I can have comments but the API should obviously return an array of models in JSON.
# Legend:
# Required: this is important to have in v1.
# Optional: this is still important but can wait for v2.
id: gpt-4.5-preview # Required, will match it with the OpenAI API
display_name: GPT-4.5 Preview # Required
provider: openai # Required
family: gpt45 # Optional, each model page is a family for OpenAI models
context_window: 128000 # Required
max_output_tokens: 16384 # Required
knowledge_cutoff: 20231001 # Optional
modalities:
  text:
    input: true # Required
    output: true # Required
  image:
    input: true # Required
    output: false # Required
  audio:
    input: false # Required
    output: false # Required
  pdf_input: false # Optional - from Anthropic and Google
  embeddings_output: false # Required
  moderation_output: false # Optional
capabilities:
  streaming: true # Optional
  function_calling: true # Required
  structured_output: true # Required
  predicted_outputs: false # Optional
  distillation: false # Optional
  fine_tuning: false # Optional
  batch: true # Required
  realtime: false # Optional
  image_generation: false # Required
  speech_generation: false # Required
  transcription: false # Required
  translation: false # Optional
  citations: false # Optional - from Anthropic
  reasoning: false # Optional - called Extended Thinking in Anthropic's lingo
pricing:
  text_tokens:
    standard:
      input_per_million: 75.0 # Required
      cached_input_per_million: 37.5 # Required
      output_per_million: 150.0 # Required
      reasoning_output_per_million: 0 # Optional
    batch:
      input_per_million: 37.5 # Required
      output_per_million: 75.0 # Required
  images:
    standard:
      input: 0.0 # Optional
      output: 0.0 # Optional
    batch:
      input: 0.0 # Optional
      output: 0.0 # Optional
  audio_tokens:
    standard:
      input_per_million: 0.0 # Optional
      output_per_million: 0.0 # Optional
    batch:
      input_per_million: 0.0 # Optional
      output_per_million: 0.0 # Optional
  embeddings:
    standard:
      input_per_million: 0.0 # Required
    batch:
      input_per_million: 0.0 # Required
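To give a feel for how a client might consume the proposed API, here is a small Python sketch. The payload below is a hand-written sample mirroring the schema above, and the helper names (`vision_models`, `estimate_cost`) are illustrative assumptions, not part of the proposal; the real API is not live.

```python
import json

# Sample payload mirroring the proposed schema (the real API would
# return an array of these model objects as JSON).
MODELS_JSON = """
[
  {
    "id": "gpt-4.5-preview",
    "provider": "openai",
    "context_window": 128000,
    "modalities": {
      "text":  {"input": true,  "output": true},
      "image": {"input": true,  "output": false},
      "audio": {"input": false, "output": false}
    },
    "capabilities": {"function_calling": true, "structured_output": true, "batch": true},
    "pricing": {
      "text_tokens": {
        "standard": {"input_per_million": 75.0, "output_per_million": 150.0}
      }
    }
  }
]
"""

def vision_models(models):
    """Models that accept image input."""
    return [m for m in models if m["modalities"]["image"]["input"]]

def estimate_cost(model, input_tokens, output_tokens):
    """Estimated USD cost of one request at standard (non-batch) rates."""
    rates = model["pricing"]["text_tokens"]["standard"]
    return (input_tokens * rates["input_per_million"]
            + output_tokens * rates["output_per_million"]) / 1_000_000

models = json.loads(MODELS_JSON)
for m in vision_models(models):
    print(m["id"], estimate_cost(m, 10_000, 2_000))  # -> gpt-4.5-preview 1.05
```

Because capabilities and pricing live in one document, a client can do both routing ("which models can see images?") and budgeting in a few lines, with no per-provider special cases.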
@MadBomber

I generally get this kind of information from OpenRouter's (OR) model API. One of the areas where I think their payload is weak is the separation of model capabilities. I like the way you are proposing to separate capabilities into individual text, image, and audio modes. For multi-modal models I would expect each "mode" to be true. You might consider capabilities as being bidirectional, for example text->image and/or image->text; same with audio. Does it support audio in and text out (transcription) and/or text in and audio out (text to speech)?
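The bidirectional idea could be modeled as a set of directed (input, output) modality pairs, so capabilities like transcription fall out of the data rather than needing their own flags. A minimal sketch (the pair encoding and `supports` helper are my own illustration, not part of either schema):

```python
# Directed (input -> output) modality pairs for a hypothetical model.
SUPPORTED = {
    ("text", "text"),
    ("image", "text"),   # vision
    ("audio", "text"),   # transcription
    ("text", "audio"),   # speech generation
}

def supports(input_mode: str, output_mode: str) -> bool:
    """Does the model accept input_mode and produce output_mode?"""
    return (input_mode, output_mode) in SUPPORTED

print(supports("audio", "text"))   # transcription     -> True
print(supports("text", "image"))  # image generation  -> False
```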

One of the areas where I think the OR payload is strong is in the model description. Here is an example:

- :id: openai/o3-mini-high
  :name: 'OpenAI: o3 Mini High'
  :created: 1739372611
  :description: "OpenAI o3-mini-high is the same model as [o3-mini](/openai/o3-mini)
    with reasoning_effort set to high. \n\no3-mini is a cost-efficient language model
    optimized for STEM reasoning tasks, particularly excelling in science, mathematics,
    and coding. The model features three adjustable reasoning effort levels and supports
    key developer capabilities including function calling, structured outputs, and
    streaming, though it does not include vision processing capabilities.\n\nThe model
    demonstrates significant improvements over its predecessor, with expert testers
    preferring its responses 56% of the time and noting a 39% reduction in major errors
    on complex questions. With medium reasoning effort settings, o3-mini matches the
    performance of the larger o1 model on challenging reasoning evaluations like AIME
    and GPQA, while maintaining lower latency and cost."
  :context_length: 200000
  :architecture:
    modality: text->text
    tokenizer: Other
    instruct_type:
  :pricing:
    prompt: '0.0000011'
    completion: '0.0000044'
    image: '0'
    request: '0'
  :top_provider:
    context_length: 200000
    max_completion_tokens: 100000
    is_moderated: true
  :per_request_limits:
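Note that OR quotes prices per token, as strings. Converting to the per-million convention used in the proposal above is a one-liner; using `Decimal` on the string avoids float rounding noise (the `per_million` helper is just an illustration):

```python
from decimal import Decimal

def per_million(per_token_price: str) -> Decimal:
    """Convert OpenRouter's per-token price string to USD per million tokens."""
    return Decimal(per_token_price) * 1_000_000

# o3-mini-high prices from the payload above:
print(per_million("0.0000011"))  # prompt     -> 1.1000000
print(per_million("0.0000044"))  # completion -> 4.4000000
```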

@crmne

crmne commented Apr 2, 2025

Thank you so much for the info @MadBomber!
