@bbrowning
Created April 13, 2025 17:30
Llama Stack OpenAI API Verification Report

Test Results Report

Generated on: 2025-04-13 13:12:49

This report was generated by running python tests/verifications/generate_report.py

Legend

  • ✅ - Test passed
  • ❌ - Test failed
  • ⚪ - Test not applicable or not run for this model

Summary

| Provider | Pass Rate | Tests Passed | Total Tests |
| --- | --- | --- | --- |
| Together | 64.7% | 22 | 34 |
| Fireworks | 82.4% | 28 | 34 |
| Groq | 61.8% | 21 | 34 |
| Openai | 100.0% | 24 | 24 |
| Together-llama-stack | 100.0% | 34 | 34 |
| Fireworks-llama-stack | 100.0% | 34 | 34 |
| Groq-llama-stack | 88.2% | 30 | 34 |
| Openai-llama-stack | 100.0% | 24 | 24 |
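
The Pass Rate column appears to be Tests Passed divided by Total Tests, expressed as a percentage with one decimal place. The following is a minimal sketch that recomputes it from the counts in the Summary table; the formula is inferred from the table itself, and `pass_rate` is a hypothetical helper, not a function taken from generate_report.py.

```python
# Counts copied from the Summary table: provider -> (tests passed, total tests).
summary = {
    "Together": (22, 34),
    "Fireworks": (28, 34),
    "Groq": (21, 34),
    "Openai": (24, 24),
    "Together-llama-stack": (34, 34),
    "Fireworks-llama-stack": (34, 34),
    "Groq-llama-stack": (30, 34),
    "Openai-llama-stack": (24, 24),
}


def pass_rate(passed: int, total: int) -> float:
    """Return the pass rate as a percentage rounded to one decimal place.

    Hypothetical helper: the rounding rule is an assumption inferred
    from the values shown in the Summary table.
    """
    return round(100.0 * passed / total, 1)


rates = {provider: pass_rate(p, t) for provider, (p, t) in summary.items()}

for provider, rate in rates.items():
    print(f"{provider}: {rate:.1f}%")
```

Recomputing the column this way is a quick consistency check when the counts change between runs.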

Together

Tests run on: 2025-04-13 13:06:42

# Run all tests for this provider:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=together -v

# Example: Run only the 'earth' case of test_chat_non_streaming_basic:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=together -k "test_chat_non_streaming_basic and earth"
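
The same two command shapes repeat for every provider in this report: one full run with `-v`, and one narrowed with a `-k` expression that combines a test name and a case name. As a rough sketch, the commands can be assembled programmatically. `build_pytest_command` is a hypothetical helper introduced here for illustration; only the test file path, the `--provider` option, and the `-k` expression format come from the commands shown above.

```python
import shlex

# Path taken from the commands in this report.
TEST_FILE = "tests/verifications/openai_api/test_chat_completion.py"


def build_pytest_command(provider, test_name=None, case=None):
    """Build the pytest argument list for one provider (hypothetical helper).

    With no test_name, this mirrors the "run all tests" command (-v).
    With a test_name, and optionally a case, it mirrors the "-k" example.
    """
    cmd = ["pytest", TEST_FILE, f"--provider={provider}"]
    if test_name is None:
        cmd.append("-v")
    else:
        # pytest's -k option accepts boolean expressions such as "name and case".
        expression = test_name if case is None else f"{test_name} and {case}"
        cmd += ["-k", expression]
    return cmd


# Print shell-ready command lines for one provider.
print(shlex.join(build_pytest_command("together")))
print(shlex.join(build_pytest_command("together", "test_chat_non_streaming_basic", "earth")))
```

Returning an argument list rather than a single string keeps the `-k` expression as one argument, so it can be passed to `subprocess.run` without extra quoting.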

Model Key (Together)

| Display Name | Full Model ID |
| --- | --- |
| Llama-3.3-70B-Instruct | meta-llama/Llama-3.3-70B-Instruct-Turbo |
| Llama-4-Maverick-Instruct | meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 |
| Llama-4-Scout-Instruct | meta-llama/Llama-4-Scout-17B-16E-Instruct |

| Test | Llama-3.3-70B-Instruct | Llama-4-Maverick-Instruct | Llama-4-Scout-Instruct |
test_chat_non_streaming_basic (earth)
test_chat_non_streaming_basic (saturn)
test_chat_non_streaming_image
test_chat_non_streaming_structured_output (calendar)
test_chat_non_streaming_structured_output (math)
test_chat_non_streaming_tool_calling
test_chat_streaming_basic (earth)
test_chat_streaming_basic (saturn)
test_chat_streaming_image
test_chat_streaming_structured_output (calendar)
test_chat_streaming_structured_output (math)
test_chat_streaming_tool_calling

Fireworks

Tests run on: 2025-04-13 13:07:34

# Run all tests for this provider:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=fireworks -v

# Example: Run only the 'earth' case of test_chat_non_streaming_basic:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=fireworks -k "test_chat_non_streaming_basic and earth"

Model Key (Fireworks)

| Display Name | Full Model ID |
| --- | --- |
| Llama-3.3-70B-Instruct | accounts/fireworks/models/llama-v3p3-70b-instruct |
| Llama-4-Maverick-Instruct | accounts/fireworks/models/llama4-maverick-instruct-basic |
| Llama-4-Scout-Instruct | accounts/fireworks/models/llama4-scout-instruct-basic |

| Test | Llama-3.3-70B-Instruct | Llama-4-Maverick-Instruct | Llama-4-Scout-Instruct |
test_chat_non_streaming_basic (earth)
test_chat_non_streaming_basic (saturn)
test_chat_non_streaming_image
test_chat_non_streaming_structured_output (calendar)
test_chat_non_streaming_structured_output (math)
test_chat_non_streaming_tool_calling
test_chat_streaming_basic (earth)
test_chat_streaming_basic (saturn)
test_chat_streaming_image
test_chat_streaming_structured_output (calendar)
test_chat_streaming_structured_output (math)
test_chat_streaming_tool_calling

Groq

Tests run on: 2025-04-13 13:08:51

# Run all tests for this provider:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=groq -v

# Example: Run only the 'earth' case of test_chat_non_streaming_basic:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=groq -k "test_chat_non_streaming_basic and earth"

Model Key (Groq)

| Display Name | Full Model ID |
| --- | --- |
| Llama-3.3-70B-Instruct | llama-3.3-70b-versatile |
| Llama-4-Maverick-Instruct | meta-llama/llama-4-maverick-17b-128e-instruct |
| Llama-4-Scout-Instruct | meta-llama/llama-4-scout-17b-16e-instruct |

| Test | Llama-3.3-70B-Instruct | Llama-4-Maverick-Instruct | Llama-4-Scout-Instruct |
test_chat_non_streaming_basic (earth)
test_chat_non_streaming_basic (saturn)
test_chat_non_streaming_image
test_chat_non_streaming_structured_output (calendar)
test_chat_non_streaming_structured_output (math)
test_chat_non_streaming_tool_calling
test_chat_streaming_basic (earth)
test_chat_streaming_basic (saturn)
test_chat_streaming_image
test_chat_streaming_structured_output (calendar)
test_chat_streaming_structured_output (math)
test_chat_streaming_tool_calling

Openai

Tests run on: 2025-04-13 13:09:17

# Run all tests for this provider:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=openai -v

# Example: Run only the 'earth' case of test_chat_non_streaming_basic:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=openai -k "test_chat_non_streaming_basic and earth"

Model Key (Openai)

| Display Name | Full Model ID |
| --- | --- |
| gpt-4o | gpt-4o |
| gpt-4o-mini | gpt-4o-mini |

| Test | gpt-4o | gpt-4o-mini |
test_chat_non_streaming_basic (earth)
test_chat_non_streaming_basic (saturn)
test_chat_non_streaming_image
test_chat_non_streaming_structured_output (calendar)
test_chat_non_streaming_structured_output (math)
test_chat_non_streaming_tool_calling
test_chat_streaming_basic (earth)
test_chat_streaming_basic (saturn)
test_chat_streaming_image
test_chat_streaming_structured_output (calendar)
test_chat_streaming_structured_output (math)
test_chat_streaming_tool_calling

Together-llama-stack

Tests run on: 2025-04-13 13:09:56

# Run all tests for this provider:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=together-llama-stack -v

# Example: Run only the 'earth' case of test_chat_non_streaming_basic:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=together-llama-stack -k "test_chat_non_streaming_basic and earth"

Model Key (Together-llama-stack)

| Display Name | Full Model ID |
| --- | --- |
| Llama-3.3-70B-Instruct | together/meta-llama/Llama-3.3-70B-Instruct-Turbo |
| Llama-4-Maverick-Instruct | together/meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 |
| Llama-4-Scout-Instruct | together/meta-llama/Llama-4-Scout-17B-16E-Instruct |

| Test | Llama-3.3-70B-Instruct | Llama-4-Maverick-Instruct | Llama-4-Scout-Instruct |
test_chat_non_streaming_basic (earth)
test_chat_non_streaming_basic (saturn)
test_chat_non_streaming_image
test_chat_non_streaming_structured_output (calendar)
test_chat_non_streaming_structured_output (math)
test_chat_non_streaming_tool_calling
test_chat_streaming_basic (earth)
test_chat_streaming_basic (saturn)
test_chat_streaming_image
test_chat_streaming_structured_output (calendar)
test_chat_streaming_structured_output (math)
test_chat_streaming_tool_calling

Fireworks-llama-stack

Tests run on: 2025-04-13 13:10:46

# Run all tests for this provider:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=fireworks-llama-stack -v

# Example: Run only the 'earth' case of test_chat_non_streaming_basic:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=fireworks-llama-stack -k "test_chat_non_streaming_basic and earth"

Model Key (Fireworks-llama-stack)

| Display Name | Full Model ID |
| --- | --- |
| Llama-3.3-70B-Instruct | fireworks/llama-v3p3-70b-instruct |
| Llama-4-Maverick-Instruct | fireworks/llama4-maverick-instruct-basic |
| Llama-4-Scout-Instruct | fireworks/llama4-scout-instruct-basic |

| Test | Llama-3.3-70B-Instruct | Llama-4-Maverick-Instruct | Llama-4-Scout-Instruct |
test_chat_non_streaming_basic (earth)
test_chat_non_streaming_basic (saturn)
test_chat_non_streaming_image
test_chat_non_streaming_structured_output (calendar)
test_chat_non_streaming_structured_output (math)
test_chat_non_streaming_tool_calling
test_chat_streaming_basic (earth)
test_chat_streaming_basic (saturn)
test_chat_streaming_image
test_chat_streaming_structured_output (calendar)
test_chat_streaming_structured_output (math)
test_chat_streaming_tool_calling

Groq-llama-stack

Tests run on: 2025-04-13 13:11:39

# Run all tests for this provider:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=groq-llama-stack -v

# Example: Run only the 'earth' case of test_chat_non_streaming_basic:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=groq-llama-stack -k "test_chat_non_streaming_basic and earth"

Model Key (Groq-llama-stack)

| Display Name | Full Model ID |
| --- | --- |
| Llama-3.3-70B-Instruct | groq/llama-3.3-70b-versatile |
| Llama-4-Maverick-Instruct | groq/llama-4-maverick-17b-128e-instruct |
| Llama-4-Scout-Instruct | groq/llama-4-scout-17b-16e-instruct |

| Test | Llama-3.3-70B-Instruct | Llama-4-Maverick-Instruct | Llama-4-Scout-Instruct |
test_chat_non_streaming_basic (earth)
test_chat_non_streaming_basic (saturn)
test_chat_non_streaming_image
test_chat_non_streaming_structured_output (calendar)
test_chat_non_streaming_structured_output (math)
test_chat_non_streaming_tool_calling
test_chat_streaming_basic (earth)
test_chat_streaming_basic (saturn)
test_chat_streaming_image
test_chat_streaming_structured_output (calendar)
test_chat_streaming_structured_output (math)
test_chat_streaming_tool_calling

Openai-llama-stack

Tests run on: 2025-04-13 13:12:17

# Run all tests for this provider:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=openai-llama-stack -v

# Example: Run only the 'earth' case of test_chat_non_streaming_basic:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=openai-llama-stack -k "test_chat_non_streaming_basic and earth"

Model Key (Openai-llama-stack)

| Display Name | Full Model ID |
| --- | --- |
| gpt-4o | openai/gpt-4o |
| gpt-4o-mini | openai/gpt-4o-mini |

| Test | gpt-4o | gpt-4o-mini |
test_chat_non_streaming_basic (earth)
test_chat_non_streaming_basic (saturn)
test_chat_non_streaming_image
test_chat_non_streaming_structured_output (calendar)
test_chat_non_streaming_structured_output (math)
test_chat_non_streaming_tool_calling
test_chat_streaming_basic (earth)
test_chat_streaming_basic (saturn)
test_chat_streaming_image
test_chat_streaming_structured_output (calendar)
test_chat_streaming_structured_output (math)
test_chat_streaming_tool_calling