Test Results Report

Generated on: 2025-04-13 13:12:49

This report was generated by running `python tests/verifications/generate_report.py`.

Legend

✅ - Test passed
❌ - Test failed
⚪ - Test not applicable or not run for this model

Summary

Provider	Pass Rate	Tests Passed	Total Tests
Together	64.7%	22	34
Fireworks	82.4%	28	34
Groq	61.8%	21	34
Openai	100.0%	24	24
Together-llama-stack	100.0%	34	34
Fireworks-llama-stack	100.0%	34	34
Groq-llama-stack	88.2%	30	34
Openai-llama-stack	100.0%	24	24
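
The Pass Rate column can be reproduced from the other two columns. The following is a minimal sketch, not the code used by `generate_report.py`; the `SUMMARY` dictionary and the `pass_rate` helper are hypothetical names introduced here, and the rounding to one decimal place is an assumption inferred from the table.

```python
# Counts copied from the Summary table: provider -> (tests passed, total tests).
SUMMARY = {
    "Together": (22, 34),
    "Fireworks": (28, 34),
    "Groq": (21, 34),
    "Openai": (24, 24),
    "Together-llama-stack": (34, 34),
    "Fireworks-llama-stack": (34, 34),
    "Groq-llama-stack": (30, 34),
    "Openai-llama-stack": (24, 24),
}


def pass_rate(passed: int, total: int) -> float:
    """Return the pass rate as a percentage rounded to one decimal place."""
    if total == 0:
        return 0.0  # assumption: an empty run is reported as 0.0%
    return round(100.0 * passed / total, 1)


rates = {provider: pass_rate(p, t) for provider, (p, t) in SUMMARY.items()}

# Print one tab-separated line per provider, mirroring the table layout.
for provider, rate in rates.items():
    print(f"{provider}\t{rate:.1f}%")
```

Under these assumptions the computed values agree with every row of the table, for example 22 of 34 for Together gives 64.7%.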

Together

Tests run on: 2025-04-13 13:06:42

```bash
# Run all tests for this provider:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=together -v

# Example: Run only the 'earth' case of test_chat_non_streaming_basic:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=together -k "test_chat_non_streaming_basic and earth"
```
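
The same command pattern repeats for every provider in this report, so it can be assembled programmatically. This is a rough sketch only: `build_pytest_command` and `TEST_FILE` are hypothetical names, and the only flags used are the ones already shown in the commands (`--provider`, `-k`, `-v`).

```python
import shlex
from typing import List, Optional

# Test file path as it appears in the commands of this report.
TEST_FILE = "tests/verifications/openai_api/test_chat_completion.py"


def build_pytest_command(
    provider: str,
    test_name: Optional[str] = None,
    case: Optional[str] = None,
    verbose: bool = False,
) -> List[str]:
    """Build the pytest argument list for one provider (hypothetical helper)."""
    cmd = ["pytest", TEST_FILE, f"--provider={provider}"]
    # Join the test name and the case into a single -k keyword expression.
    keywords = [k for k in (test_name, case) if k]
    if keywords:
        cmd += ["-k", " and ".join(keywords)]
    if verbose:
        cmd.append("-v")
    return cmd


all_tests = build_pytest_command("together", verbose=True)
earth_only = build_pytest_command(
    "together", "test_chat_non_streaming_basic", "earth"
)

# shlex.join quotes the -k expression so the line can be pasted into a shell.
print(shlex.join(all_tests))
print(shlex.join(earth_only))
```

The argument list can be passed to `subprocess.run` directly, which avoids shell quoting problems with the `-k` expression.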

Model Key (Together)

Display Name	Full Model ID
Llama-3.3-70B-Instruct	`meta-llama/Llama-3.3-70B-Instruct-Turbo`
Llama-4-Maverick-Instruct	`meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8`
Llama-4-Scout-Instruct	`meta-llama/Llama-4-Scout-17B-16E-Instruct`

Test	Llama-3.3-70B-Instruct	Llama-4-Maverick-Instruct	Llama-4-Scout-Instruct
test_chat_non_streaming_basic (earth)	✅	✅	✅
test_chat_non_streaming_basic (saturn)	✅	✅	✅
test_chat_non_streaming_image	⚪	✅	✅
test_chat_non_streaming_structured_output (calendar)	✅	✅	✅
test_chat_non_streaming_structured_output (math)	✅	✅	✅
test_chat_non_streaming_tool_calling	✅	✅	✅
test_chat_streaming_basic (earth)	✅	❌	❌
test_chat_streaming_basic (saturn)	✅	❌	❌
test_chat_streaming_image	⚪	❌	❌
test_chat_streaming_structured_output (calendar)	✅	❌	❌
test_chat_streaming_structured_output (math)	✅	❌	❌
test_chat_streaming_tool_calling	✅	❌	❌
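
The Summary counts for Together can be recovered from this table. The sketch below assumes that ⚪ cells are left out of the total, which is consistent with the reported 34 tests (36 cells minus 2 not-applicable cells); `TOGETHER_RESULTS` and `tally` are hypothetical names, not part of the report generator.

```python
PASS, FAIL, SKIP = "✅", "❌", "⚪"

# Marks copied from the Together table. Columns are, in order:
# Llama-3.3-70B-Instruct, Llama-4-Maverick-Instruct, Llama-4-Scout-Instruct.
TOGETHER_RESULTS = {
    "test_chat_non_streaming_basic (earth)": [PASS, PASS, PASS],
    "test_chat_non_streaming_basic (saturn)": [PASS, PASS, PASS],
    "test_chat_non_streaming_image": [SKIP, PASS, PASS],
    "test_chat_non_streaming_structured_output (calendar)": [PASS, PASS, PASS],
    "test_chat_non_streaming_structured_output (math)": [PASS, PASS, PASS],
    "test_chat_non_streaming_tool_calling": [PASS, PASS, PASS],
    "test_chat_streaming_basic (earth)": [PASS, FAIL, FAIL],
    "test_chat_streaming_basic (saturn)": [PASS, FAIL, FAIL],
    "test_chat_streaming_image": [SKIP, FAIL, FAIL],
    "test_chat_streaming_structured_output (calendar)": [PASS, FAIL, FAIL],
    "test_chat_streaming_structured_output (math)": [PASS, FAIL, FAIL],
    "test_chat_streaming_tool_calling": [PASS, FAIL, FAIL],
}


def tally(results):
    """Return (passed, total) for a table of marks."""
    marks = [mark for row in results.values() for mark in row]
    passed = marks.count(PASS)
    # Assumption: not-applicable cells (⚪) do not count toward the total.
    total = passed + marks.count(FAIL)
    return passed, total


passed, total = tally(TOGETHER_RESULTS)
print(f"Together: {passed} of {total} passed")
```

With the marks above this yields 22 passed out of 34, matching the Together row in the Summary.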

Fireworks

Tests run on: 2025-04-13 13:07:34

```bash
# Run all tests for this provider:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=fireworks -v

# Example: Run only the 'earth' case of test_chat_non_streaming_basic:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=fireworks -k "test_chat_non_streaming_basic and earth"
```

Model Key (Fireworks)

Display Name	Full Model ID
Llama-3.3-70B-Instruct	`accounts/fireworks/models/llama-v3p3-70b-instruct`
Llama-4-Maverick-Instruct	`accounts/fireworks/models/llama4-maverick-instruct-basic`
Llama-4-Scout-Instruct	`accounts/fireworks/models/llama4-scout-instruct-basic`

Test	Llama-3.3-70B-Instruct	Llama-4-Maverick-Instruct	Llama-4-Scout-Instruct
test_chat_non_streaming_basic (earth)	✅	✅	✅
test_chat_non_streaming_basic (saturn)	✅	✅	✅
test_chat_non_streaming_image	⚪	✅	✅
test_chat_non_streaming_structured_output (calendar)	✅	✅	✅
test_chat_non_streaming_structured_output (math)	✅	✅	✅
test_chat_non_streaming_tool_calling	❌	❌	❌
test_chat_streaming_basic (earth)	✅	✅	✅
test_chat_streaming_basic (saturn)	✅	✅	✅
test_chat_streaming_image	⚪	✅	✅
test_chat_streaming_structured_output (calendar)	✅	✅	✅
test_chat_streaming_structured_output (math)	✅	✅	✅
test_chat_streaming_tool_calling	❌	❌	❌

Groq

Tests run on: 2025-04-13 13:08:51

```bash
# Run all tests for this provider:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=groq -v

# Example: Run only the 'earth' case of test_chat_non_streaming_basic:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=groq -k "test_chat_non_streaming_basic and earth"
```

Model Key (Groq)

Display Name	Full Model ID
Llama-3.3-70B-Instruct	`llama-3.3-70b-versatile`
Llama-4-Maverick-Instruct	`meta-llama/llama-4-maverick-17b-128e-instruct`
Llama-4-Scout-Instruct	`meta-llama/llama-4-scout-17b-16e-instruct`

Test	Llama-3.3-70B-Instruct	Llama-4-Maverick-Instruct	Llama-4-Scout-Instruct
test_chat_non_streaming_basic (earth)	✅	✅	✅
test_chat_non_streaming_basic (saturn)	✅	✅	✅
test_chat_non_streaming_image	⚪	✅	✅
test_chat_non_streaming_structured_output (calendar)	❌	❌	❌
test_chat_non_streaming_structured_output (math)	❌	❌	❌
test_chat_non_streaming_tool_calling	✅	✅	✅
test_chat_streaming_basic (earth)	✅	✅	✅
test_chat_streaming_basic (saturn)	✅	✅	✅
test_chat_streaming_image	⚪	✅	✅
test_chat_streaming_structured_output (calendar)	❌	❌	❌
test_chat_streaming_structured_output (math)	❌	❌	❌
test_chat_streaming_tool_calling	✅	❌	✅

Openai

Tests run on: 2025-04-13 13:09:17

```bash
# Run all tests for this provider:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=openai -v

# Example: Run only the 'earth' case of test_chat_non_streaming_basic:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=openai -k "test_chat_non_streaming_basic and earth"
```

Model Key (Openai)

Display Name	Full Model ID
gpt-4o	`gpt-4o`
gpt-4o-mini	`gpt-4o-mini`

Test	gpt-4o	gpt-4o-mini
test_chat_non_streaming_basic (earth)	✅	✅
test_chat_non_streaming_basic (saturn)	✅	✅
test_chat_non_streaming_image	✅	✅
test_chat_non_streaming_structured_output (calendar)	✅	✅
test_chat_non_streaming_structured_output (math)	✅	✅
test_chat_non_streaming_tool_calling	✅	✅
test_chat_streaming_basic (earth)	✅	✅
test_chat_streaming_basic (saturn)	✅	✅
test_chat_streaming_image	✅	✅
test_chat_streaming_structured_output (calendar)	✅	✅
test_chat_streaming_structured_output (math)	✅	✅
test_chat_streaming_tool_calling	✅	✅

Together-llama-stack

Tests run on: 2025-04-13 13:09:56

```bash
# Run all tests for this provider:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=together-llama-stack -v

# Example: Run only the 'earth' case of test_chat_non_streaming_basic:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=together-llama-stack -k "test_chat_non_streaming_basic and earth"
```

Model Key (Together-llama-stack)

Display Name	Full Model ID
Llama-3.3-70B-Instruct	`together/meta-llama/Llama-3.3-70B-Instruct-Turbo`
Llama-4-Maverick-Instruct	`together/meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8`
Llama-4-Scout-Instruct	`together/meta-llama/Llama-4-Scout-17B-16E-Instruct`

Test	Llama-3.3-70B-Instruct	Llama-4-Maverick-Instruct	Llama-4-Scout-Instruct
test_chat_non_streaming_basic (earth)	✅	✅	✅
test_chat_non_streaming_basic (saturn)	✅	✅	✅
test_chat_non_streaming_image	⚪	✅	✅
test_chat_non_streaming_structured_output (calendar)	✅	✅	✅
test_chat_non_streaming_structured_output (math)	✅	✅	✅
test_chat_non_streaming_tool_calling	✅	✅	✅
test_chat_streaming_basic (earth)	✅	✅	✅
test_chat_streaming_basic (saturn)	✅	✅	✅
test_chat_streaming_image	⚪	✅	✅
test_chat_streaming_structured_output (calendar)	✅	✅	✅
test_chat_streaming_structured_output (math)	✅	✅	✅
test_chat_streaming_tool_calling	✅	✅	✅

Fireworks-llama-stack

Tests run on: 2025-04-13 13:10:46

```bash
# Run all tests for this provider:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=fireworks-llama-stack -v

# Example: Run only the 'earth' case of test_chat_non_streaming_basic:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=fireworks-llama-stack -k "test_chat_non_streaming_basic and earth"
```

Model Key (Fireworks-llama-stack)

Display Name	Full Model ID
Llama-3.3-70B-Instruct	`fireworks/llama-v3p3-70b-instruct`
Llama-4-Maverick-Instruct	`fireworks/llama4-maverick-instruct-basic`
Llama-4-Scout-Instruct	`fireworks/llama4-scout-instruct-basic`

Test	Llama-3.3-70B-Instruct	Llama-4-Maverick-Instruct	Llama-4-Scout-Instruct
test_chat_non_streaming_basic (earth)	✅	✅	✅
test_chat_non_streaming_basic (saturn)	✅	✅	✅
test_chat_non_streaming_image	⚪	✅	✅
test_chat_non_streaming_structured_output (calendar)	✅	✅	✅
test_chat_non_streaming_structured_output (math)	✅	✅	✅
test_chat_non_streaming_tool_calling	✅	✅	✅
test_chat_streaming_basic (earth)	✅	✅	✅
test_chat_streaming_basic (saturn)	✅	✅	✅
test_chat_streaming_image	⚪	✅	✅
test_chat_streaming_structured_output (calendar)	✅	✅	✅
test_chat_streaming_structured_output (math)	✅	✅	✅
test_chat_streaming_tool_calling	✅	✅	✅

Groq-llama-stack

Tests run on: 2025-04-13 13:11:39

```bash
# Run all tests for this provider:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=groq-llama-stack -v

# Example: Run only the 'earth' case of test_chat_non_streaming_basic:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=groq-llama-stack -k "test_chat_non_streaming_basic and earth"
```

Model Key (Groq-llama-stack)

Display Name	Full Model ID
Llama-3.3-70B-Instruct	`groq/llama-3.3-70b-versatile`
Llama-4-Maverick-Instruct	`groq/llama-4-maverick-17b-128e-instruct`
Llama-4-Scout-Instruct	`groq/llama-4-scout-17b-16e-instruct`

Test	Llama-3.3-70B-Instruct	Llama-4-Maverick-Instruct	Llama-4-Scout-Instruct
test_chat_non_streaming_basic (earth)	✅	✅	✅
test_chat_non_streaming_basic (saturn)	✅	✅	✅
test_chat_non_streaming_image	⚪	✅	✅
test_chat_non_streaming_structured_output (calendar)	✅	✅	✅
test_chat_non_streaming_structured_output (math)	✅	✅	✅
test_chat_non_streaming_tool_calling	✅	❌	❌
test_chat_streaming_basic (earth)	✅	✅	✅
test_chat_streaming_basic (saturn)	✅	✅	✅
test_chat_streaming_image	⚪	✅	✅
test_chat_streaming_structured_output (calendar)	✅	✅	✅
test_chat_streaming_structured_output (math)	✅	✅	✅
test_chat_streaming_tool_calling	✅	❌	❌

Openai-llama-stack

Tests run on: 2025-04-13 13:12:17

```bash
# Run all tests for this provider:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=openai-llama-stack -v

# Example: Run only the 'earth' case of test_chat_non_streaming_basic:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=openai-llama-stack -k "test_chat_non_streaming_basic and earth"
```

Model Key (Openai-llama-stack)

Display Name	Full Model ID
gpt-4o	`openai/gpt-4o`
gpt-4o-mini	`openai/gpt-4o-mini`

Test	gpt-4o	gpt-4o-mini
test_chat_non_streaming_basic (earth)	✅	✅
test_chat_non_streaming_basic (saturn)	✅	✅
test_chat_non_streaming_image	✅	✅
test_chat_non_streaming_structured_output (calendar)	✅	✅
test_chat_non_streaming_structured_output (math)	✅	✅
test_chat_non_streaming_tool_calling	✅	✅
test_chat_streaming_basic (earth)	✅	✅
test_chat_streaming_basic (saturn)	✅	✅
test_chat_streaming_image	✅	✅
test_chat_streaming_structured_output (calendar)	✅	✅
test_chat_streaming_structured_output (math)	✅	✅
test_chat_streaming_tool_calling	✅	✅
