
@MaxGhenis
Last active August 23, 2025 11:55
Expected Parrot EDSL: n parameter not implemented (accepts but ignores)
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# EDSL n Parameter Issue: Comprehensive Testing\n",
"\n",
"This notebook demonstrates:\n",
"1. EDSL accepts but doesn't implement the `n` parameter\n",
"2. Direct API tests showing OpenAI (n≤128) and Gemini (candidateCount≤8) support\n",
"3. Cost implications: 99% savings on input tokens possible\n",
"\n",
"**Update**: Added comprehensive testing of OpenAI and Gemini limits with exact boundaries."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Setup\n",
"import os\n",
"import time\n",
"from edsl import Model, QuestionFreeText\n",
"from edsl.agents import Agent, AgentList"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Part 1: EDSL's n Parameter Doesn't Work"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test EDSL with n=5\n",
"question = QuestionFreeText(\n",
" question_name=\"number\",\n",
" question_text=\"Pick a random number between 1 and 100. Reply with just the number:\"\n",
")\n",
"\n",
"model_with_n = Model('gpt-4o-mini', service_name='openai', n=5)\n",
"\n",
"print(\"Testing EDSL with n=5 parameter...\")\n",
"result = question.by(model_with_n).run()\n",
"\n",
"results_list = result.to_list()\n",
"print(f\"\\nResults received: {len(results_list)}\")\n",
"print(f\"Expected: 5\")\n",
"print(f\"Status: {'✅ PASS' if len(results_list) == 5 else '❌ FAIL - n parameter not implemented'}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Part 2: OpenAI API - n Parameter Works (Limit: 128)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from openai import OpenAI\n",
"\n",
"client = OpenAI(api_key=os.environ.get('OPENAI_API_KEY'))\n",
"\n",
"print(\"Testing OpenAI n parameter directly:\")\n",
"print(\"=\" * 50)\n",
"\n",
"# Test boundary values\n",
"test_values = [1, 10, 128, 129]\n",
"\n",
"for n_val in test_values:\n",
" print(f\"\\nn={n_val}:\")\n",
" try:\n",
" response = client.chat.completions.create(\n",
" model=\"gpt-4o-mini\",\n",
" messages=[{\"role\": \"user\", \"content\": \"random number 1-100 nothing else\"}],\n",
" n=n_val,\n",
" temperature=1.0,\n",
" max_tokens=5\n",
" )\n",
" \n",
" print(f\" ✅ SUCCESS: Got {len(response.choices)} completions\")\n",
" \n",
" # Show examples for small n\n",
" if n_val <= 10:\n",
" numbers = [c.message.content.strip() for c in response.choices]\n",
" print(f\" Numbers: {numbers}\")\n",
" \n",
" # Show token usage\n",
" print(f\" Input tokens: {response.usage.prompt_tokens} (charged once!)\")\n",
" print(f\" Savings: {(1 - 1/n_val)*100:.1f}% on input tokens\")\n",
" \n",
" except Exception as e:\n",
" if \"maximum\" in str(e).lower() or \"Expected a value <= 128\" in str(e):\n",
" print(f\" ❌ LIMIT EXCEEDED: n must be ≤ 128\")\n",
" else:\n",
" print(f\" ❌ Error: {str(e)[:100]}\")\n",
"\n",
"print(\"\\n\" + \"=\" * 50)\n",
"print(\"OpenAI n parameter limit: EXACTLY 128\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Part 3: Google Gemini - candidateCount Works (Limit: 8)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import google.generativeai as genai\n",
"\n",
"# Configure Gemini\n",
"genai.configure(api_key=os.environ.get('GOOGLE_API_KEY'))\n",
"\n",
"print(\"Testing Gemini candidateCount parameter:\")\n",
"print(\"=\" * 50)\n",
"\n",
"# Test with Gemini 2.5 Flash\n",
"model = genai.GenerativeModel('gemini-2.5-flash')\n",
"\n",
"# Test boundary values\n",
"test_values = [1, 4, 8, 9]\n",
"\n",
"for count in test_values:\n",
" print(f\"\\ncandidateCount={count}:\")\n",
" try:\n",
" config = genai.GenerationConfig(\n",
" candidate_count=count,\n",
" temperature=1.0,\n",
" max_output_tokens=5\n",
" )\n",
" \n",
" response = model.generate_content(\n",
" \"random number 1-100 nothing else\",\n",
" generation_config=config\n",
" )\n",
" \n",
" print(f\" ✅ SUCCESS: Got {len(response.candidates)} candidates\")\n",
" \n",
" # Show numbers\n",
" numbers = []\n",
" for c in response.candidates:\n",
" if c.content and c.content.parts:\n",
" numbers.append(c.content.parts[0].text.strip())\n",
" if numbers:\n",
" print(f\" Numbers: {numbers}\")\n",
" \n",
" # Token usage\n",
" if hasattr(response, 'usage_metadata'):\n",
" print(f\" Input tokens: {response.usage_metadata.prompt_token_count} (charged once!)\")\n",
" print(f\" Savings: {(1 - 1/count)*100:.1f}% on input tokens\")\n",
" \n",
" except Exception as e:\n",
" if \"[1, 8]\" in str(e) or \"must be in range\" in str(e):\n",
" print(f\" ❌ LIMIT EXCEEDED: candidateCount must be in [1, 8]\")\n",
" else:\n",
" print(f\" ❌ Error: {str(e)[:100]}\")\n",
"\n",
"print(\"\\n\" + \"=\" * 50)\n",
"print(\"Gemini candidateCount limit: EXACTLY 8\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Part 4: Provider Comparison Table"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Create comparison table\n",
"providers_data = [\n",
" {\"Provider\": \"OpenAI\", \"Parameter\": \"n\", \"Max Limit\": 128, \"Input Tokens\": \"Charged once\", \"Status\": \"✅ Tested\"},\n",
" {\"Provider\": \"Azure OpenAI\", \"Parameter\": \"n\", \"Max Limit\": 128, \"Input Tokens\": \"Charged once\", \"Status\": \"Same as OpenAI\"},\n",
" {\"Provider\": \"Google Gemini\", \"Parameter\": \"candidateCount\", \"Max Limit\": 8, \"Input Tokens\": \"Charged once\", \"Status\": \"✅ Tested\"},\n",
" {\"Provider\": \"Anthropic\", \"Parameter\": \"None\", \"Max Limit\": 0, \"Input Tokens\": \"N/A\", \"Status\": \"❌ No support\"},\n",
" {\"Provider\": \"Together AI\", \"Parameter\": \"None\", \"Max Limit\": 0, \"Input Tokens\": \"N/A\", \"Status\": \"❌ No support\"},\n",
" {\"Provider\": \"Groq\", \"Parameter\": \"n (limited)\", \"Max Limit\": 1, \"Input Tokens\": \"N/A\", \"Status\": \"❌ n=1 only\"},\n",
" {\"Provider\": \"Mistral\", \"Parameter\": \"None\", \"Max Limit\": 0, \"Input Tokens\": \"N/A\", \"Status\": \"❌ No support\"},\n",
"]\n",
"\n",
"df = pd.DataFrame(providers_data)\n",
"print(\"Provider Support for Multiple Completions:\")\n",
"print(\"=\" * 70)\n",
"print(df.to_string(index=False))\n",
"\n",
"print(\"\\n\" + \"=\" * 70)\n",
"print(\"Key Findings:\")\n",
"print(\"• OpenAI: Supports up to n=128 (16x more than Gemini)\")\n",
"print(\"• Gemini: Supports up to candidateCount=8\")\n",
"print(\"• Others: No support - require multiple API calls\")\n",
"print(\"• Cost Savings: Up to 99% on input tokens with n=100\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Part 5: Cost Analysis - Real Impact"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Research scenario\n",
"num_prompts = 100\n",
"samples_per_prompt = 100\n",
"tokens_per_prompt = 150\n",
"tokens_per_completion = 50\n",
"\n",
"# Pricing (GPT-4o-mini)\n",
"price_per_1k_input = 0.00015\n",
"price_per_1k_output = 0.00060\n",
"\n",
"print(\"Cost Analysis: 100 prompts × 100 samples each\")\n",
"print(\"=\" * 60)\n",
"\n",
"# Current EDSL approach\n",
"total_calls_current = num_prompts * samples_per_prompt\n",
"input_tokens_current = total_calls_current * tokens_per_prompt\n",
"output_tokens = total_calls_current * tokens_per_completion\n",
"cost_current = (input_tokens_current * price_per_1k_input + \n",
" output_tokens * price_per_1k_output) / 1000\n",
"\n",
"print(\"\\n📊 Current EDSL (no n parameter):\")\n",
"print(f\" API calls: {total_calls_current:,}\")\n",
"print(f\" Input tokens: {input_tokens_current:,}\")\n",
"print(f\" Total cost: ${cost_current:.2f}\")\n",
"\n",
"# With OpenAI n parameter\n",
"total_calls_openai = num_prompts # Just 1 call per prompt!\n",
"input_tokens_openai = total_calls_openai * tokens_per_prompt\n",
"cost_openai = (input_tokens_openai * price_per_1k_input + \n",
" output_tokens * price_per_1k_output) / 1000\n",
"\n",
"print(\"\\n✨ With OpenAI n=100:\")\n",
"print(f\" API calls: {total_calls_openai:,}\")\n",
"print(f\" Input tokens: {input_tokens_openai:,} (99% reduction!)\")\n",
"print(f\" Total cost: ${cost_openai:.2f}\")\n",
"\n",
"# With Gemini candidateCount\n",
"total_calls_gemini = num_prompts * (samples_per_prompt // 8 + 1)\n",
"input_tokens_gemini = total_calls_gemini * tokens_per_prompt\n",
"cost_gemini = (input_tokens_gemini * price_per_1k_input + \n",
" output_tokens * price_per_1k_output) / 1000\n",
"\n",
"print(\"\\n🔷 With Gemini candidateCount=8:\")\n",
"print(f\" API calls: {total_calls_gemini:,} (batched in 8s)\")\n",
"print(f\" Input tokens: {input_tokens_gemini:,} (87.5% reduction)\")\n",
"print(f\" Total cost: ${cost_gemini:.2f}\")\n",
"\n",
"print(\"\\n\" + \"=\" * 60)\n",
"print(\"💰 SAVINGS:\")\n",
"print(f\" OpenAI vs Current: ${cost_current - cost_openai:.2f} saved ({(1-cost_openai/cost_current)*100:.0f}% cheaper)\")\n",
"print(f\" Gemini vs Current: ${cost_current - cost_gemini:.2f} saved ({(1-cost_gemini/cost_current)*100:.0f}% cheaper)\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"### Confirmed Limits (Empirically Tested)\n",
"- **OpenAI**: n ≤ 128 (n=129 fails with \"Expected a value <= 128\")\n",
"- **Gemini**: candidateCount ≤ 8 (9 fails with \"must be in range [1, 8]\")\n",
"\n",
"### Cost Impact\n",
"- For 100 samples: **99% savings** with OpenAI, **87.5% savings** with Gemini\n",
"- Research with 10,000 total completions: Save **$22.50** with proper implementation\n",
"\n",
"### EDSL Issues Filed\n",
"- Issue #2185: n parameter not implemented\n",
"- Issue #2184: No parameter validation (accepts invalid params)\n",
"- PR #2186: Added parameter validation system\n",
"\n",
"### Next Steps\n",
"Implement proper n parameter support in EDSL's `run()` method to leverage these native API features."
]
}
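,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sketch: intercepting n in run()\n",
"\n",
"A rough sketch (not EDSL's actual API) of how a `run(n=...)` call could be translated into native multi-completion requests against OpenAI. The `collect_samples` function, `OPENAI_MAX_N` constant, and batching logic are hypothetical illustrations, not EDSL internals."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical sketch only: EDSL does not currently do this.\n",
"from openai import OpenAI\n",
"\n",
"client = OpenAI(api_key=os.environ.get('OPENAI_API_KEY'))\n",
"\n",
"def collect_samples(prompt, n, model='gpt-4o-mini'):\n",
"    # OpenAI caps n at 128 per request, so split larger requests into batches.\n",
"    OPENAI_MAX_N = 128\n",
"    outputs = []\n",
"    remaining = n\n",
"    while remaining > 0:\n",
"        batch = min(remaining, OPENAI_MAX_N)\n",
"        response = client.chat.completions.create(\n",
"            model=model,\n",
"            messages=[{'role': 'user', 'content': prompt}],\n",
"            n=batch,\n",
"            temperature=1.0,\n",
"        )\n",
"        # Input tokens are charged once per request, not once per completion.\n",
"        outputs.extend(c.message.content for c in response.choices)\n",
"        remaining -= batch\n",
"    return outputs\n",
"\n",
"# Example: 5 samples from a single prompt, one API call\n",
"samples = collect_samples('Pick a random number between 1 and 100. Reply with just the number:', n=5)\n",
"print(samples)"
]
}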
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
@johnjosephhorton

So "our" n-parameter is passed to run not to model. This conceptually makes more sense to me, as our Model object is about the model not really how it is used. My instinct is for us to intercept that "run(n = ...)" in the case of models that support this approach and then change the API call. Do you have a sense, @MaxGhenis how many other inference providers are supporting this?

@MaxGhenis
Author

MaxGhenis commented Aug 23, 2025

Makes sense! It's just OpenAI and Gemini (plus Azure OpenAI):

| Provider | Parameter | Max Limit | API Documentation |
| --- | --- | --- | --- |
| OpenAI | `n` | 128 | API Reference: `n`: integer, Optional, Defaults to 1. How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. |
| Azure OpenAI | `n` | 128 | Same as OpenAI (Azure Reference) |
| Google Gemini | `candidateCount` | 8 | API Reference: `candidateCount`: integer, Optional. The number of response variations to return. For each request, you're charged for the output tokens of all candidates, but are only charged once for the input tokens. Range: 1-8 for Gemini 2.0+ |
| Anthropic | - | None | API Reference: No `n` parameter. The Messages API only returns a single response per request. |
| Together AI | - | None | API Reference: No `n` parameter documented in their completions endpoint. |
| Groq | `n` | 1 only | API Reference: `n`: integer or null, Optional, default: 1. Note that at the current moment, only n=1 is supported. |
| Mistral | - | None | API Reference: No `n` parameter in chat completions. |
| Others | - | None | Bedrock (ref), Replicate (ref), Deep Infra: no `n` equivalent |
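
For the intercept idea, here's a rough sketch of how per-provider caps could drive the batching (the `MAX_COMPLETIONS` table and `calls_needed` helper are hypothetical names, just to illustrate the dispatch):

```python
import math

# Hypothetical per-provider caps, taken from the table above
MAX_COMPLETIONS = {
    'openai': 128,
    'azure': 128,
    'google': 8,      # candidateCount
    'groq': 1,
    'anthropic': 1,   # no native support: one completion per call
}

def calls_needed(provider: str, n: int) -> int:
    # Number of API calls required to collect n samples of one prompt
    cap = MAX_COMPLETIONS.get(provider, 1)
    return math.ceil(n / cap)

print(calls_needed('openai', 100))     # 1
print(calls_needed('google', 100))     # 13
print(calls_needed('anthropic', 100))  # 100
```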
