@grahama1970
Created July 18, 2025 20:18
Gemini Cli Stress Test
# Gemini CLI Stress Test Harness

This Python tool benchmarks the response time and reliability of the Google Gemini CLI on complex or large prompts, supporting concurrent execution for load testing.

---

## Features

- **Concurrent execution:** Run multiple parallel Gemini CLI jobs.
- **Prompt file support:** Load any `.md` or `.txt` prompt from disk.
- **Smart progress bar:** See task progress in real time via `tqdm`.
- **Adaptive or static timeout:** Derive the timeout from a single baseline run, or set a fixed value with `--timeout`.
- **Saves every output:** Output, error, and a machine-readable JSON report per test run.
- **Flexible input:** Send prompt as a CLI arg (`-p ...`) or via stdin (file/pipe), based on CLI command.
- **Extensive logging:** Uses `loguru` for rich log output.

---

## Usage

### Standard timeout

```
python gemini_stress_test.py --concurrent-runs 5 --timeout 120 --cmd "gemini -y -p" --prompt-file prompt.md
```

### Adaptive timeout (derived from a baseline run)

```
python gemini_stress_test.py --concurrent-runs 5 --cmd "gemini -y -p" --prompt-file prompt.md
```
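
When `--timeout` is omitted, the script first runs the prompt once as a baseline (capped at 600 seconds), multiplies the measured duration by the adaptive buffer (default `1.5`, which Typer exposes as `--adaptive-timeout-buffer`), and rounds up. A minimal sketch of that arithmetic follows; the function name `adaptive_timeout` is illustrative and does not appear in the script:

```python
import math


def adaptive_timeout(baseline_seconds: float, buffer: float = 1.5) -> int:
    """Return the per-run timeout derived from one baseline run."""
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    # Same formula the script uses: ceil(baseline * buffer)
    return int(math.ceil(baseline_seconds * buffer))


print(adaptive_timeout(42.3))  # 42.3 * 1.5 = 63.45, rounded up to 64
```

If the baseline run fails or times out, the script stops without starting the concurrent runs.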

### With stdin piping (for long/complex prompts)

```
python gemini_stress_test.py --concurrent-runs 5 --cmd "gemini -y" --prompt-file prompt.md
```

> **Note:** The script as posted always appends the prompt as the final command-line argument. To use stdin mode, modify `run_single_test` so that it writes the prompt to the Gemini process's stdin instead (recommended for large or complex prompts).
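
A minimal sketch of stdin delivery, assuming the run function is adapted to open a stdin pipe. `run_with_stdin` is a hypothetical helper, not part of the script, and the demo uses a Python one-liner as a stand-in for `gemini -y` so it runs without the CLI installed:

```python
import asyncio
import sys
from typing import List, Tuple


async def run_with_stdin(
    command: List[str], prompt: str, timeout: float
) -> Tuple[int, bytes, bytes]:
    """Run `command`, send `prompt` on stdin, return (exit_code, stdout, stderr)."""
    proc = await asyncio.create_subprocess_exec(
        *command,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        # communicate() writes the input, closes stdin, and drains both pipes.
        stdout, stderr = await asyncio.wait_for(
            proc.communicate(input=prompt.encode("utf-8")), timeout=timeout
        )
    except asyncio.TimeoutError:
        proc.kill()
        await proc.wait()
        raise
    return proc.returncode, stdout, stderr


if __name__ == "__main__":
    # Stand-in command that echoes stdin; swap in ["gemini", "-y"] for real runs.
    echo = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
    code, out, _ = asyncio.run(run_with_stdin(echo, "hello", timeout=30))
    print(code, out.decode())
```

Unlike the script's file-streaming approach, this sketch buffers the whole response in memory, which is simpler but less suitable for very large outputs.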

---

## Output

Each invocation creates a new timestamped folder next to the script. It holds one stdout file (`output_*.json`) and one stderr file (`error_*.log`) per run, plus a single `report.json`:

```
./test_run_YYYYMMDD_HHMMSS/
```
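
`report.json` contains a `summary` object and a `results` list whose entries include `test_id`, `status` (`SUCCESS`, `FAILURE`, or `TIMEOUT`), and `duration_seconds`. The sketch below shows one way to post-process it; `summarize_report` is a hypothetical helper and the sample values are made up for illustration:

```python
import json
import statistics
from typing import Any, Dict


def summarize_report(report: Dict[str, Any]) -> Dict[str, Any]:
    """Compute success rate and duration stats from a parsed report.json."""
    results = report["results"]
    ok = [r["duration_seconds"] for r in results if r["status"] == "SUCCESS"]
    return {
        "runs": len(results),
        "success_rate": len(ok) / len(results) if results else 0.0,
        "mean_seconds": statistics.mean(ok) if ok else None,
        "max_seconds": max(ok) if ok else None,
    }


# Illustrative data using the field names the script writes. In practice,
# parse the report.json file from the run folder with json.load().
SAMPLE = """
{"summary": {"success": 2, "failures": 0, "timeouts": 1},
 "results": [
   {"test_id": 1, "status": "SUCCESS", "duration_seconds": 40.0},
   {"test_id": 2, "status": "TIMEOUT", "duration_seconds": 60.0},
   {"test_id": 3, "status": "SUCCESS", "duration_seconds": 50.0}
 ]}
"""
print(summarize_report(json.loads(SAMPLE)))
```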

---

## Gemini CLI Input Recommendations

- **`-p "<prompt>"`**: Use for small/medium prompts; the prompt is passed as a command-line argument.
- **stdin (`gemini -y < prompt.md`)**: Use for large/complex prompts or automated chaining.
    - Only one mode is supported at a time ([Gemini CLI docs](https://cloud.google.com/gemini/docs/codeassist/gemini-cli); [issue #4405](https://github.com/google-gemini/gemini-cli/issues/4405)).
    - Don’t use both `-p` and stdin; if both are supplied, only `-p` is used, and stdin is ignored.
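
Because the two modes are mutually exclusive, a harness could pick one from the base command itself. The following is a sketch under that assumption; `build_invocation` is a hypothetical helper, and the rule "a trailing `-p` means argument mode" is a convention chosen here, not documented Gemini CLI behavior:

```python
import shlex
from typing import List, Optional, Tuple


def build_invocation(cmd: str, prompt: str) -> Tuple[List[str], Optional[bytes]]:
    """Return (argv, stdin_payload) for one run.

    If the base command ends with -p, the prompt becomes the final argument
    and nothing is written to stdin. Otherwise the prompt is sent on stdin.
    """
    argv = shlex.split(cmd)  # respects quoting, unlike str.split()
    if argv and argv[-1] == "-p":
        return argv + [prompt], None
    return argv, prompt.encode("utf-8")


print(build_invocation("gemini -y -p", "Summarize this repo"))
print(build_invocation("gemini -y", "Summarize this repo"))
```

The first call yields an argument list ending in the prompt with no stdin payload; the second leaves the argument list unchanged and returns the prompt as bytes to write to stdin.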

---

## System Requirements

- Python 3.8+
- [`loguru`](https://pypi.org/project/loguru/)
- [`tqdm`](https://pypi.org/project/tqdm/)
- [`typer`](https://pypi.org/project/typer/)
- [Gemini CLI](https://cloud.google.com/gemini/docs/codeassist/gemini-cli) installed and authenticated

---

## Example Prompt File (`prompt.md`)

```
You are a skilled AI project architect.

Your task is to simulate an entire year-long global product launch plan.

In this simulation, there are:
- 6 regional offices (North America, Europe, Asia, South America, Africa, Australia)
- 20 cross-functional stakeholders (execs, engineers, designers, sales, support)
- 50 unique product features launching across multiple timelines

Instructions:
1. Break down the plan into 4 quarters (Q1 to Q4).
2. For each quarter, define 3–5 major milestones.
3. For each milestone, define:
   - A short description
   - Start and end dates
   - Assigned leads
   - Dependencies and region(s) affected
   - Estimated effort in person-weeks

4. After establishing the quarterly plans:
   - Generate a matrix of responsibility showing which stakeholders are involved in which milestones.
   - Include a JSON summary for each regional office with aggregated effort, milestones owned, and feature coverage.

5. Return the ENTIRE OUTPUT as formatted JSON using the schema below (truncate only if absolutely necessary).

JSON Schema:
{
  "quarters": [
    {
      "quarter": "Q1",
      "milestones": [
        {
          "title": "Launch Alpha",
          "description": "Initial Alpha release with testing cohort",
          "start_date": "2025-01-15",
          "end_date": "2025-02-28",
          "assigned_leads": ["Alice", "David"],
          "regions": ["North America"],
          "effort_person_weeks": 120
        }
      ]
    }
  ],
  "responsibility_matrix": { "Alice": ["Q1: Launch Alpha", "Q2: Beta Sprint"] },
  "regional_summaries": {
    "North America": {
      "milestones": 8,
      "total_effort": 460,
      "features_covered": ["Feature A", "Feature D", "..."]
    }
  }
}

When responding, begin generating data as quickly as possible.
If the whole output is too large for the context window, begin with Q1 and stream the remainder.
Do not include any explanatory text — only return the JSON object.
```
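
Since the prompt asks for JSON only, a reliability check can go beyond the exit code and confirm that each saved output parses and has the expected shape. This is a minimal sketch, not part of the script: `check_output` is a hypothetical helper, and the required keys are taken from the schema in the prompt above:

```python
import json
from typing import List

REQUIRED_TOP_LEVEL = ("quarters", "responsibility_matrix", "regional_summaries")
REQUIRED_MILESTONE = (
    "title", "description", "start_date", "end_date",
    "assigned_leads", "regions", "effort_person_weeks",
)


def check_output(text: str) -> List[str]:
    """Return a list of problems found in a response; an empty list means OK."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = [f"missing key: {k}" for k in REQUIRED_TOP_LEVEL if k not in data]
    quarters = data.get("quarters", [])
    if not isinstance(quarters, list):
        return problems + ["'quarters' is not a list"]
    for q in quarters:
        if not isinstance(q, dict):
            problems.append("quarter entry is not an object")
            continue
        for m in q.get("milestones", []):
            for key in REQUIRED_MILESTONE:
                if not isinstance(m, dict) or key not in m:
                    problems.append(f"{q.get('quarter', '?')}: milestone missing {key}")
    return problems


print(check_output('{"quarters": [], "responsibility_matrix": {}, "regional_summaries": {}}'))
```

The check covers only the presence of keys; it does not validate dates, value types, or the 3 to 5 milestones per quarter that the prompt requests.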

---

## References

- [Gemini CLI Documentation](https://cloud.google.com/gemini/docs/codeassist/gemini-cli)
- [How to pipe prompts via stdin](https://github.com/google-gemini/gemini-cli/issues/4405)
- [Official blog: Google announces Gemini CLI](https://blog.google/technology/developers/introducing-gemini-cli-open-source-ai-agent/)
- [DataCamp: Gemini CLI Guide](https://www.datacamp.com/tutorial/gemini-cli)
- [Practical Guide on Dev.to](https://dev.to/shahidkhans/a-practical-guide-to-gemini-cli-941)
- [YouTube: Gemini CLI for Testing & Automation](https://www.youtube.com/watch?v=hsAYuKHVQhk)

---

**Current date:** Friday, July 18, 2025, 4:16 PM EDT

"""
### How to Use with Gemini CLI
```zsh
gemini -y < prompt.md
```
### Or with your Python benchmarking script:
```zsh
python gemini_stress_test.py --cmd "gemini -y" --prompt-file prompt.md
```
### Stress Testing with Concurrency
```zsh
python gemini_stress_test.py --concurrent-runs 5 --timeout 120 --cmd "gemini -y" --prompt-file prompt.md
```
"""
import os
import math
import json
import asyncio
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional
import typer
from loguru import logger
from tqdm import tqdm
app = typer.Typer(help="Stress test Gemini CLI with concurrent prompt executions.")
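# --- Async Helpers ---
async def save_stream_to_file(stream: asyncio.StreamReader, filepath: str):
    """Copy a subprocess stream to a file until EOF."""
    with open(filepath, "wb") as f:
        while True:
            # Read fixed-size chunks: readline() raises ValueError when a single
            # line exceeds the StreamReader limit (64 KiB by default).
            chunk = await stream.read(65536)
            if not chunk:
                break
            f.write(chunk)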
async def run_single_test(
    test_id: int,
    prompt: str,
    command: List[str],
    timeout: int,
    embed_stdout: bool,
    output_dir: str
) -> Dict[str, Any]:
    """Run the command once with the prompt appended and return a result record."""
    start_time = datetime.now(timezone.utc)
    log_prefix = f"[Test-{test_id:02d}]"
    timestamp = start_time.strftime("%Y%m%d_%H%M%S")
    run_id_str = f"run{test_id:02d}"
    stdout_file = os.path.join(output_dir, f"output_{timestamp}_{run_id_str}.json")
    stderr_file = os.path.join(output_dir, f"error_{timestamp}_{run_id_str}.log")
    result: Dict[str, Any] = {
        "test_id": test_id,
        "status": "UNKNOWN",
        "start_time_utc": start_time.isoformat(),
        "duration_seconds": -1.0,
        "exit_code": None,
        "stdout_path": stdout_file,
        "stderr_content": "",
    }
    full_command = command + [prompt]
    try:
        proc = await asyncio.create_subprocess_exec(
            *full_command,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE
        )
        await asyncio.wait_for(
            asyncio.gather(
                save_stream_to_file(proc.stdout, stdout_file),
                save_stream_to_file(proc.stderr, stderr_file),
                proc.wait()
            ),
            timeout=timeout
        )
        result["exit_code"] = proc.returncode
        if proc.returncode == 0:
            result["status"] = "SUCCESS"
            logger.success(f"{log_prefix} completed.")
        else:
            result["status"] = "FAILURE"
            logger.warning(f"{log_prefix} exited with code {proc.returncode}")
    except asyncio.TimeoutError:
        result["status"] = "TIMEOUT"
        logger.error(f"{log_prefix} timed out after {timeout} seconds.")
        proc.kill()
        await proc.wait()
    except FileNotFoundError:
        result["status"] = "FAILURE"
        result["stderr_content"] = f"ERROR: Command '{command[0]}' not found."
        logger.error(f"{log_prefix} {result['stderr_content']}")
    except Exception as e:
        result["status"] = "FAILURE"
        result["stderr_content"] = f"Exception occurred: {e}"
        logger.error(f"{log_prefix} {result['stderr_content']}")
    end_time = datetime.now(timezone.utc)
    result["duration_seconds"] = (end_time - start_time).total_seconds()
    try:
        with open(stderr_file, "r", encoding="utf-8") as f:
            result["stderr_content"] += f.read()
    except FileNotFoundError:
        pass
    if embed_stdout and result["status"] == "SUCCESS":
        try:
            with open(stdout_file, "r", encoding="utf-8") as f:
                result["stdout_content"] = f.read()
        except Exception as e:
            result["stdout_content"] = f"Error reading stdout: {e}"
    return result
# --- Main CLI ---
@app.command()
def run_stress_test(
    prompt_file: str = typer.Option("prompt.md", help="Path to prompt .md/.txt file."),
    cmd: str = typer.Option("gemini -y -p", "--cmd", help="Base command to run, with prompt appended"),
    concurrent_runs: int = typer.Option(1, "--concurrent-runs", "-c", help="Number of concurrent test executions."),
    timeout: Optional[int] = typer.Option(None, "--timeout", "-t", help="Fixed timeout in seconds."),
    adaptive_timeout_buffer: float = typer.Option(1.5, help="Timeout multiplier if using adaptive mode."),
    embed_stdout: bool = typer.Option(False, help="Include stdout contents in the JSON report.")
):
    """
    Run Gemini CLI concurrently for performance/load testing.
    """
    import sys
    script_dir = os.path.dirname(os.path.abspath(sys.argv[0]))
    prompt_path = os.path.abspath(prompt_file)
    if not os.path.exists(prompt_path):
        logger.error(f"Prompt file not found: {prompt_path}")
        raise typer.Exit(code=1)
    with open(prompt_path, "r", encoding="utf-8") as f:
        prompt = f.read().strip()
    output_dir = os.path.join(script_dir, f"test_run_{datetime.now().strftime('%Y%m%d_%H%M%S')}")
    os.makedirs(output_dir, exist_ok=True)
    command_parts = cmd.strip().split()
    timeout_to_use = timeout

    async def orchestrate():
        nonlocal timeout_to_use
        if timeout is None:
            logger.info("Running baseline to determine adaptive timeout...")
            baseline = await run_single_test(0, prompt, command_parts, 600, False, output_dir)
            if baseline["status"] != "SUCCESS":
                logger.error("Baseline run failed. Cannot determine timeout.")
                return
            baseline_secs = baseline["duration_seconds"]
            timeout_to_use = int(math.ceil(baseline_secs * adaptive_timeout_buffer))
            logger.info(f"Baseline: {baseline_secs:.2f}s → timeout = {timeout_to_use}s")
        tasks = [
            run_single_test(i + 1, prompt, command_parts, timeout_to_use, embed_stdout, output_dir)
            for i in range(concurrent_runs)
        ]
        results = []
        with tqdm(total=concurrent_runs, desc="Running Tests") as pbar:
            for coro in asyncio.as_completed(tasks):
                result = await coro
                results.append(result)
                pbar.update(1)
        # Output final report
        summary = {
            "started": datetime.now(timezone.utc).isoformat(),
            "concurrent_runs": concurrent_runs,
            "timeout_seconds": timeout_to_use,
            "command": cmd,
            "prompt_file": prompt_file,
            "success": sum(1 for r in results if r["status"] == "SUCCESS"),
            "failures": sum(1 for r in results if r["status"] == "FAILURE"),
            "timeouts": sum(1 for r in results if r["status"] == "TIMEOUT"),
        }
        report = {
            "summary": summary,
            "results": sorted(results, key=lambda r: r["test_id"])
        }
        report_path = os.path.join(output_dir, "report.json")
        with open(report_path, "w", encoding="utf-8") as f:
            json.dump(report, f, indent=2)
        logger.info(f"✅ Done. Results saved to: {report_path}")
        logger.info(f"✅ Success: {summary['success']} | ❌ Failures: {summary['failures']} | ⏱️ Timeouts: {summary['timeouts']}")

    asyncio.run(orchestrate())


if __name__ == "__main__":
    app()

Gemini Stress Test Prompt

You are a skilled AI project architect.

Your task is to simulate an entire year-long global product launch plan.

In this simulation, there are:

  • 6 regional offices:

    • North America
    • Europe
    • Asia
    • South America
    • Africa
    • Australia
  • 20 cross-functional stakeholders:

    • Executives
    • Engineers
    • Designers
    • Sales
    • Support
  • 50 unique product features launching across multiple timelines


Instructions

  1. Break down the plan into 4 quarters (Q1 to Q4).

  2. For each quarter, define 3–5 major milestones.

  3. For each milestone, define the following:

    • A short description
    • Start and end dates
    • Assigned leads
    • Dependencies and region(s) affected
    • Estimated effort in person-weeks
  4. After establishing the quarterly plans:

    • Generate a matrix of responsibility showing which stakeholders are involved in which milestones.
    • Include a JSON summary for each regional office showing:
      • Total milestones owned
      • Total effort (in person-weeks)
      • Features covered
  5. Return only the output as a complete JSON object.

If output is too large, begin with Q1 and stream the remaining quarters.


🎯 JSON Schema

{
  "quarters": [
    {
      "quarter": "Q1",
      "milestones": [
        {
          "title": "Launch Alpha",
          "description": "Initial Alpha release with testing cohort",
          "start_date": "2025-01-15",
          "end_date": "2025-02-28",
          "assigned_leads": ["Alice", "David"],
          "regions": ["North America"],
          "effort_person_weeks": 120
        }
      ]
    }
  ],
  "responsibility_matrix": {
    "Alice": ["Q1: Launch Alpha", "Q2: Beta Sprint"]
  },
  "regional_summaries": {
    "North America": {
      "milestones": 8,
      "total_effort": 460,
      "features_covered": ["Feature A", "Feature D", "..."]
    }
  }
}

Output Requirements

  • ⏱ Begin generating response immediately.
  • 📦 If output exceeds context limit, return Q1 first and stream/continue.
  • DO NOT include explanation or commentary. Return JSON only.
