A complete setup guide for integrating Google's Gemini CLI with Claude Code through an MCP (Model Context Protocol) server. This provides automatic second opinion consultation when Claude expresses uncertainty or encounters complex technical decisions.
See the template repository for a complete example, including Gemini CLI automated PR reviews: Example PR, Script.
# Switch to Node.js 22.16.0
nvm use 22.16.0
# Install Gemini CLI globally
npm install -g @google/gemini-cli
# Test installation
gemini --help
# Authenticate with Google account (free tier: 60 req/min, 1,000/day)
# Authentication happens automatically on first use
# Direct consultation (no container setup needed)
echo "Your question here" | gemini
# Example: Technical questions
echo "Best practices for microservice authentication?" | gemini -m gemini-2.5-pro
- Host-Based Setup: Both MCP server and Gemini CLI run on host machine
- Why Host-Only: Gemini CLI requires interactive authentication, and running on the host avoids Docker-in-Docker complexity
- Communication Modes:
- stdio (recommended): Bidirectional streaming for production use
- HTTP: Simple request/response for testing
- Auto-consultation: Detects uncertainty patterns in Claude responses
- Manual consultation: On-demand second opinions via MCP tools
- Response synthesis: Combines both AI perspectives
- Singleton Pattern: Ensures consistent state management across all tool calls
├── gemini_mcp_server.py       # stdio-based MCP server with HTTP mode support
├── gemini_mcp_server_http.py  # HTTP server implementation (imported by main)
├── gemini_integration.py      # Core integration module with singleton pattern
├── gemini-config.json         # Gemini configuration
├── start-gemini-mcp.sh        # Startup script for both modes
└── test_gemini_mcp.py         # Test script for both server modes
All files should be placed in the same directory for easy deployment.
# Start MCP server in stdio mode (default)
cd your-project
python3 gemini_mcp_server.py --project-root .
# Or with environment variables
GEMINI_ENABLED=true \
GEMINI_AUTO_CONSULT=true \
GEMINI_CLI_COMMAND=gemini \
GEMINI_TIMEOUT=200 \
GEMINI_RATE_LIMIT=2 \
python3 gemini_mcp_server.py --project-root .
# Start MCP server in HTTP mode
python3 gemini_mcp_server.py --project-root . --port 8006
# The main server automatically:
# 1. Detects the --port argument
# 2. Imports gemini_mcp_server_http module
# 3. Starts the FastAPI server on the specified port
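The mode dispatch described above might look roughly like this (an illustrative sketch, not the actual contents of gemini_mcp_server.py; `run_stdio_server` is a hypothetical name):

```python
import argparse

def parse_args(argv=None):
    """Parse CLI arguments; a --port value switches the server into HTTP mode."""
    parser = argparse.ArgumentParser(description="Gemini MCP server")
    parser.add_argument("--project-root", default=".")
    parser.add_argument("--port", type=int, default=None,
                        help="If given, run the FastAPI HTTP server on this port")
    return parser.parse_args(argv)

def main(argv=None):
    args = parse_args(argv)
    if args.port is not None:
        # HTTP mode: import lazily so stdio mode carries no FastAPI dependency
        import gemini_mcp_server_http
        gemini_mcp_server_http.run(host="127.0.0.1", port=args.port)
    else:
        run_stdio_server(args.project_root)  # hypothetical stdio entry point
```

Lazy importing keeps the default stdio path free of HTTP-only dependencies.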
Add to your Claude Code's MCP settings:
{
"mcpServers": {
"gemini": {
"command": "python3",
"args": ["/path/to/gemini_mcp_server.py", "--project-root", "."],
"cwd": "/path/to/your/project",
"env": {
"GEMINI_ENABLED": "true",
"GEMINI_AUTO_CONSULT": "true",
"GEMINI_CLI_COMMAND": "gemini"
}
}
}
}
{
"mcpServers": {
"gemini-http": {
"url": "http://localhost:8006",
"transport": "http"
}
}
}
Feature | stdio Mode | HTTP Mode |
---|---|---|
Communication | Bidirectional streaming | Request/Response |
Performance | Better for long operations | Good for simple queries |
Real-time updates | ✅ Supported | ❌ Not supported
Setup complexity | Moderate | Simple |
Use case | Production | Testing/Development |
Both server modes automatically detect if running inside a container and exit immediately with helpful instructions. This is critical because:
- Gemini CLI requires Docker access for containerized execution
- Running Docker-in-Docker causes authentication and performance issues
- The server must run on the host system to access the Docker daemon
- Detection happens before any imports to fail fast with clear error messages
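A fail-fast check of this kind can be sketched as follows (an illustrative heuristic; the real server's detection logic may differ). Docker creates /.dockerenv in the container's root filesystem, and /proc/1/cgroup typically mentions the container runtime:

```python
import os

def running_in_container(dockerenv="/.dockerenv", cgroup="/proc/1/cgroup"):
    """Heuristic container check; paths are parameterized for testability."""
    if os.path.exists(dockerenv):
        return True
    try:
        with open(cgroup) as f:
            content = f.read()
        return any(marker in content for marker in ("docker", "kubepods", "containerd"))
    except OSError:
        return False

def fail_fast_if_containerized():
    """Called at the very top of the server, before any heavy imports."""
    if running_in_container():
        raise SystemExit(
            "This MCP server must run on the host system, not inside a container.\n"
            "Start it on the host with: python3 gemini_mcp_server.py --project-root ."
        )
```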
Automatically detects patterns like:
- "I'm not sure", "I think", "possibly", "probably"
- "Multiple approaches", "trade-offs", "alternatives"
- Critical operations: "security", "production", "database migration"
Manual consultation with Gemini for second opinions or validation.
Parameters:
- query (required): The question or topic to consult Gemini about
- context (optional): Additional context for the consultation
- comparison_mode (optional, default: true): Whether to request structured comparison format
- force (optional, default: false): Force consultation even if disabled
Example:
# In Claude Code
Use the consult_gemini tool with:
query: "Should I use WebSockets or gRPC for real-time communication?"
context: "Building a multiplayer application with real-time updates"
comparison_mode: true
Check Gemini integration status and statistics.
Returns:
- Configuration status (enabled, auto-consult, CLI command, timeout, rate limit)
- Gemini CLI availability and version
- Consultation statistics (total, completed, average time)
- Conversation history size
Example:
# Check current status
Use the gemini_status tool
Enable or disable automatic Gemini consultation on uncertainty detection.
Parameters:
- enable (optional): true to enable, false to disable. If not provided, toggles the current state.
Example:
# Toggle auto-consultation
Use the toggle_gemini_auto_consult tool
# Or explicitly enable/disable
Use the toggle_gemini_auto_consult tool with:
enable: false
Clear Gemini conversation history to start fresh.
Example:
# Clear all consultation history
Use the clear_gemini_history tool
- Identifies agreement/disagreement between Claude and Gemini
- Provides confidence levels (high/medium/low)
- Generates combined recommendations
- Tracks execution time and consultation ID
The integration maintains conversation history across consultations:
- Configurable history size (default: 10 entries)
- History included in subsequent consultations for context
- Can be cleared with the clear_gemini_history tool
The MCP server exposes methods for detecting uncertainty:
# Detect uncertainty in responses
has_uncertainty, patterns = server.detect_response_uncertainty(response_text)
# Automatically consult if uncertain
result = await server.maybe_consult_gemini(response_text, context)
- Total consultations attempted
- Successful completions
- Average execution time per consultation
- Total execution time across all consultations
- Conversation history size
- Last consultation timestamp
- Error tracking and timeout monitoring
GEMINI_ENABLED=true # Enable integration
GEMINI_AUTO_CONSULT=true # Auto-consult on uncertainty
GEMINI_CLI_COMMAND=gemini # CLI command to use
GEMINI_TIMEOUT=200 # Query timeout in seconds
GEMINI_RATE_LIMIT=5 # Delay between calls (seconds)
GEMINI_MAX_CONTEXT=4000 # Max context length
GEMINI_MODEL=gemini-2.5-flash # Model to use
GEMINI_SANDBOX=false # Run Gemini in sandbox mode (isolates operations)
GEMINI_API_KEY= # Optional (blank for free tier)
GEMINI_LOG_CONSULTATIONS=true # Log consultation details
GEMINI_DEBUG=false # Debug mode
GEMINI_INCLUDE_HISTORY=true # Include conversation history
GEMINI_MAX_HISTORY=10 # Max history entries to maintain
GEMINI_MCP_PORT=8006 # Port for HTTP mode (if used)
GEMINI_MCP_HOST=127.0.0.1 # Host for HTTP mode (if used)
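GEMINI_RATE_LIMIT translates into a minimum delay between consecutive CLI calls. A minimal async limiter might look like this (an illustrative sketch, not the server's exact implementation):

```python
import asyncio
import time

class RateLimiter:
    """Enforce a minimum delay (seconds) between consecutive calls."""
    def __init__(self, delay: float):
        self.delay = delay
        self._last_call = 0.0

    async def wait(self):
        elapsed = time.monotonic() - self._last_call
        if elapsed < self.delay:
            await asyncio.sleep(self.delay - elapsed)
        self._last_call = time.monotonic()

async def demo():
    limiter = RateLimiter(0.2)
    start = time.monotonic()
    await limiter.wait()   # first call: no wait needed
    await limiter.wait()   # second call: sleeps until ~0.2s have passed
    return time.monotonic() - start

elapsed = asyncio.run(demo())
```

Because all MCP tool calls share one integration instance (see the singleton pattern below), they also share one limiter, so concurrent tools cannot stack up back-to-back CLI calls.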
Create gemini-config.json:
{
"enabled": true,
"auto_consult": true,
"cli_command": "gemini",
"timeout": 300,
"rate_limit_delay": 5.0,
"log_consultations": true,
"model": "gemini-2.5-flash",
"sandbox_mode": true,
"debug_mode": false,
"include_history": true,
"max_history_entries": 10,
"uncertainty_thresholds": {
"uncertainty_patterns": true,
"complex_decisions": true,
"critical_operations": true
}
}
UNCERTAINTY_PATTERNS = [
r"\bI'm not sure\b",
r"\bI think\b",
r"\bpossibly\b",
r"\bprobably\b",
r"\bmight be\b",
r"\bcould be\b",
# ... more patterns
]
COMPLEX_DECISION_PATTERNS = [
r"\bmultiple approaches\b",
r"\bseveral options\b",
r"\btrade-offs?\b",
r"\balternatives?\b",
# ... more patterns
]
CRITICAL_OPERATION_PATTERNS = [
r"\bproduction\b",
r"\bdatabase migration\b",
r"\bsecurity\b",
r"\bauthentication\b",
# ... more patterns
]
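Taken together, detection reduces to scanning a response against these regex lists. For example, using a subset of the patterns above:

```python
import re

# Subset of the pattern lists defined above
UNCERTAINTY_PATTERNS = [r"\bI'm not sure\b", r"\bI think\b", r"\bmight be\b"]
CRITICAL_OPERATION_PATTERNS = [r"\bproduction\b", r"\bsecurity\b"]

def find_matches(text, patterns):
    """Return the patterns that match the text, case-insensitively."""
    return [p for p in patterns if re.search(p, text, re.IGNORECASE)]

response = "I think OAuth might be fine, but check the security implications."
matched = find_matches(response, UNCERTAINTY_PATTERNS + CRITICAL_OPERATION_PATTERNS)
# matched contains the "I think", "might be", and "security" patterns,
# so auto-consultation would trigger for this response.
```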
import re
from typing import Any, Dict, List, Optional, Tuple

class GeminiIntegration:
    def __init__(self, config: Optional[Dict[str, Any]] = None):
        self.config = config or {}
        self.enabled = self.config.get('enabled', True)
        self.auto_consult = self.config.get('auto_consult', True)
        self.cli_command = self.config.get('cli_command', 'gemini')
        self.timeout = self.config.get('timeout', 60)
        self.rate_limit_delay = self.config.get('rate_limit_delay', 2)
        self.include_history = self.config.get('include_history', True)
        self.conversation_history: List[Tuple[str, str]] = []
        self.max_history_entries = self.config.get('max_history_entries', 10)

    async def consult_gemini(self, query: str, context: str = "") -> Dict[str, Any]:
        """Consult Gemini CLI for a second opinion"""
        # Rate limiting
        await self._enforce_rate_limit()
        # Prepare query with context and history
        full_query = self._prepare_query(query, context)
        # Execute Gemini CLI command
        result = await self._execute_gemini_cli(full_query)
        # Update conversation history
        if self.include_history and result.get("output"):
            self.conversation_history.append((query, result["output"]))
            # Trim history if needed
            if len(self.conversation_history) > self.max_history_entries:
                self.conversation_history = self.conversation_history[-self.max_history_entries:]
        return result

    def detect_uncertainty(self, text: str) -> Tuple[bool, List[str]]:
        """Detect if text contains uncertainty patterns"""
        found_patterns = []
        # Check all pattern categories defined above
        for pattern in (UNCERTAINTY_PATTERNS + COMPLEX_DECISION_PATTERNS
                        + CRITICAL_OPERATION_PATTERNS):
            if re.search(pattern, text, re.IGNORECASE):
                found_patterns.append(pattern)
        return bool(found_patterns), found_patterns
# Singleton pattern implementation
_integration = None

def get_integration(config: Optional[Dict[str, Any]] = None) -> GeminiIntegration:
    """Get or create the global Gemini integration instance"""
    global _integration
    if _integration is None:
        _integration = GeminiIntegration(config)
    return _integration
The singleton pattern ensures:
- Consistent Rate Limiting: All MCP tool calls share the same rate limiter
- Unified Configuration: Changes to config affect all usage points
- State Persistence: Consultation history and statistics are maintained
- Resource Efficiency: Only one instance manages the Gemini CLI connection
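The effect is easy to verify: repeated get_integration() calls return the same object, so state written through one MCP tool is visible to all others. A stripped-down illustration (the class here is a minimal stand-in for the real integration module):

```python
from typing import Any, Dict, Optional

class GeminiIntegration:
    """Minimal stand-in for the real integration class."""
    def __init__(self, config: Optional[Dict[str, Any]] = None):
        self.config = config or {}
        self.conversation_history = []

_integration = None

def get_integration(config: Optional[Dict[str, Any]] = None) -> GeminiIntegration:
    global _integration
    if _integration is None:
        _integration = GeminiIntegration(config)
    return _integration

a = get_integration({"enabled": True})
b = get_integration()            # config argument is ignored after first call
a.conversation_history.append(("q", "answer"))
# b sees the same history because a and b are the same instance
```

Note one consequence of this pattern: a config passed to any call after the first is silently ignored, which is why configuration changes should go through the instance rather than repeated get_integration() calls.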
# In Claude Code
Use the consult_gemini tool with:
query: "Should I use WebSockets or gRPC for real-time communication?"
context: "Building a multiplayer application with real-time updates"
User: "How should I handle authentication?"
Claude: "I think OAuth might work, but I'm not certain about the security implications..."
[Auto-consultation triggered]
Gemini: "For authentication, consider these approaches: 1) OAuth 2.0 with PKCE for web apps..."
Synthesis: Both suggest OAuth but Claude uncertain about security. Gemini provides specific implementation details. Recommendation: Follow Gemini's OAuth 2.0 with PKCE approach.
# Test stdio mode (default)
python3 test_gemini_mcp.py
# Test HTTP mode
python3 test_gemini_mcp.py --mode http
# Test specific server
python3 test_gemini_mcp.py --mode stdio --verbose
# Start HTTP server
python3 gemini_mcp_server.py --port 8006
# Test endpoints
curl http://localhost:8006/health
curl http://localhost:8006/mcp/tools
# Test Gemini consultation
curl -X POST http://localhost:8006/mcp/tools/consult_gemini \
-H "Content-Type: application/json" \
-d '{"query": "What is the best Python web framework?"}'
Issue | Solution |
---|---|
Gemini CLI not found | Install Node.js 18+ and npm install -g @google/gemini-cli |
Authentication errors | Run gemini and sign in with Google account |
Node version issues | Use nvm use 22.16.0 |
Timeout errors | Increase GEMINI_TIMEOUT (default: 60s) |
Auto-consult not working | Check GEMINI_AUTO_CONSULT=true |
Rate limiting | Adjust GEMINI_RATE_LIMIT (default: 2s) |
Container detection error | Ensure running on host system, not in Docker |
stdio connection issues | Check Claude Code MCP configuration |
HTTP connection refused | Verify port availability and firewall settings |
- API Credentials: Store securely, use environment variables
- Data Privacy: Be cautious about sending proprietary code
- Input Sanitization: Sanitize queries before sending
- Rate Limiting: Respect API limits (free tier: 60/min, 1000/day)
- Host-Based Architecture: Both Gemini CLI and MCP server run on host for auth compatibility
- Network Security: HTTP mode binds to 127.0.0.1 by default (not 0.0.0.0)
- Rate Limiting: Implement appropriate delays between calls
- Context Management: Keep context concise and relevant
- Error Handling: Always handle Gemini failures gracefully
- User Control: Allow users to disable auto-consultation
- Logging: Log consultations for debugging and analysis
- History Management: Periodically clear history to avoid context bloat
- Mode Selection: Use stdio for production, HTTP for testing
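For the error-handling point above, a wrapper that degrades gracefully when a consultation fails might look like this (an illustrative sketch; the function names are hypothetical):

```python
import asyncio

async def safe_consult(consult_fn, query: str, timeout: float = 60.0):
    """Run a consultation coroutine, converting timeouts and crashes
    into a structured error result instead of propagating them."""
    try:
        return await asyncio.wait_for(consult_fn(query), timeout=timeout)
    except asyncio.TimeoutError:
        return {"status": "error", "error": f"Gemini timed out after {timeout}s"}
    except Exception as exc:  # CLI missing, auth failure, etc.
        return {"status": "error", "error": str(exc)}

async def broken_consult(query):
    # Simulates the CLI being unavailable
    raise RuntimeError("gemini: command not found")

result = asyncio.run(safe_consult(broken_consult, "test query"))
# result reports the failure instead of crashing the MCP server
```

Returning a structured error lets Claude Code fall back to its own answer when the second opinion is unavailable, rather than failing the whole tool call.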
- Architecture Decisions: Get second opinions on design choices
- Security Reviews: Validate security implementations
- Performance Optimization: Compare optimization strategies
- Code Quality: Review complex algorithms or patterns
- Troubleshooting: Debug complex technical issues
- API Design: Validate REST/GraphQL/gRPC decisions
- Database Schema: Review data modeling choices