These are some cleaned-up notes that may help fill in gaps in the official docs; full real wire examples are nice ;)
Connect to the WebSocket endpoint:
wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1alpha.GenerativeService.BidiGenerateContent?key=YOUR_API_KEY
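A minimal way to open that connection from Python, as a sketch (assumes the third-party websockets package, installed with pip install websockets; YOUR_API_KEY is a placeholder for a real key):

import asyncio
import websockets  # assumed dependency: pip install websockets

URI = (
    "wss://generativelanguage.googleapis.com/ws/"
    "google.ai.generativelanguage.v1alpha.GenerativeService."
    "BidiGenerateContent?key=YOUR_API_KEY"
)

async def main():
    # Everything below (setup, audio, text) happens inside this block.
    async with websockets.connect(URI) as ws:
        ...

asyncio.run(main())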
The protocol follows a bidirectional communication pattern where:
- Client establishes connection and sends setup message
- Server acknowledges with setup completion
- Client can then stream audio chunks or send text messages
- Server responds with audio or text responses
All messages are JSON-encoded.
The client must send this setup message immediately after connecting:
{
  "setup": {
    "model": "models/gemini-2.0-flash-exp",
    "generationConfig": {
      "responseModalities": "audio",
      "speechConfig": {
        "voiceConfig": {
          "prebuiltVoiceConfig": {
            "voiceName": "Aoede"
          }
        }
      }
    },
    "systemInstruction": {
      "parts": [{"text": "You are my helpful assistant."}]
    }
  }
}
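Sending that over the socket is a single json.dumps; a sketch, continuing with the ws connection from above (send_setup is just an illustrative name):

import json

async def send_setup(ws):
    # Same payload as the wire example above.
    setup_msg = {
        "setup": {
            "model": "models/gemini-2.0-flash-exp",
            "generationConfig": {
                "responseModalities": "audio",
                "speechConfig": {
                    "voiceConfig": {
                        "prebuiltVoiceConfig": {"voiceName": "Aoede"}
                    }
                },
            },
            "systemInstruction": {
                "parts": [{"text": "You are my helpful assistant."}]
            },
        }
    }
    await ws.send(json.dumps(setup_msg))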
Server acknowledges with:
{
  "setupComplete": {}
}
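The client should wait for that acknowledgement before streaming anything else; a minimal check might look like this (sketch; wait_setup_complete is an illustrative name):

import json

async def wait_setup_complete(ws):
    # The first server frame should be the setup acknowledgement.
    raw = await ws.recv()
    if "setupComplete" not in json.loads(raw):
        raise RuntimeError(f"expected setupComplete, got: {raw!r}")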
- Send one audio chunk per message (not sure why; possibly an alpha limitation)
- Audio must be raw PCM, 16 kHz sample rate, 16-bit
- Audio data must be base64-encoded (see the encoding sketch after the example below)
{
  "realtimeInput": {
    "mediaChunks": [
      {
        "mimeType": "audio/pcm;rate=16000",
        "data": "<base64_encoded_audio_data>"
      }
    ]
  }
}
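Building that envelope from raw audio is mostly base64 plumbing; a sketch, assuming pcm_chunk holds headerless 16-bit / 16 kHz PCM bytes (mono is my assumption; the notes don't say):

import base64
import json

async def send_audio_chunk(ws, pcm_chunk: bytes):
    # pcm_chunk: raw 16-bit little-endian PCM samples at 16 kHz --
    # no WAV/container header, just samples (mono assumed).
    # One chunk per message, per the note above.
    await ws.send(json.dumps({
        "realtimeInput": {
            "mediaChunks": [{
                "mimeType": "audio/pcm;rate=16000",
                "data": base64.b64encode(pcm_chunk).decode("ascii"),
            }]
        }
    }))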
The server responds with PCM audio at a 24 kHz sample rate (note: higher than the 16 kHz input):
{
  "serverContent": {
    "modelTurn": {
      "parts": [
        {
          "inlineData": {
            "mimeType": "audio/pcm;rate=24000",
            "data": "<base64_encoded_audio_data>"
          }
        }
      ]
    }
  }
}
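Decoding the reply stream is the mirror image; a sketch that just reads frames until the socket closes and accumulates the 24 kHz PCM (a real client would stop on some end-of-turn signal instead):

import base64
import json

async def collect_audio(ws) -> bytes:
    # Accumulate raw 24 kHz 16-bit PCM from inlineData parts.
    pcm_out = bytearray()
    async for raw in ws:  # a websockets connection is async-iterable
        msg = json.loads(raw)
        parts = (msg.get("serverContent", {})
                    .get("modelTurn", {})
                    .get("parts", []))
        for part in parts:
            blob = part.get("inlineData")
            if blob and blob.get("mimeType", "").startswith("audio/pcm"):
                pcm_out.extend(base64.b64decode(blob["data"]))
    return bytes(pcm_out)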
Client can also send text messages:
{
  "clientContent": {
    "turns": [
      {
        "role": "user",
        "parts": [{"text": "hello"}]
      }
    ],
    "turnComplete": true
  }
}
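On the wire that is again a single JSON frame; turnComplete: true signals that the user's turn is done so the model can respond (sketch, continuing from above):

import json

async def send_text(ws, text: str):
    await ws.send(json.dumps({
        "clientContent": {
            "turns": [{"role": "user", "parts": [{"text": text}]}],
            "turnComplete": True,  # lets the model start its reply
        }
    }))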