@quartzjer
Created December 12, 2024 01:25
Gemini 2.0 Realtime WebSocket API Notes & Examples

These are some cleaned-up notes that may help fill in gaps in the official docs. Full, real wire examples are nice ;)

Connection

Connect to the WebSocket endpoint:

wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1alpha.GenerativeService.BidiGenerateContent?key=YOUR_API_KEY
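
A minimal Python sketch of assembling that URL, with the key percent-encoded so that unusual characters don't break the query string. `build_ws_url` and `WS_ENDPOINT` are hypothetical names used for illustration, not part of any SDK.

```python
from urllib.parse import urlencode

# Endpoint exactly as given in these notes (v1alpha, so it may change).
WS_ENDPOINT = (
    "wss://generativelanguage.googleapis.com/ws/"
    "google.ai.generativelanguage.v1alpha.GenerativeService.BidiGenerateContent"
)


def build_ws_url(api_key: str) -> str:
    """Return the WebSocket URL with the API key as the `key` query parameter."""
    return f"{WS_ENDPOINT}?{urlencode({'key': api_key})}"
```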

Protocol Overview

The protocol follows a bidirectional communication pattern where:

  1. Client establishes connection and sends setup message
  2. Server acknowledges with setup completion
  3. Client can then stream audio chunks or send text messages
  4. Server responds with audio or text responses

All messages are JSON encoded.
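
Steps 1 and 2 above can be sketched without committing to a particular WebSocket library. In this sketch, `send` and `recv` are hypothetical callables wrapping whatever client you use, and `perform_setup` is an illustrative name. The reply is parsed with `json.loads`, which accepts both text and UTF-8 bytes, in case your client hands frames back as binary.

```python
import json
from typing import Callable, Union

Frame = Union[str, bytes]


def perform_setup(
    send: Callable[[str], None],
    recv: Callable[[], Frame],
    setup_message: dict,
) -> None:
    """Send the setup message, then block until the server acknowledges it."""
    send(json.dumps(setup_message))
    reply = json.loads(recv())  # works for str or UTF-8 encoded bytes
    if "setupComplete" not in reply:
        raise RuntimeError(f"expected setupComplete, got: {reply!r}")
    # After this point the client may stream audio or send text (steps 3 and 4).
```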

Message Types

1. Initial Setup

Client must send this message immediately after connection:

{
  "setup": {
    "model": "models/gemini-2.0-flash-exp",
    "generationConfig": {
      "responseModalities": "audio",
      "speechConfig": {
        "voiceConfig": {
          "prebuiltVoiceConfig": {
            "voiceName": "Aoede"
          }
        }
      }
    },
    "systemInstruction": {
      "parts": [{"text": "You are my helpful assistant."}]
    }
  }
}
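
If you build this message in code rather than pasting it, something like the following Python sketch may help. `build_setup_message` is a hypothetical helper; the field names and default values are copied from the example above, and nothing else about the schema is assumed.

```python
import json


def build_setup_message(
    model: str = "models/gemini-2.0-flash-exp",
    voice_name: str = "Aoede",
    system_text: str = "You are my helpful assistant.",
) -> str:
    """Return the JSON-encoded setup message, mirroring the wire example."""
    message = {
        "setup": {
            "model": model,
            "generationConfig": {
                # Value kept exactly as in the wire example above.
                "responseModalities": "audio",
                "speechConfig": {
                    "voiceConfig": {
                        "prebuiltVoiceConfig": {"voiceName": voice_name}
                    }
                },
            },
            "systemInstruction": {"parts": [{"text": system_text}]},
        }
    }
    return json.dumps(message)
```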

Server acknowledges with:

{
  "setupComplete": {}
}

2. Audio Streaming

Client Audio Input

  • Send one audio chunk per message (not sure why; possibly an alpha limitation)
  • Audio must be PCM: 16 kHz sample rate, 16-bit samples
  • Audio data must be base64 encoded

{
  "realtimeInput": {
    "mediaChunks": [
      {
        "mimeType": "audio/pcm;rate=16000",
        "data": "<base64_encoded_audio_data>"
      }
    ]
  }
}
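
A rough Python sketch of turning a PCM buffer into these messages, one chunk per message. `audio_messages` is a hypothetical helper, and the default chunk size of 3200 bytes is an assumption (about 100 ms if the audio is mono); the notes don't prescribe a chunk size.

```python
import base64
import json
from typing import Iterator


def audio_messages(pcm: bytes, chunk_bytes: int = 3200) -> Iterator[str]:
    """Yield one JSON realtimeInput message per chunk of 16 kHz, 16-bit PCM."""
    if chunk_bytes <= 0 or chunk_bytes % 2:
        # Keep chunks aligned to whole 16-bit samples.
        raise ValueError("chunk_bytes must be a positive multiple of 2")
    for start in range(0, len(pcm), chunk_bytes):
        chunk = pcm[start:start + chunk_bytes]
        yield json.dumps({
            "realtimeInput": {
                "mediaChunks": [
                    {
                        "mimeType": "audio/pcm;rate=16000",
                        "data": base64.b64encode(chunk).decode("ascii"),
                    }
                ]
            }
        })
```

Each yielded string would be sent as its own WebSocket message.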

Server Audio Response

Server responds with PCM audio at 24kHz sample rate:

{
  "serverContent": {
    "modelTurn": {
      "parts": [
        {
          "inlineData": {
            "mimeType": "audio/pcm;rate=24000",
            "data": "<base64_encoded_audio_data>"
          }
        }
      ]
    }
  }
}
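
On the receiving side, a minimal Python sketch for pulling the raw PCM bytes out of a message shaped like the one above. `extract_audio` is a hypothetical helper; it returns empty bytes for messages that carry no audio, and it assumes nothing about other fields the server may include.

```python
import base64
import json
from typing import Union


def extract_audio(frame: Union[str, bytes]) -> bytes:
    """Return concatenated PCM bytes from one server message, or b"" if none."""
    message = json.loads(frame)
    parts = (
        message.get("serverContent", {})
        .get("modelTurn", {})
        .get("parts", [])
    )
    pcm = bytearray()
    for part in parts:
        inline = part.get("inlineData")
        if inline and inline.get("mimeType", "").startswith("audio/pcm"):
            pcm += base64.b64decode(inline["data"])
    return bytes(pcm)
```

The returned bytes are 24 kHz PCM per the note above, so they need to be played back or saved at that rate rather than the 16 kHz used for input.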

3. Text Messages (Optional)

Client can also send text messages:

{
  "clientContent": {
    "turns": [
      {
        "role": "user",
        "parts": [{"text": "hello"}]
      }
    ],
    "turnComplete": true
  }
}
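
The same message built in Python, as a small sketch. `text_message` is a hypothetical helper name; the structure is copied from the example above.

```python
import json


def text_message(text: str, turn_complete: bool = True) -> str:
    """Return a JSON clientContent message holding a single user turn."""
    return json.dumps({
        "clientContent": {
            "turns": [
                {"role": "user", "parts": [{"text": text}]}
            ],
            "turnComplete": turn_complete,
        }
    })
```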