The goal of this gist is to explain and teach developers how to write a working Gateway client. Please see the Table of Contents for sections.
Discord has two kinds of APIs:
- a RESTful API, which is entirely web-based - composed of HTTP endpoints and request conditions.
- a Gateway API, heavily WebSocket-based - used for real-time communication of events happening from within the client.
When a user or bot logs into Discord, they have to connect to what's referred to as the "Gateway." This server is Discord's form of real-time communication, responsible for making sure events ongoing from within are dispatched at the right place and at the right time.
I've already created a couple of tutorials explaining the design of a Gateway client:
- Sending data to bots in real-time (Displace)
- retux - Gateway Operations (YouTube)
Payloads are responsible for standardising how data is structured and sent in the Gateway. These can be used in one of two scenarios:
- You're trying to properly connect to Discord's Gateway, known as send events.
- You're wanting to handle data received after a successful connection, known as receive events.
The following represents how payloads are structured:
Field | Type | Description |
---|---|---|
op | integer | Denotes the operation code of the payload, i.e. action vs. event. |
d | ?mixed (any JSON value) | Represents the payload data, which can be any JSON value. |
s | ?integer * | Represents the sequence number of an event, used to replay missed events when resuming after a disconnect. |
t | ?string * | Denotes the event name, most often used for dealing with dispatching. |
* Always null unless the payload is a Dispatch (opcode 0).
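To make the four fields concrete, here is a minimal sketch that decodes two packets using only the standard library. The RawPayload class, the parse_payload() helper and the sample packets are hypothetical stand-ins for illustration, not part of Discord's API or of the client built later on.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class RawPayload:
    """Hypothetical stand-in mirroring the op/d/s/t fields of the table."""

    op: int
    d: Any = None
    s: Optional[int] = None
    t: Optional[str] = None


def parse_payload(packet: str) -> RawPayload:
    """Decode a JSON-serialised packet into the four payload fields."""
    raw = json.loads(packet)
    return RawPayload(op=raw["op"], d=raw.get("d"), s=raw.get("s"), t=raw.get("t"))


# Sample packets written by hand for this sketch.
hello = parse_payload('{"op": 10, "d": {"heartbeat_interval": 41250}, "s": null, "t": null}')
dispatch = parse_payload('{"op": 0, "d": {"content": "hi"}, "s": 3, "t": "MESSAGE_CREATE"}')

print(hello.op, hello.s, hello.t)
print(dispatch.op, dispatch.s, dispatch.t)
```

Only the second packet carries a sequence number and an event name; the first leaves both as None.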
Operation codes, also known as "opcodes," are identifiers for specifying what kind of data is going to be returned. Much like payloads, opcodes can be split up between send and receive-based events.
The following represents valid Gateway opcodes. Only the most important ones are shown:
Code | Name | Action | Description |
---|---|---|---|
0 | Dispatch * | Receive | An event has occurred with the selected intents. |
1 | Heartbeat | Send/Receive | Fired periodically by a client to keep the Gateway connection alive. |
2 | Identify | Send | Starts a new session within the connection by making a handshake for identification. |
6 | Resume | Send | Resumes a previous session in a new connection that was ended by the Gateway. |
7 | Reconnect | Receive | The Gateway recommends reconnecting and resuming the session. |
9 | Invalid Session | Receive | The connection's session has been invalidated by the Gateway; a new connection is needed. |
10 | Hello | Receive | Sent upon starting a new connection to the Gateway. |
11 | Heartbeat ACK | Receive | A heartbeat that was sent by the client has been validated by the Gateway. |
* Represents a large majority of receive events.
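As a rough sketch of how the "Action" column could be used in code, the lookup below records which side emits each opcode. The Direction flag, the OPCODE_DIRECTIONS mapping and client_may_send() are hypothetical names invented for this example.

```python
import enum


class Direction(enum.Flag):
    """Hypothetical helper: which side of the connection emits an opcode."""

    SEND = enum.auto()
    RECEIVE = enum.auto()


# Mirrors the "Action" column of the opcode table.
OPCODE_DIRECTIONS = {
    0: Direction.RECEIVE,                   # Dispatch
    1: Direction.SEND | Direction.RECEIVE,  # Heartbeat
    2: Direction.SEND,                      # Identify
    6: Direction.SEND,                      # Resume
    7: Direction.RECEIVE,                   # Reconnect
    9: Direction.RECEIVE,                   # Invalid Session
    10: Direction.RECEIVE,                  # Hello
    11: Direction.RECEIVE,                  # Heartbeat ACK
}


def client_may_send(opcode: int) -> bool:
    """Return True when the table marks this opcode as sendable by the client."""
    direction = OPCODE_DIRECTIONS.get(opcode)
    return direction is not None and Direction.SEND in direction


sendable = sorted(op for op in OPCODE_DIRECTIONS if client_may_send(op))
print(sendable)  # [1, 2, 6]
```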
Discord's transport layer is done by sending packets through a WebSocket server. The Gateway utilises packets alongside status codes in order to regulate data passage.
Packets are sent through Gateway events as a serialised JSON string.
Because the WebSocket protocol is very much bound to the same disconnect handling as HTTP/1.1, Discord use custom status codes in their Gateway to indicate what produced a closing state (closure).
Some status codes allow for a reconnect, meaning the same session can be used to resume.
The following represents all valid closures:
Code | Name | Explanation | Reconnect? |
---|---|---|---|
4000 | Unknown error | We're not sure what went wrong. Try reconnecting? | true |
4001 | Unknown opcode | You sent an invalid Gateway opcode or an invalid payload for an opcode. Don't do that! | true |
4002 | Decode error | You sent an invalid payload to Discord. Don't do that! | true |
4003 | Not authenticated | You sent us a payload prior to identifying. | true |
4004 | Authentication failed | The account token sent with your identify payload is incorrect. | false |
4005 | Already authenticated | You sent more than one identify payload. Don't do that! | true |
4007 | Invalid seq | The sequence sent when resuming the session was invalid. Reconnect and start a new session. | true |
4008 | Rate limited | Woah nelly! You're sending payloads to us too quickly. Slow it down! You will be disconnected on receiving this. | true |
4009 | Session timed out | Your session timed out. Reconnect and start a new one. | true |
4010 | Invalid shard | You sent us an invalid shard when identifying. | false |
4011 | Sharding required | The session would have handled too many guilds - you are required to shard your connection in order to connect. | false |
4012 | Invalid API version | You sent an invalid version for the gateway. | false |
4013 | Invalid intent(s) | You sent an invalid intent for a Gateway Intent. You may have incorrectly calculated the bitwise value. | false |
4014 | Disallowed intent(s) | You sent a disallowed intent for a Gateway Intent. You may have tried to specify an intent that you have not enabled or are not approved for. | false |
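The "Reconnect?" column can be turned into a small lookup. This is a minimal sketch: should_reconnect() and NON_RECONNECTABLE are hypothetical names, and treating codes that are not in the table as reconnectable is an assumption of this example rather than documented behaviour.

```python
# Close codes the table marks with "false" in the Reconnect? column.
NON_RECONNECTABLE = frozenset({4004, 4010, 4011, 4012, 4013, 4014})


def should_reconnect(close_code: int) -> bool:
    """Return True when the table allows reconnecting after this close code.

    Assumption: codes missing from the table are treated as reconnectable.
    """
    return close_code not in NON_RECONNECTABLE


print(should_reconnect(4000))  # True  (unknown error)
print(should_reconnect(4004))  # False (authentication failed)
print(should_reconnect(4014))  # False (disallowed intents)
```

A client would typically call something like this when the WebSocket closes, before deciding whether to attempt another connection at all.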
Gateway events are structured forms of a Gateway payload, based off of an operation code.
In an instance where 0 becomes the opcode, the data field will represent information about an event that happened within a server or the application's DMs.
Events can be determined by the payload's event name field. When matched, names of events will be written in full upper snake case, e.g. MESSAGE_CREATE.
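One way to match on event names is a small handler registry keyed by the upper snake case name. This is only a sketch: the on() decorator, the dispatch() function and the sample payloads are hypothetical and not part of any library.

```python
from typing import Any, Callable, Dict, List

Handler = Callable[[Any], None]

# Hypothetical registry: event name -> handlers to call.
_handlers: Dict[str, List[Handler]] = {}


def on(event_name: str) -> Callable[[Handler], Handler]:
    """Register a handler under an upper snake case event name."""

    def decorator(func: Handler) -> Handler:
        _handlers.setdefault(event_name.upper(), []).append(func)
        return func

    return decorator


def dispatch(payload: dict) -> int:
    """Call every handler for a Dispatch (opcode 0) payload; return how many ran."""
    if payload.get("op") != 0:
        return 0
    handlers = _handlers.get(payload.get("t") or "", [])
    for handler in handlers:
        handler(payload.get("d"))
    return len(handlers)


seen: List[str] = []


@on("MESSAGE_CREATE")
def handle_message(data: Any) -> None:
    seen.append(data["content"])


called = dispatch({"op": 0, "t": "MESSAGE_CREATE", "s": 1, "d": {"content": "hello"}})
ignored = dispatch({"op": 11, "t": None, "s": None, "d": None})
print(called, ignored, seen)
```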
The Gateway itself is an entangled, complicated subject. For a quick rundown of the structure, here's a diagram to visualise how data flows between the payloads, the client and the server:
---
title: Gateway Structure
---
classDiagram
direction RL
note "This diagram is speculative to the Gateway\nlifecycle found in the Developer Documentation."
class Packet {
JSON serialised string
}
class Payload {
+int op
+typing.Any d
+int s
+str t
}
note for Payload "op - Operation code\nd - Payload data\ns - Sequence number\nt - Payload event name/type"
class OpCode {
Operation codes denote an action or payload.
}
Payload *-- OpCode
Packet --> Payload
Packet "1" --> "*" Payload
Payload "*" --> "many" Packet : Inherits
class State {
+int version
+str encoding
+str? compress
+float? heartbeat_interval
+str? resume_gateway_url
+str? session_id
+int? seq
}
note for State "version - Version of the Gateway/WebSocket API\nencoding - How we want data to be transported as.\ncompress - Compress smaller data?\nheartbeat_interval - Periodic time to send heartbeats\nresume_gateway_url - URL for resuming connections\nsession_id - Unique ID of the connection session\nseq - Related to payload sequence"
class Protocol {
<<interface>>
}
class Client {
+str token
+State state
+connect()
+reconnect()
+_error()
+_track()
+_dispatch()
+_identify()
+_resume()
+_heartbeat()
}
note for Client "token - Bot token\nstate - Connection state data\nconnect() - Starts a new connection\n_error() - Handles error returns\n_track() - Tracks and handles operation codes"
note for Client "_error()\nErrors are thrown as exceptions, but will be caught\nand met with a reconnection attempt. If this fails,\na new connection is formed."
note for Client "_track()\nOperation codes have different procedures, which will\nchange the way we have to handle data. Most\noperation codes will be for send events."
note for Client "_dispatch()\nDispatching is when we have a receive event intended\nto represent an action within the app. An\n example of this is message creation."
note for Client "_heartbeat()\nHeartbeats are what is needed to keep a connection\nalive. Most heartbeats are usually timed around\n42-45 seconds."
State *-- Client
Client ..> Protocol
Protocol --> "many" Client : Inherits
class HEARTBEAT {
+int heartbeat_interval
}
HEARTBEAT o-- Client
HEARTBEAT o-- Protocol
HEARTBEAT o-- State
note for HEARTBEAT "A concurrently timed payload sent to keep a connection alive."
class HELLO {
+int heartbeat_interval
HELLO helps determine the interval for heartbeats to be sent.
}
HEARTBEAT ..> HELLO
HEARTBEAT .. State
class IDENTIFY {
+str token
+dict properties
}
note for IDENTIFY "Used to authenticate with the Gateway who you are."
class RESUME {
+str session_id
+int seq
}
note for RESUME "Used to resume a dead connection.\nIf INVALID_SESSION comes after this, a new connection\nis required."
IDENTIFY ..|> RESUME
class INVALID_SESSION {
+bool d
}
note for INVALID_SESSION "The current session has been invalidated.\nInner boolean data field is for whether the session can be resumed\nor not. If true, send RESUME.\nIf not, start a new connection."
class RECONNECT {
The Gateway asks us to reconnect and resume.
}
RECONNECT --> INVALID_SESSION
RECONNECT .. RESUME
RECONNECT .. IDENTIFY
For the purposes of making things easy for developers to understand, I will be implementing the Gateway client with the following in mind:
- We will be using Python as our solution due to its simplicity and how easy it is to understand.
- Some less popular dependencies will be used to greatly simplify operations:
  - trio - A concurrent and I/O friendly asynchronous runtime for Python.
  - trio_websocket - A wsproto implementation of WebSockets for trio.
  - attrs - Dataclasses with more granularity and finer control.
  - cattrs - A de/serialiser suite package for mappings to attrs-styled objects, and vice versa.
- Only the most important qualities of the Gateway client will be added.
  - There will be no code for send events such as VOICE_STATE_UPDATE and REQUEST_GUILD_MEMBERS.
  - The implementation will roughly follow the documentation with some changes that aren't fully covered by it.
So, where do we begin?
First, we need to establish opcodes and intents. The full list of Intents can be quite exhaustive, so we'll just keep a few to make things easier to understand.
import enum


class OpCode(enum.IntEnum):
    DISPATCH = 0
    HEARTBEAT = 1
    IDENTIFY = 2
    RESUME = 6
    RECONNECT = 7
    INVALID_SESSION = 9
    HELLO = 10
    HEARTBEAT_ACK = 11
Our intents will be a set of integer bit flags, which the client combines with a bitwise OR to determine what events the application's allowed access to. We'll explain more about why we need these later on.
class Intents(enum.IntFlag):
    GUILDS = 1 << 0
    GUILD_MESSAGES = 1 << 9
    GUILD_MESSAGE_TYPING = 1 << 11
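As a quick illustration of the bitwise calculation, the sketch below combines two of the flags and checks membership. The Intents class is repeated only so the snippet runs on its own; the variable name wanted is made up for the example.

```python
import enum


class Intents(enum.IntFlag):
    GUILDS = 1 << 0
    GUILD_MESSAGES = 1 << 9
    GUILD_MESSAGE_TYPING = 1 << 11


# Combine flags with bitwise OR; the result is still an Intents value.
wanted = Intents.GUILDS | Intents.GUILD_MESSAGES

print(int(wanted))                             # 513, i.e. 1 + 512
print(Intents.GUILD_MESSAGES in wanted)        # True
print(Intents.GUILD_MESSAGE_TYPING in wanted)  # False
```

The plain integer (513 here) is the value a client would place in the intents field when identifying.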
With our enumerators made, we can now move forward with creating the payload model. We'll be using attrs for this, which will make our life easier with mapping to a dataclass:
import typing

import attrs


@attrs.define(kw_only=True)  # Keyword-only, so fields map cleanly from a dict.
class Payload:
    op: OpCode
    # d, s and t can be null or missing, so they default to None.
    d: typing.Any = None
    s: int | None = None
    t: str | None = None

    # We'll create a few property methods so it's easier to understand these abbreviations.
    @property
    def data(self) -> typing.Any:
        return self.d

    @property
    def sequence(self) -> int | None:
        return self.s

    @property
    def name(self) -> str | None:
        return self.t
Interfaces aren't really a thing in Python, mainly because there's no real "interface" construct provided natively as a builtin, so instead, we're going to use the next best thing: protocols. These are part of the typing module in Python's standard library and offer static duck typing. Consider the following, where we want to define how methods will act within our solution:
import typing


class GatewayProto(typing.Protocol):
    # These are class attributes we're defining.
    _token: str
    intents: Intents
    _state: "State"  # Quoted because State is only defined further below.

    def __init__(self: typing.Self, token: str, intents: Intents) -> None:
        ...

    async def __aenter__(self: typing.Self) -> typing.Self:
        ...

    # __aexit__ receives the exception type, value and traceback (or three Nones).
    async def __aexit__(self: typing.Self, *exc: typing.Any) -> bool | None:
        ...
When we make a new class that will be our Gateway client and subclass GatewayProto, a static type checker such as mypy or pyright will expect these methods to be written with the same argument signature, return type, calling convention and naming. Python itself won't raise an exception at runtime if they don't match; protocols are checked statically.
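To see how protocols behave when the code actually runs, here is a small sketch using only the typing module. Connectable, GoodClient, BadClient and Lazy are hypothetical classes for illustration. Note that typing.runtime_checkable only checks that the members exist; signatures are still left to a static type checker.

```python
import typing


@typing.runtime_checkable
class Connectable(typing.Protocol):
    """Hypothetical protocol with a single required method."""

    async def connect(self) -> None:
        ...


class GoodClient:
    # Matches the protocol structurally; no subclassing required.
    async def connect(self) -> None:
        return None


class BadClient:
    # Has no connect() method at all.
    pass


class Lazy(Connectable):
    # Subclasses the protocol without overriding anything.
    # No exception is raised; it simply inherits the stub method.
    pass


print(isinstance(GoodClient(), Connectable))  # True
print(isinstance(BadClient(), Connectable))   # False
print(isinstance(Lazy(), Connectable))        # True
```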
We have a majority of the important things done. Now we can build our Gateway client solution. But in order to make things even easier for us to understand, we're going to create a data class called State that stores some metadata from the Gateway, both before and after connecting:
@attrs.define(slots=False)  # attrs classes are mutable by default, so we can write to this later on.
class State:
    version: int = 10  # Version of the Gateway/WebSocket API
    encoding: str = "json"  # How we want data to be transported as.
    compress: str | None = None  # Optional transport compression (e.g. "zlib-stream"); JSON doesn't really need it here.
    heartbeat_interval: float | None = None
    resume_gateway_url: str | None = None
    session_id: str | None = None
    seq: int | None = None
We will explain the other attributes in the State class while we slowly build the client. For now, let's go ahead and start with a very basic implementation of connecting.
import trio
import trio_websocket


class Gateway(GatewayProto):
    # Holds the nursery's async context manager so __aexit__ can close it later.
    _tasks: typing.Any = None

    def __init__(self: typing.Self, token: str, intents: Intents) -> None:
        self._token = token
        self.intents = intents
        self._state = State()

    # This is how we define an asynchronous context manager.
    # The calling convention for this will be "async with," where our whole event loop will be
    # handled asynchronously. If an exception is thrown, the manager will quietly end with the
    # traceback returned.
    async def __aenter__(self: typing.Self) -> typing.Self:
        # A "nursery" is like a task group in asyncio. Trio believes in
        # explicit definition of tasks - there is no hidden control flow happening in the background,
        # everything is defined and handled at the time of creation.
        self._tasks = trio.open_nursery()
        nursery = await self._tasks.__aenter__()  # we have to asynchronously enter our nursery.
        return self

    async def __aexit__(self: typing.Self, *exc) -> bool | None:
        # Hand the exception details (if any) to the nursery so it can shut down cleanly.
        return await self._tasks.__aexit__(*exc)
This is not a lot of code, but it can appear a little bit complex for some people. Let's break this down method-by-method:
- We have __init__ to initialise our class and write some values to some attributes.
  - We want to store the token (imagine it like the private password of a bot) and intents needed to connect.
  - We initialise a State within our initialiser so we can easily reference and write to it later.
  - We define _tasks as our own class attribute separate from the protocol to handle important methods to call at certain times.
- We define the entry half of an asynchronous context manager, __aenter__, so we can concurrently handle an event loop-based connection that can quietly exit without necessarily throwing an exception during the runtime state.
  - open_nursery() produces a Nursery managed object, similar to asyncio's grouped tasks.
- We also define the exit half of the context manager, __aexit__, which handles proper closing of the manager in the event of an exception.
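If the enter/exit pairing is unfamiliar, the order of calls can be seen with a tiny sketch. It uses asyncio purely because it ships with Python; the client in this gist uses trio, and the Managed class below is a hypothetical stand-in rather than the Gateway class itself.

```python
import asyncio
from typing import List


class Managed:
    """Hypothetical class that records when each half of the manager runs."""

    def __init__(self) -> None:
        self.events: List[str] = []

    async def __aenter__(self) -> "Managed":
        self.events.append("enter")
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        self.events.append("exit")
        return False  # Returning False lets any exception propagate.


async def main() -> List[str]:
    async with Managed() as managed:
        managed.events.append("body")
    return managed.events


events = asyncio.run(main())
print(events)  # ['enter', 'body', 'exit']
```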
With the context manager now in place, we can write the official connect method. We'll have to adjust a few areas, of course, to make this work.
First, let's start with writing the protocol for how the method should act.
class GatewayProto(typing.Protocol):
    ...

    async def connect(self: typing.Self) -> None:
        ...
Next, let's add a new class attribute, _conn, which will represent trio_websocket.WebSocketConnection. This will be a context manager of its own kind that we'll relay through a trio.Nursery when we asynchronously enter the Gateway client:
class Gateway(GatewayProto):
    _conn: trio_websocket.WebSocketConnection = None

    async def connect(self: typing.Self) -> None:
        # We need to "resolve" which URL we're using. The Gateway can provide us two in different cases:
        # - If we already had a connection, it'll supply us a region-based URL path to make connections quicker and
        #   more sane.
        # - If we're starting a new connection, we'll need to use the entry point URL.
        resolved_url: str = self._state.resume_gateway_url or "wss://gateway.discord.gg"

        # Only add the compress parameter when a compression method has been set.
        compress: str = f"&compress={self._state.compress}" if self._state.compress else ""

        # This has a lot of string formatting to apply query-string parameters into the URL path.
        # It's recommended for Gateway clients to specify which version of the API they'll be using.
        # For the sake of making this implementation simple, we'll be encoding through JSON.
        async with trio_websocket.open_websocket_url(
            f"{resolved_url}/?v={self._state.version}&encoding={self._state.encoding}{compress}"
        ) as self._conn:
            while not self._conn.closed:
                print(await self._conn.get_message())
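The query string can also be assembled with urllib.parse instead of manual string formatting. This is a minimal sketch under the same assumptions as the method above (version 10, JSON encoding, optional compression); build_gateway_url() and DEFAULT_GATEWAY_URL are hypothetical names for this example.

```python
from typing import Optional
from urllib.parse import urlencode

DEFAULT_GATEWAY_URL = "wss://gateway.discord.gg"


def build_gateway_url(
    resume_gateway_url: Optional[str] = None,
    version: int = 10,
    encoding: str = "json",
    compress: Optional[str] = None,
) -> str:
    """Build the connection URL, preferring the resume URL when one is known."""
    base = (resume_gateway_url or DEFAULT_GATEWAY_URL).rstrip("/")
    params = {"v": version, "encoding": encoding}
    if compress is not None:
        # Leave the parameter out entirely when no compression is wanted.
        params["compress"] = compress
    return f"{base}/?{urlencode(params)}"


print(build_gateway_url())
# wss://gateway.discord.gg/?v=10&encoding=json
print(build_gateway_url(compress="zlib-stream"))
# wss://gateway.discord.gg/?v=10&encoding=json&compress=zlib-stream
```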
Finally, we'll need to start a new task in our Nursery so it's called upon asynchronously entering the client:
class Gateway(GatewayProto):
    async def __aenter__(self: typing.Self) -> typing.Self:
        ...
        nursery.start_soon(self.connect)
        return self
After this, give it a run. If all goes well, we should have a very brief connection. And for the short period it's alive, we should now be receiving a JSON-serialised string resembling a HELLO payload. When formatted, it should look something like this:
{
  "op": 10,
  "d": {
    "heartbeat_interval": 41250
  }
}
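The interval is what drives the heartbeat timing. The sketch below assumes the value arrives in milliseconds (the client code later divides it by 1000) and follows Discord's guidance of delaying the first heartbeat by the interval multiplied by a random jitter between 0 and 1. heartbeat_delays() and its rng parameter are hypothetical, added so the result is reproducible.

```python
import json
import random
from typing import Callable, Tuple


def heartbeat_delays(
    hello_packet: str, rng: Callable[[], float] = random.random
) -> Tuple[float, float]:
    """Return (first_delay, interval) in seconds from a HELLO packet."""
    payload = json.loads(hello_packet)
    interval = payload["d"]["heartbeat_interval"] / 1000  # ms to seconds
    first_delay = interval * rng()  # jitter so clients don't all beat at once
    return first_delay, interval


# A fixed "random" value keeps this example deterministic.
first, interval = heartbeat_delays(
    '{"op": 10, "d": {"heartbeat_interval": 41250}}', rng=lambda: 0.5
)
print(first, interval)  # 20.625 41.25
```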
Instead of receiving everything as a string and parsing it from there, we could take a few other routes. Let's consider the following:
- We can use the builtin json module to transform a JSON-serialised string into a dict object.
  - One drawback is that with dict, we could lose access to methods we might want to define for it to help ourselves.
- We can utilise the Payload dataclass created earlier to handle value representation.
  - We can define our own methods to help make it easier.
The answer is pretty clear: we're using the Payload dataclass. But to make Python easily transform a JSON-serialised string into an attrs-decorated class, we'll need to use another dependency, cattrs.
Let's create a new private method, _receive(), that will handle doing this all for us:
import json

import cattrs


class Gateway(GatewayProto):
    ...

    async def _receive(self: typing.Self) -> Payload | None:
        # Payloads are sent as a form of a packet, and are not sent as frames.
        # The connection can close while we wait, so we'll be using an exception catching block.
        try:
            # get_message() waits until a message has been received, without blocking other tasks.
            # This is useful for concurrently firing this method only when data has arrived.
            packet: str | bytes = await self._conn.get_message()
            # Use a name other than "json" so we don't shadow the module.
            data: dict = json.loads(packet)
            return cattrs.structure(data, Payload)
        except trio_websocket.ConnectionClosed:
            # There's no payload to hand back once the connection has closed.
            return None
Now we can refactor our connect() method to give us a better structure of data. We can also use the asdict() function from attrs to easily convert it back into a dict object:
class Gateway(GatewayProto):
    ...

    async def connect(self: typing.Self) -> None:
        ...
        async with trio_websocket.open_websocket_url(...) as self._conn:
            while not self._conn.closed:
                payload: Payload = await self._receive()
                if payload:
                    print(payload)
The next major step towards making the Gateway client work is to track operation codes. Why? Well, we have some send events that we need to send at the right moments. Most importantly, HEARTBEAT and IDENTIFY are necessary to implement.
Let's start by creating a new ._track() method that handles all necessary operation codes and how we'll handle them. Some pseudomethods will be written which we'll later implement:
class Gateway(GatewayProto):
    ...
    # We'll add a new state boolean for when we can start sending heartbeats.
    # This is necessary since trio tasks in a Nursery start after declaration,
    # but we don't want to do it as soon as the context manager begins.
    _send_heartbeats: bool = False

    async def connect(self: typing.Self) -> None:
        ...
        async with trio_websocket.open_websocket_url(...) as self._conn:
            while not self._conn.closed:
                payload: Payload = await self._receive()
                if payload:
                    # _track() is a coroutine, so it has to be awaited.
                    await self._track(payload)

    async def _track(self: typing.Self, packet: Payload) -> None:
        # Discord recommend for us to always update our sequence from our last
        # given payload. This is important for two reasons:
        #
        # 1. If we drop the connection and are able to resume it, the sequence helps
        #    the Gateway replay the events we didn't receive.
        # 2. Having an accurate sequence helps the Gateway know which previous payloads
        #    have already been sent, keeping us up-to-date and accurate in the connection
        #    loop.
        #
        # Only dispatched events carry a sequence, so we keep the last known value
        # instead of overwriting it with None.
        if packet.sequence is not None:
            self._state.seq = packet.sequence

        # Let's track each operation code and add matching cases, allowing us to break
        # up handling into different levels.
        match packet.op:
            # HELLO is the first event we'll ever receive from the Gateway.
            # This event is responsible for telling us our heartbeat_interval value, needed to
            # send accurate heartbeats.
            case OpCode.HELLO:
                # Now we need to check for two conditions:
                # - If we had an existing past connection, we'll try resuming it.
                # - If we have no past connection, we'll identify ourselves.
                if self._state.session_id:
                    await self._resume()
                else:
                    await self._identify()

                self._state.heartbeat_interval = packet.data["heartbeat_interval"] / 1000  # ms to seconds
                self._send_heartbeats = True
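The _identify(), _resume() and heartbeat pseudomethods each boil down to sending one JSON payload. As a rough sketch of what those payloads look like according to Discord's documentation, here are three standalone builders. The function names and the "example-client" property values are hypothetical; a real client would send the resulting strings over the WebSocket connection.

```python
import json
import sys
from typing import Optional


def heartbeat_payload(seq: Optional[int]) -> str:
    """HEARTBEAT (op 1): the data is the last sequence number, or null."""
    return json.dumps({"op": 1, "d": seq})


def identify_payload(token: str, intents: int) -> str:
    """IDENTIFY (op 2): authenticates a brand new session."""
    return json.dumps({
        "op": 2,
        "d": {
            "token": token,
            "intents": intents,
            # Placeholder property values, chosen for this example only.
            "properties": {
                "os": sys.platform,
                "browser": "example-client",
                "device": "example-client",
            },
        },
    })


def resume_payload(token: str, session_id: str, seq: int) -> str:
    """RESUME (op 6): asks the Gateway to replay events missed after seq."""
    return json.dumps({
        "op": 6,
        "d": {"token": token, "session_id": session_id, "seq": seq},
    })


print(heartbeat_payload(None))  # {"op": 1, "d": null}
print(json.loads(identify_payload("TOKEN", 513))["d"]["intents"])  # 513
print(json.loads(resume_payload("TOKEN", "abc", 7))["d"]["seq"])   # 7
```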