@pesterhazy
Last active December 7, 2024 22:13
Websocket Deployments — Enemy of the State

How do you deploy new code continuously into a system that needs to support stateful, uninterruptible computations?

Some workloads are inherently stateful

As web developers, we're used to valuing stateless computation, for good reason – reducing state makes everything easier. Servers (or cloud instances) hold no essential state, delegating any important data to databases like Postgres and Redis. This is a great architecture. Among other benefits, it makes zero-downtime deployments a breeze: to deploy a new code version, you simply spin up a set of new instances with the new git SHA, add them to the load balancer, and kill the old instances after draining them of traffic (perhaps after 60 seconds).

However, some workloads don't fit this "short-duration request & response" mold. As an example, consider connecting OpenAI Realtime to Twilio's API, which requires you to:

  • accept a websocket connection from Twilio
  • listen to messages on the websocket, which contain bits of audio
  • after making some optional changes, relay the bits of audio on to OpenAI Realtime, again via a websocket
  • keep doing this for the duration of the call, which could last up to 30 minutes
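To make the shape of this workload concrete, here's a minimal sketch of such a relay in Node.js/TypeScript with the `ws` library. The message shapes follow Twilio Media Streams and the OpenAI Realtime API as I understand them, but treat the URL, headers, and field names as assumptions to verify; session configuration, audio format negotiation, and error handling are elided.

```ts
import { WebSocketServer, WebSocket } from "ws";

// Sketch only: Twilio-side auth, session setup, and error handling
// are omitted.
const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (twilio) => {
  // One outgoing connection to OpenAI per incoming call -- the second
  // piece of state that can't be handed off to a database.
  const openai = new WebSocket(
    "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview",
    {
      headers: {
        Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
        "OpenAI-Beta": "realtime=v1", // required at the time of writing
      },
    }
  );

  twilio.on("message", (data) => {
    const msg = JSON.parse(data.toString());
    // Twilio Media Streams wraps each base64 audio chunk in a "media" event.
    if (msg.event === "media" && openai.readyState === WebSocket.OPEN) {
      openai.send(
        JSON.stringify({ type: "input_audio_buffer.append", audio: msg.media.payload })
      );
    }
  });

  // If either leg drops, tear down the other -- exactly what a
  // deploy-triggered restart would do to every call in flight.
  twilio.on("close", () => openai.close());
  openai.on("close", () => twilio.close());
});
```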

Note that we're dealing with two pieces of essential state which can't be handed off to a database:

  1. An incoming websocket connection from Twilio
  2. An outgoing websocket connection to OpenAI

If we drop either of those connections due to a server restart, we inevitably drop the ongoing call and end up with a bad customer experience.

How do we square continuous deployment with stateful services?

We now have two desiderata:

  1. We want to be able to deploy as often as necessary. In a busy monolith, we might deploy new code 10 times per hour, or more.

  2. We need to keep a stateful process active for up to 30 minutes, which cannot be interrupted.

How can you make both of these possible at once? The problem, of course, is that when a new code version is deployed, the server is typically stopped and restarted, or a new cloud instance is created which cannot inherit open TCP connections from the old instance. I think this is an interesting engineering challenge.

I'd like to hear from you

Like any engineer, when I hear a problem like this, my mind starts coming up with possible solutions. (Would AWS Lambda work for this use case? It might help with the incoming websocket, but I think it won't be able to maintain the outgoing websocket connection.) But instead of writing up my half-baked thoughts, I'm curious what you think. Let me know in the comments below!

@jbmoelker

Fascinating challenge. I had help from an LLM phrasing this response. Out of the many options it suggested I believe this paradigm fits best with your situation:

Decouple computation and connection:

  • Use a message broker (e.g., RabbitMQ, Kafka, or Redis streams) to buffer and relay data between WebSocket clients and backend workers.
  • Workers handle stateful computations and can be restarted independently of the WebSocket connections.
  • When new code is deployed, old workers finish their ongoing tasks, while new workers start handling new connections.
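As a rough illustration of the worker side of this decoupling (not from the original comment), here is a hypothetical consumer loop reading audio chunks from a Redis stream with `ioredis`; the stream name `call-audio` and the field layout are made up for the example.

```ts
import Redis from "ioredis";

// Hypothetical worker: a thin connection layer publishes audio chunks
// to a Redis stream, and this process consumes them -- so the worker
// can be replaced on deploy without touching any websocket.
const redis = new Redis();

async function run() {
  let lastId = "$"; // only read entries that arrive after startup

  for (;;) {
    // Block until new entries appear on the stream.
    const res = await redis.xread("BLOCK", 0, "STREAMS", "call-audio", lastId);
    if (!res) continue;
    for (const [, entries] of res) {
      for (const [id, fields] of entries) {
        lastId = id;
        // `fields` is a flat [key1, value1, key2, value2, ...] array;
        // decode and process the audio chunk here.
      }
    }
  }
}

run();
```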

I mostly work with serverless setups using edge workers (Cloudflare, and previously AWS Lambda). So your preference may be different. I would use Cloudflare Workers for the computation and separate Cloudflare Durable Objects for the WebSocket connections. This would at least let you update the workers without interrupting connections. If you do need to update the connection layer as well, this would still cause connections to be interrupted.

Cloudflare docs: WebSockets disconnection
Code updates will disconnect all WebSockets. If you deploy a new version of a Worker, every Durable Object is restarted. Any connections to old Durable Objects will be disconnected.
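For reference, holding the socket in a Durable Object looks roughly like the sketch below, based on Cloudflare's documented `WebSocketPair` API; the class name `CallSession` and the relay body are placeholders, and the restart caveat quoted above still applies.

```ts
// Requires @cloudflare/workers-types; WebSocketPair is a Workers
// runtime global, not a Node.js API.
export class CallSession {
  async fetch(request: Request): Promise<Response> {
    const { 0: client, 1: server } = new WebSocketPair();
    server.accept();

    server.addEventListener("message", (event) => {
      // Relay event.data to an outgoing OpenAI socket held by this
      // object, so both connections live in one addressable place.
    });

    // Hand the client end back to the caller as a 101 upgrade.
    return new Response(null, { status: 101, webSocket: client });
  }
}
```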

The LLM suggested this alternative:

Connection Draining
Implement connection draining logic in the server:

  • Graceful Termination: During deployment, keep the old instance alive and only accept new WebSocket connections on the new instance.

I wouldn't know where to start provisioning this in setups I'm comfortable with. Instead I would write the progress of all connections to a database. So on a code update of the message layer, it would check all unfinished connections in the database and restart them. Easiest I could think of. Curious what you end up using and what fits your workflow.
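A sketch of that resume-from-database idea, with an entirely hypothetical `calls` table and a `pg` client; the real schema would depend on what "progress" means for a call, and it only works if the upstream protocol tolerates a reconnect mid-call.

```ts
import { Pool } from "pg";

// Hypothetical schema for the resume-on-restart idea:
//   CREATE TABLE calls (call_sid text PRIMARY KEY,
//                       status text, last_seq integer);
async function resumeUnfinishedCalls(db: Pool) {
  const { rows } = await db.query(
    "SELECT call_sid, last_seq FROM calls WHERE status = 'in_progress'"
  );
  for (const row of rows) {
    // Re-establish the outgoing connection for row.call_sid and
    // pick up from row.last_seq.
  }
}
```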

@pesterhazy (Author)

Thanks @jbmoelker! On the "Decouple" approach, I can see how that would handle the incoming websocket connection, but the use case includes an outgoing websocket as well (to talk to OpenAI). Is it possible to maintain an outgoing connection in a stateless, serverless way?

@jbmoelker

I haven't done this exact thing, but I think I would bind a Cloudflare Durable Object to both Twilio and OpenAI and have it work with a Cloudflare Worker for computation. As long as that process is running, the Worker would stay alive and so would the connections to Twilio and OpenAI. But I would really need to test this 😊

@pesterhazy (Author)

On Connection Draining, I think this could be a good approach. I'm mostly familiar with AWS load balancers and the ECS service; the load balancer's target groups have a deregistration_delay.timeout_seconds attribute.

deregistration_delay.timeout_seconds
The amount of time for Elastic Load Balancing to wait before deregistering a target. The range is 0–3600 seconds. The default value is 300 seconds.

So the idea would be:

  • set a maximum duration for a process (e.g. 30 minutes as the maximum call length)
  • configure the load balancer to continue relaying traffic on the established TCP connections for up to that timeout
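Assuming the old instance receives a shutdown signal once it's taken out of rotation, the in-process half of the drain could look roughly like this sketch with the `ws` library (the 30-minute ceiling fits inside the 0–3600 second range of deregistration_delay.timeout_seconds):

```ts
import { WebSocketServer } from "ws";

const wss = new WebSocketServer({ port: 8080 });

// Hypothetical drain handler: on shutdown, refuse new connections but
// let in-flight calls finish. ws's close() does not terminate existing
// client sockets, so active calls keep running.
process.on("SIGTERM", () => {
  wss.close();
  const deadline = Date.now() + 30 * 60 * 1000; // maximum call length
  const timer = setInterval(() => {
    if (wss.clients.size === 0 || Date.now() > deadline) {
      clearInterval(timer);
      process.exit(0);
    }
  }, 5_000);
});
```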
