@diegoeche
Created March 18, 2026 00:29

Why Was The Widget So Slow?

The Restaurant Analogy

Imagine a restaurant with 4 waiters (that's Kestrel's thread pool).

The old code worked like this: a waiter takes your order, walks to the kitchen, and stands there staring at the chef until the food is ready. They don't take any other orders while waiting. If 4 tables order at the same time, every waiter is standing in the kitchen doing nothing, and the 5th table? They can't even get a glass of water. That's thread starvation.

The fix: the waiter drops off the order, goes back to serve other tables, and picks up the food when the kitchen rings the bell. That's await.

But It Worked Fine Before!

On ECS, the app ran on IIS, which is basically a restaurant with 400 waiters. You can afford to have most of them standing around in the kitchen: there are always more available. Nobody noticed the inefficiency because brute force solved it.

Kestrel (k8s) is designed to run with 4 very efficient waiters. It assumes they won't stand around waiting. Every time someone writes db.Reviews.ToList() instead of await db.Reviews.ToListAsync(), a waiter gets sent to stare at the kitchen.
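To make the difference concrete, here is a minimal sketch that uses no database at all. FetchBlocking and FetchAsync are hypothetical stand-ins for a synchronous and an asynchronous query, and the 100 ms delay is an assumed round-trip time, not a measured one.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Blocking style, the shape of db.Reviews.ToList():
// the thread is parked for the whole wait.
int FetchBlocking(int table)
{
    Thread.Sleep(100); // stand-in for a synchronous database round trip
    return table;
}

// Async style, the shape of await db.Reviews.ToListAsync():
// the thread is released while the wait is in flight.
async Task<int> FetchAsync(int table)
{
    await Task.Delay(100); // stand-in for an asynchronous database round trip
    return table;
}

int[] tables = Enumerable.Range(1, 16).ToArray();

var clock = Stopwatch.StartNew();
// One waiter, 16 waits back to back, and the waiter can do nothing else meanwhile.
int[] blocking = tables.Select(t => FetchBlocking(t)).ToArray();
Console.WriteLine($"blocking: {clock.ElapsedMilliseconds} ms");

clock.Restart();
// 16 orders in flight at once, and no thread is held while the delays run.
int[] awaited = await Task.WhenAll(tables.Select(t => FetchAsync(t)));
Console.WriteLine($"awaited:  {clock.ElapsedMilliseconds} ms");
```

The timings will vary by machine, but the blocking run should take roughly 16 times the delay while the awaited run stays close to a single delay.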

The Greatest Hits

The Badges Endpoint had a foreach loop over products. For each product it would:

  1. Query the database for reviews (waiter goes to kitchen, stares)
  2. Query the database for questions (waiter goes to kitchen again, stares again)
  3. Read from Redis (waiter walks to the bar, stares)
  4. Write to Redis (walks back to bar, stares some more)

16 products × 4 blocking calls = 64 trips where a waiter stands around doing nothing, all inside a single request. With only 4 waiters available, you do the math.

We replaced this with 2 bulk queries upfront, then filtered in memory. 9.85s → 3.39s.
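The shape of that change can be sketched with in-memory lists standing in for the two tables. The tuple shapes, field names, and sample rows below are assumptions for illustration; the real entities and queries are not shown in this write-up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] productIds = { 1, 2, 3 };

// Hypothetical stand-ins for the reviews and questions tables.
var reviewsTable = new List<(int ProductId, int Rating)> { (1, 5), (1, 4), (3, 2) };
var questionsTable = new List<(int ProductId, string Text)> { (2, "Does it fit?") };

// Query 1 and query 2: fetch every row for the requested products, once each.
// Against a real database this filter would typically become WHERE ProductId IN (...).
var reviews = reviewsTable.Where(r => productIds.Contains(r.ProductId)).ToList();
var questions = questionsTable.Where(q => productIds.Contains(q.ProductId)).ToList();

// Index once, so the per-product loop only touches memory.
var reviewsByProduct = reviews.ToLookup(r => r.ProductId);
var questionsByProduct = questions.ToLookup(q => q.ProductId);

foreach (int id in productIds)
{
    // A lookup returns an empty sequence for a missing key, so no null checks are needed.
    int reviewCount = reviewsByProduct[id].Count();
    int questionCount = questionsByProduct[id].Count();
    Console.WriteLine($"product {id}: {reviewCount} reviews, {questionCount} questions");
}
```

The number of round trips no longer grows with the number of products, which is the whole point of the change.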

The Widget Endpoint had 7 separate SELECT COUNT(*) queries to get the star rating breakdown — one for each star rating plus recommend counts. That's 7 round trips to the kitchen when you could just ask "give me all the counts grouped by rating" once.
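A rough sketch of the "ask once" version, using LINQ over an in-memory array as a stand-in for the grouped query. The tuple fields Rating and Recommended and the sample rows are assumptions; the real columns may be named differently.

```csharp
using System;
using System.Linq;

// Hypothetical review rows: star rating plus a "would recommend" flag.
var reviews = new[]
{
    (Rating: 5, Recommended: true),
    (Rating: 5, Recommended: true),
    (Rating: 4, Recommended: true),
    (Rating: 1, Recommended: false),
};

// One grouped pass instead of one COUNT per star value.
var breakdown = reviews
    .GroupBy(r => r.Rating)
    .ToDictionary(
        g => g.Key,
        g => (Total: g.Count(), Recommended: g.Count(r => r.Recommended)));

for (int stars = 5; stars >= 1; stars--)
{
    // row stays (0, 0) when no review has this rating.
    breakdown.TryGetValue(stars, out var row);
    Console.WriteLine($"{stars} stars: {row.Total} total, {row.Recommended} recommended");
}
```

Ratings with no reviews simply do not appear in the grouped result, so the display loop has to fill in the zeros itself.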

It also had the usual AsyncHelper.RunSync pattern everywhere — which is the coding equivalent of calling the kitchen on the phone, putting yourself on hold, and blocking the phone line so nobody else can use it. In an async method. Inside a restaurant that only has 4 phone lines.
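For readers who have not met this pattern, here is a hedged sketch of what a sync-over-async helper commonly looks like next to the plain await that replaces it. RunSync below is a hypothetical stand-in written for this example, since the project's real AsyncHelper is not shown here, and CountReviewsAsync is an invented placeholder for any database or Redis call.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: runs the async work on the pool and parks the caller until it is done.
T RunSync<T>(Func<Task<T>> func) =>
    Task.Run(func).GetAwaiter().GetResult();

async Task<int> CountReviewsAsync()
{
    await Task.Delay(50); // stand-in for a database or Redis round trip
    return 16;
}

// Before: the calling thread is blocked for the full round trip,
// and the work itself still needs pool threads to run.
int blockedCount = RunSync(() => CountReviewsAsync());

// After: the calling thread goes back to the pool while the I/O is in flight.
int awaitedCount = await CountReviewsAsync();

Console.WriteLine($"{blockedCount} {awaitedCount}");
```

Both calls return the same value. The difference is only in how many threads sit idle while waiting for it.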

The SQS Surprise

The review rendering loop calls SubmitReviewForCensorProcess for every uncensored review. This sends a synchronous AWS SQS message per review. Each one is a network round trip to AWS. For a shop with 16 reviews displayed, that's 16 sequential API calls to Amazon during a widget page load. In the render loop. For a read-only endpoint. Someone really wanted those reviews censored right now.

What We Fixed

| What | Before | After |
|---|---|---|
| Badges N+1 queries | 2 DB queries per product | 2 queries total |
| Badges, 16 products | 9.85s | 3.39s |
| Widget rating counts | 7 separate COUNTs | 1 GroupBy |
| Widget sku_only path | N+1 loop + in-memory queryable that crashes with async | Single IN query, proper IQueryable |
| Blocking calls | AsyncHelper.RunSync everywhere | await everywhere |
| Redis | Hardcoded ElastiCache, sync calls, legacy HGET fallback | Configurable via env vars, async, no HGET |
| Error handling | Silent catch {} swallowing everything | Report to Sentry |
| Debug noise | ~90 lines of Stopwatch/Debugging/Console.WriteLine | Deleted |

Active Requests: Before vs After

Before the async fix, pods accumulated 2000+ active connections and health checks timed out. After:

  • 8 pods handling 15% of production traffic
  • 6-20 active requests per pod (was 2000+)
  • Zero restarts
  • Thread pool: ~10 threads per pod (healthy)

We bumped to 30% traffic and it's holding steady.

Still TODO

  • The per-review SQS censor call should be batched or moved out of the render path
  • GetProductGroupIds2 opens its own DbContext per call
  • More sync callers exist outside the widget/badges hot path
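For the first TODO item, one possible direction is batching. This is a sketch of the idea only, not something that has been built: SendBatchAsync is a hypothetical stand-in for a single SQS SendMessageBatch call, and the review IDs are made up. The one hard fact it relies on is that SendMessageBatch accepts at most 10 entries per request.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

const int MaxBatchSize = 10; // SQS SendMessageBatch accepts at most 10 entries per request

int calls = 0;

// Hypothetical sender: stands in for one SendMessageBatch round trip to SQS.
Task SendBatchAsync(IReadOnlyList<int> reviewIds)
{
    calls++;
    return Task.CompletedTask;
}

int[] uncensoredReviewIds = Enumerable.Range(1, 16).ToArray();

// Enumerable.Chunk (.NET 6+) splits the IDs into arrays of at most MaxBatchSize.
foreach (int[] batch in uncensoredReviewIds.Chunk(MaxBatchSize))
{
    await SendBatchAsync(batch);
}

Console.WriteLine($"{uncensoredReviewIds.Length} reviews sent in {calls} calls");
```

With 16 reviews this comes to 2 round trips instead of 16. Moving the send out of the render path entirely would remove even those from the page load.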

PRs
