Imagine a restaurant with 4 waiters (that's Kestrel's thread pool).
The old code worked like this: a waiter takes your order, walks to the kitchen, and stands there staring at the chef until the food is ready. They don't take any other orders while waiting. If 4 tables order at the same time, every waiter is standing in the kitchen doing nothing, and the 5th table? They can't even get a glass of water. That's thread starvation.
The fix: the waiter drops off the order, goes back to serve other tables, and picks up the food when the kitchen rings the bell. That's await.
On ECS, the app ran on IIS which is basically a restaurant with 400 waiters. You can afford to have most of them standing around in the kitchen — there are always more available. Nobody noticed the inefficiency because brute force solved it.
Kestrel (k8s) is designed to run with 4 very efficient waiters. It assumes they won't stand around waiting. When someone writes db.Reviews.ToList() instead of await db.Reviews.ToListAsync(), you've just sent a waiter to stare at the kitchen.
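Here is a minimal sketch of the two waiter behaviours, using only the base class library so it runs on its own. `KitchenSketch`, `LoadReviewsBlocking` and `LoadReviewsAsync` are made-up names; `Thread.Sleep` stands in for a synchronous call like `db.Reviews.ToList()` and `Task.Delay` stands in for `ToListAsync()`. It is not the real data access code.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class KitchenSketch
{
    // Blocking version: Thread.Sleep stands in for a synchronous query.
    // The calling thread is parked for the whole wait (waiter stares at the chef).
    public static List<string> LoadReviewsBlocking()
    {
        Thread.Sleep(100);
        return new List<string> { "review-1", "review-2" };
    }

    // Async version: Task.Delay stands in for an async query.
    // While the delay is pending no thread is held, so the pool thread
    // that started the call is free to serve another request.
    public static async Task<List<string>> LoadReviewsAsync()
    {
        await Task.Delay(100);
        return new List<string> { "review-1", "review-2" };
    }
}
```

Both methods return the same data. The difference only shows up under load: many concurrent calls to the blocking version each hold a thread for the full wait, while the async version holds none during the wait.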
The Badges Endpoint had a foreach loop over products. For each product it would:
- Query the database for reviews (waiter goes to kitchen, stares)
- Query the database for questions (waiter goes to kitchen again, stares again)
- Read from Redis (waiter walks to the bar, stares)
- Write to Redis (walks back to bar, stares some more)
16 products × 4 blocking calls = 64 trips where a waiter stands around doing nothing, for a single request. With 4 waiters available, you do the math.
We replaced this with 2 bulk queries upfront, then filtered in memory. 9.85s → 3.39s.
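The "2 bulk queries, then filter in memory" shape can be sketched roughly like this. Everything named here (`Review`, `Question`, `Badge`, `BadgeSketch`, and the two `load...` delegates) is hypothetical and stands in for the real entities and queries, which are not shown in this post. With EF Core, each delegate would presumably be one query filtered by the full list of product IDs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public record Review(int ProductId, int Rating);
public record Question(int ProductId, string Text);
public record Badge(int ProductId, int ReviewCount, int QuestionCount);

public static class BadgeSketch
{
    public static async Task<List<Badge>> BuildBadgesAsync(
        IReadOnlyCollection<int> productIds,
        Func<IReadOnlyCollection<int>, Task<List<Review>>> loadReviews,
        Func<IReadOnlyCollection<int>, Task<List<Question>>> loadQuestions)
    {
        // Two round trips in total, no matter how many products are requested.
        List<Review> reviews = await loadReviews(productIds);
        List<Question> questions = await loadQuestions(productIds);

        // Index the results once so the per-product step never touches the database.
        ILookup<int, Review> reviewsByProduct = reviews.ToLookup(r => r.ProductId);
        ILookup<int, Question> questionsByProduct = questions.ToLookup(q => q.ProductId);

        // A lookup returns an empty sequence for a missing key,
        // so products without reviews or questions get a count of 0.
        return productIds
            .Distinct()
            .Select(id => new Badge(
                id,
                reviewsByProduct[id].Count(),
                questionsByProduct[id].Count()))
            .ToList();
    }
}
```

The loop over products is still there, but it now runs against in-memory lookups instead of sending a waiter to the kitchen on every iteration.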
The Widget Endpoint had 7 separate SELECT COUNT(*) queries to get the star rating breakdown — one for each star rating plus recommend counts. That's 7 round trips to the kitchen when you could just ask "give me all the counts grouped by rating" once.
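As a rough illustration of "ask once, grouped by rating", here is the idea in plain LINQ over an in-memory list. `RatedReview`, `RatingBucket` and `RatingSketch` are invented for this example, and the `Recommended` flag is an assumption about how recommend counts are stored. On a database-backed `IQueryable` a similar `GroupBy` is normally translated to a single SQL `GROUP BY`, but the exact query used in the fix is not shown here.

```csharp
using System.Collections.Generic;
using System.Linq;

public record RatedReview(int Rating, bool Recommended);
public record RatingBucket(int Rating, int Count, int RecommendedCount);

public static class RatingSketch
{
    // One pass over the reviews produces every per-star count plus the
    // recommend count per star, instead of one COUNT query per number.
    public static List<RatingBucket> Breakdown(IEnumerable<RatedReview> reviews) =>
        reviews
            .GroupBy(r => r.Rating)
            .Select(g => new RatingBucket(
                g.Key,
                g.Count(),
                g.Count(r => r.Recommended)))
            .OrderByDescending(b => b.Rating)
            .ToList();
}
```

Star ratings with no reviews simply do not appear in the result, so the caller fills in zeros for any missing buckets.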
It also had the usual AsyncHelper.RunSync pattern everywhere — which is the coding equivalent of calling the kitchen on the phone, putting yourself on hold, and blocking the phone line so nobody else can use it. In an async method. Inside a restaurant that only has 4 phone lines.
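For readers who have not met this pattern: the sketch below shows one common shape for a sync-over-async helper and what replacing it looks like. The real `AsyncHelper.RunSync` is not shown in this post, so the body here is an assumption, and `AsyncHelperSketch`, `WidgetSketch` and `GetCountAsync` are hypothetical names.

```csharp
using System;
using System.Threading.Tasks;

public static class AsyncHelperSketch
{
    // Assumed shape of a RunSync-style helper: push the async work onto the
    // thread pool, then block the calling thread until it completes.
    public static T RunSync<T>(Func<Task<T>> work) =>
        Task.Run<T>(work).GetAwaiter().GetResult();
}

public static class WidgetSketch
{
    // Hypothetical stand-in for an async database or Redis read.
    public static async Task<int> GetCountAsync()
    {
        await Task.Delay(50);
        return 7;
    }

    // Before: an async method that still parks its thread on the first call.
    public static async Task<int> TotalBeforeAsync()
    {
        int reviews = AsyncHelperSketch.RunSync(() => GetCountAsync()); // thread blocked here
        int questions = await GetCountAsync();
        return reviews + questions;
    }

    // After: the same work with await throughout, so no thread waits on I/O.
    public static async Task<int> TotalAfterAsync()
    {
        int reviews = await GetCountAsync();
        int questions = await GetCountAsync();
        return reviews + questions;
    }
}
```

Both versions return the same number. In a helper shaped like this one, the blocked caller also needs another pool thread to finish the work it is waiting on, which is why the pattern hurts most when the pool is already short of threads.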
The review rendering loop calls SubmitReviewForCensorProcess for every uncensored review. This sends a synchronous AWS SQS message per review. Each one is a network round trip to AWS. For a shop with 16 reviews displayed, that's 16 sequential API calls to Amazon during a widget page load. In the render loop. For a read-only endpoint. Someone really wanted those reviews censored right now.
| What | Before | After |
|---|---|---|
| Badges N+1 queries | 2 DB queries per product | 2 queries total |
| Badges 16 products | 9.85s | 3.39s |
| Widget rating counts | 7 separate COUNTs | 1 GroupBy |
| Widget sku_only path | N+1 loop + in-memory queryable that crashes with async | Single IN query, proper IQueryable |
| Blocking calls | AsyncHelper.RunSync everywhere | await everywhere |
| Redis | Hardcoded ElastiCache, sync calls, legacy HGET fallback | Configurable via env vars, async, no HGET |
| Error handling | Silent catch {} swallowing everything | Report to Sentry |
| Debug noise | ~90 lines of Stopwatch/Debugging/Console.WriteLine | Deleted |
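For the error handling row, the change is roughly the difference between an empty `catch { }` and a catch that tells someone. A small sketch, with assumptions labelled: `ErrorHandlingSketch` and `ReadCountOrDefault` are invented names, falling back to 0 is an assumed behaviour, and `reportError` is a hypothetical hook. With the Sentry .NET SDK that hook would typically be `SentrySdk.CaptureException`.

```csharp
using System;

public static class ErrorHandlingSketch
{
    // reportError is a hypothetical hook standing in for the error reporter.
    public static int ReadCountOrDefault(Func<int> read, Action<Exception> reportError)
    {
        try
        {
            return read();
        }
        catch (Exception ex)
        {
            // The old code had an empty catch block here, so failures vanished.
            reportError(ex);
            return 0; // assumed fallback so the page can still render
        }
    }
}
```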
Before the async fix, pods accumulated 2000+ active connections and health checks timed out. After:
- 8 pods handling 15% of production traffic
- 6-20 active requests per pod (was 2000+)
- Zero restarts
- Thread pool: ~10 threads per pod (healthy)
We bumped to 30% traffic and it's holding steady.
- The per-review SQS censor call should be batched or moved out of the render path
- GetProductGroupIds2 opens its own DbContext per call
- More sync callers exist outside the widget/badges hot path
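One possible direction for the SQS follow-up, sketched under assumptions: collect the uncensored review IDs during rendering and send them in batches afterwards. SQS `SendMessageBatch` accepts at most 10 entries per request, so 16 reviews would take 2 calls instead of 16. `CensorQueueSketch`, `SubmitForCensorAsync` and the `sendBatch` delegate are hypothetical; the delegate stands in for the actual AWS SDK call, and `Enumerable.Chunk` requires .NET 6 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class CensorQueueSketch
{
    // SQS SendMessageBatch allows at most 10 entries per request.
    private const int MaxBatchSize = 10;

    // Sends the IDs in batches and returns how many requests were made.
    public static async Task<int> SubmitForCensorAsync(
        IEnumerable<int> uncensoredReviewIds,
        Func<IReadOnlyList<int>, Task> sendBatch)
    {
        int requests = 0;
        foreach (int[] batch in uncensoredReviewIds.Chunk(MaxBatchSize))
        {
            await sendBatch(batch); // one network round trip per batch
            requests++;
        }
        return requests;
    }
}
```

This still does the sending inside the request. Moving it fully out of the render path, for example to a background worker, would be a separate change.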