Hey friends,
I'm looking for some advice.
Traditional web companies (Google, Twitter, Facebook) develop interesting infrastructure problems because they reach scaling challenges. Their current stack cannot support the number of users they're seeing (one way or another), or the advanced feature set their product needs doesn't really work well in their stack, so they write more software to help handle these problems. This is cool! I like infrastructure problems!
I feel more and more at work that, even though we're growing still (ish), we're not growing so fast that we're running into scaling problems. There's no type of cluster at work that can't grow 10x or even 100x for some of them comfortably at this point (with a few caveats).* This makes it incredibly difficult to choose a new problem at work to work on, in my opinion. On top of this, we don't have great feedback loops with the product developers that develop on top of our stack. IMO this is because the stack is "good-ish enough" to ship features rapidly. They don't feel like they need anything more. (Which is not to say that there isn't something we could build that could enable faster development, or more confidence, etc., but it's not like people are CLAMORING for it or something. Also, like, let's say we build something? How do we know we made things better? What's the feedback loop?)
So, how do you choose what you work on under these conditions? How do you choose what you work on without externalities like scaling? What the hell does the scaling team do when the company stops scaling?
I can reasonably think of a few things that would be interesting to work on for the next yearish or so (mostly in areas in which I think it's difficult to get work done today, like monitoring/logging/alerting and gaining confidence in production / debugging production), and maybe under normal circumstances that'd be sufficient, but I'm eyeing prioritizing getting a green card, which means sticking at the same employer for roughly 3 more years.
Also, maybe I should just choose the most interesting of the problems we have? Also, there are plenty of interesting and impactful non-infrastructure problems. Maybe I should work on one of those?
Yours in career ugh,
Maggie
*There are a few that I can think of, but the solutions to them are already in flight.
At last, a famous zmagg gist that has been foretold to me!
This is a good concern and observation. Etsy's dataset (number of listings, number of users) is definitely not increasing faster than our current capacity can handle. We're currently limited in exploring (1) the space of product features that buyers want (and therefore that sellers want, to reach those buyers), (2) the space of search features to improve user experience, and (3) the space of infrastructure work with a high ratio of performance impact to human touch.
(1) For search, the time it takes to spin up a new index (like offerings-leo) is about 6 months, including the addition of a new feature like offerings. If the product team decides we need a new marketplace or a new index, our scaling problem is how do we manage multiple indexes / views on our listing data.
(2) If the data science / search ranking team wants to experiment with new ML models to produce more relevant search results / recommendations, our scaling problem is how do we support the configurable addition / querying / removal of these new ML fields.
(3) If we decide to shrink our infra team so that we can assign them to working on new projects rather than maintaining existing systems, our scaling problem becomes how do we manage a growing fleet on the cloud with fewer human engineers.
There's some relationship to sustainable capitalism here: Etsy in the past has not had the pressure to grow as fast as a public company normally does. As a private company serving a unique community, we've probably grown as large as we can, and therefore traditional infrastructure scaling (with data size) is no longer interesting. All the kinds of scaling above have to do with kinds of data (number and type of datasets), rather than the size of an existing dataset.
I have other thoughts about kinds of business models and other kinds of scaling for another time.