file.textile

Sorry for this fairly weird communications style, but I didn’t know how to contact you reliably otherwise.

My email is [email protected] if you’d like to respond that way.

I thought about our exchange all night, could hardly sleep. It’s always disconcerting when people disagree with me in ways that I don’t understand.

I think I may now understand where we see things differently. I’m curious to see what you think of this.

Ezra’s quote which you said criticized Github’s code is this:

“Funny thing is that we told github that gfs would not scale for them over a year ago, we also outlined how to move to a shared nothing chunk server architecture. They didn’t take our advice so it’s mostly their own architecture decisions that were holding them back with regards to gfs.”

Now, that clearly doesn’t criticize their code.

However, we also disagree on GFS:

I say:

“We chose GFS to guarantee identical code and page cache on all nodes with identical semantics to local FS. It works well for that.”

You respond:

“GFS is indefensible, it’s HUGE scaling pain in order to buy yourself page caching. It’s just not needed, ever.”

As I mentioned in at least one response, I agree that GFS is a scaling problem. I disagree that it’s more of a scaling problem than anything else shared, but shared it is.

So now we get to, perhaps, how we see things differently. I can understand all your points if you think that 3 years ago we tried to develop the ultimate scaling solution for every Rails application. We did not. We tried to develop the ultimate scaling solution for any Rails application. At the time, the vast majority of those applications were generally written on a Mac laptop and deployed to 1 or 2 servers at most (the second being a separate DB machine).

Many of those applications wrote some state to local disk. We knew this at the time because we had spent about 15 months consulting on Rails application development and had seen this first hand. So we needed a way to allow common disk access between nodes. We didn’t want to use NFS because it’s not fully POSIX compliant, so not all applications would behave identically to their development environment. Additionally, we’d need to allocate separate hardware for file serving, which would have complicated costing and/or pricing.

We were extremely interested in allowing customers who wrote a Rails application, deployed it, and hit a bottleneck to immediately transition to Engine Yard and scale across multiple machines. It’s super important to remember that even today, after loads of optimization, most applications spend the majority of their time in rendering. 3 years ago this was true in spades!

So, GFS allows us to run those Rails applications as is, without change. We tell customers that GFS is and always will be a bottleneck that will need to be architected away.

In fact, Github is a very good example of why GFS was important to us. When launched, and likely still today, Github shells out to Git. When Github moved to us, it was running on their laptops, and maybe a single box elsewhere. GFS allowed them to scale their application a LONG WAY, all the while developing the application and leaving their competitors behind.

And, as Ezra pointed out, we saw the handwriting on the wall. GFS was going to limit them at their growth rates, so we suggested they begin that architectural change. They chose not to, and are instead going to Rackspace to continue scaling their application.

Chris mentioned on Hacker News that they have a new architecture, so perhaps they’re making two changes at once. I think it’s possible that they’re still relying on a shared filesystem, likely NFS, but they may have eliminated that entirely. I hope they did, as it will serve them and their users, EY included, better if so.

At the end of the day, I’d just like you to know how we arrived at our decisions. We weren’t insane, and understood the choices we were making. For the most part, those choices have proven out well, even if less than perfect. Without asking customers to rewrite parts of their applications, I still don’t see a better way of handling this. Yet, as I mentioned on Twitter, our cloud product does not use GFS. There are many reasons for this, including technical limitations of EC2 and EBS. Most importantly, the world has moved on and the need for GFS has lessened, so the balance tipped against it.

Engine Yard was started by a bunch of hard-working techies trying to get a business off the ground by solving a very specific need, and we had minimal resources to do so. We knew that many of our customers would be in the same boat, so making the transition from minimum scale to medium scale easy and predictable for them was paramount. Achieving that involved the trade-off that is GFS, but I’d appreciate it if you saw it for what it was, not something to detest and be angry about. :-)

Peace?
