People often ask us about the technologies we use to power the spire.io platform. Our architecture is a big part of what makes us unique and ensures that your applications will be reliable, performant, and secure.
There are five major components involved:

- A load balancer and proxy
- A dispatcher
- A message transport
- A collection of workers
- A data store
Each of these has different requirements and thus employs different technologies. Let's start where our request processing starts - with the load balancer and proxy.
This layer is really two things, but conceptually they do a single job. First, we want to distribute requests evenly across our servers. Second, we want to provide quick responses in some cases: for redirects, bad requests, or obvious denial-of-service (DoS) attacks. We can also do some coarse-grained routing, which is particularly helpful for setting up failover or for doing rolling deploys. Finally, we terminate SSL at this layer, freeing the dispatcher from routine SSL handshaking.
We run our servers on Amazon Web Services and use their Elastic Load Balancer (which is where we terminate SSL). AWS gives us data centers around the world, which is nice (and great for load testing). However, this is an area that is evolving rapidly and we are always re-evaluating cloud providers. We also use HAProxy, which is ideal for dealing with high levels of concurrency, as our proxy server.
And speaking of the dispatcher ...
One of the design goals for our platform is to make persistent connections cheap. We want to be able to have millions and millions of connections open simultaneously and not have to worry about it from a cost standpoint.
This is useful for supporting HTTPS, because of the overhead associated with establishing an SSL session. We want clients to be able to keep a connection open so they can have the best of both worlds: performance and security. Although we terminate the SSL at the proxy layer, we still need to keep connections open on the dispatcher for long-running requests. This gives us true real-time responsiveness, even for applications with millions of clients.
Another design principle is that we never want to drop requests. That means we have to grab them off the wire and persist them as quickly as we can. We don't want to do any more than that, so that each machine can handle as many requests and connections as possible.
We wrote the dispatcher in Node for three reasons. First, we already knew JavaScript. Second, Node has very nice asynchronous HTTP libraries, which make it easy for us to keep connections open. Third, for this particular task it outperformed Ruby in our load testing (another language we're quite familiar with, and one that also supports asynchronous HTTP). Of course, it's still not as fast as, say, using C, but it's simple enough that it can be easily rewritten if the current implementation becomes a bottleneck.
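To make that concrete, here is a minimal sketch of the dispatcher's HTTP side. It's illustrative only, not our production code: the port and the `dispatch` helper (sketched in the next section) are invented for this example, and error handling is omitted.

```javascript
// dispatcher.js - an illustrative sketch, not our production code.
// It grabs a request off the wire, turns it into a task, and holds
// the connection open until a result comes back from the transport.
var http = require('http');

// Hypothetical helper, sketched in the next section.
var dispatch = require('./transport').dispatch;

http.createServer(function (request, response) {
  var body = '';
  request.on('data', function (chunk) { body += chunk; });
  request.on('end', function () {
    // Capture just enough of the request to build a task.
    var task = {
      method: request.method,
      url: request.url,
      headers: request.headers,
      body: body
    };
    // The connection stays open, cheaply, until a result arrives.
    dispatch(task, function (result) {
      response.writeHead(result.status, result.headers);
      response.end(result.body);
    });
  });
}).listen(8000);
```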
We use Redis as our message transport. Messages are distributed across any number of Redis servers, allowing us to easily handle millions of them simultaneously. (By the way, we use lists and blocking operations rather than pub-sub.) The dispatcher constructs a task from a request and places the message containing the task into a worker queue.
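Here is what that hand-off might look like with the callback-style node_redis client. The queue name and reply-key scheme are invented for illustration. Note the advantage over pub-sub: a task pushed onto a list persists until exactly one worker pops it, so nothing is dropped if no one happens to be listening at that instant.

```javascript
// transport.js - an illustrative sketch of list-based messaging.
var redis = require('redis');
var queueClient = redis.createClient();

exports.dispatch = function (task, callback) {
  // A unique reply key lets the worker route its result back to us.
  task.replyTo = 'reply:' + Math.random().toString(36).slice(2);
  queueClient.lpush('queue:tasks', JSON.stringify(task));
  // BLPOP blocks its whole connection, so this sketch dedicates a
  // client per request; a real dispatcher would multiplex replies.
  var replyClient = redis.createClient();
  replyClient.blpop(task.replyTo, 0, function (err, item) {
    replyClient.quit();
    if (err) { return callback({ status: 500, headers: {}, body: '' }); }
    callback(JSON.parse(item[1])); // item is [key, value]
  });
};
```

Which brings us to ...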
Messages are handled by workers, each of which specializes in performing a particular task. A worker performs its task and then posts a message with the result. The dispatcher that was waiting for that result can then transform it into an HTTP response.
Workers are written in JRuby, which makes it easy for us to make changes quickly and spin up multiple worker threads - one for each CPU. High-volume workers may eventually be rewritten in C.
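A worker is the mirror image of the dispatcher. Ours are JRuby, but to stay consistent with the sketches above, here is the same loop in JavaScript, using the same invented queue names.

```javascript
// worker.js - an illustrative worker loop (our real workers are JRuby).
var redis = require('redis');
var client = redis.createClient();

function next() {
  // Block until a task appears, then pop it and get to work.
  client.blpop('queue:tasks', 0, function (err, item) {
    if (err) { return setTimeout(next, 1000); } // back off and retry
    var task = JSON.parse(item[1]);
    var result = perform(task);
    // Post the result where the waiting dispatcher will find it.
    client.lpush(task.replyTo, JSON.stringify(result), next);
  });
}

// A stand-in for this worker's one specialty.
function perform(task) {
  return {
    status: 200,
    headers: { 'Content-Type': 'text/plain' },
    body: 'handled ' + task.method + ' ' + task.url
  };
}

next();
```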
We also use Redis as our data store. Its data-structure-oriented approach provides a great deal of flexibility compared to other NoSQL data stores and relational databases. For example, we use sorted sets to make message retrieval (for application messages, not our internal messages) simple and fast.
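For instance (key names invented for the example), scoring each message by its timestamp makes "give me everything since time T" a single command:

```javascript
// messages.js - a sketch of timestamp-scored message storage.
var redis = require('redis');
var client = redis.createClient();

// Store a message in a channel's sorted set, scored by timestamp.
function saveMessage(channel, message, callback) {
  client.zadd('channel:' + channel, Date.now(), JSON.stringify(message),
    callback);
}

// Retrieve every message since a given timestamp, already in order.
function messagesSince(channel, timestamp, callback) {
  client.zrangebyscore('channel:' + channel, timestamp, '+inf',
    function (err, members) {
      if (err) { return callback(err); }
      callback(null, members.map(function (m) { return JSON.parse(m); }));
    });
}
```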
This architecture is highly elastic. It's easy to simply add servers wherever we have bottlenecks. And it's easy to add new features - we can just add new types of workers. Finally, it's easy to optimize any given component because they are decoupled from one another - all interaction is done via the message transport, which is language neutral. For example, we could rewrite the dispatcher in C, or a particularly high-volume worker in Node or even Lua, and it wouldn't affect anything else.
We have put a great deal of thought into how to handle security effectively in an API that lives out in the wild and woolly World Wide Web. In particular, we use HTTPS for all API requests, along with capability security - fine-grained authorization-based keys, rather than coarse-grained authentication-based keys - which is both more flexible and more secure at the same time. We will be introducing other APIs that build on this approach in the near future.
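In miniature, the idea looks like this (all names are invented for illustration): instead of one account key that authenticates who you are and unlocks everything, each unguessable token authorizes exactly one operation on exactly one resource.

```javascript
// capabilities.js - a toy illustration of capability-based keys.
var crypto = require('crypto');

var grants = {}; // capability token -> what it authorizes

// Issue a token that authorizes one operation on one resource.
function issue(resource, operation) {
  var token = crypto.randomBytes(24).toString('hex');
  grants[token] = { resource: resource, operation: operation };
  return token;
}

// Authorization is simply possession of the right token; there is
// no shared account credential to leak or over-grant.
function authorized(token, resource, operation) {
  var grant = grants[token];
  return !!grant && grant.resource === resource &&
    grant.operation === operation;
}

// A client holding only this token can publish to one channel,
// but cannot subscribe to it or touch anything else.
var publishCap = issue('channels/room-1', 'publish');
console.log(authorized(publishCap, 'channels/room-1', 'publish'));   // true
console.log(authorized(publishCap, 'channels/room-1', 'subscribe')); // false
```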
We have a lot of cool things coming down the road, including new APIs and enhancements to our Messaging API (including Web Sockets support). Follow us on Twitter - @spireio - or subscribe to our blog to keep up-to-date.