Quimby is Walmart's service layer for mobile clients' configuration, CMS, A/B testing setup, and a few other related services. It stitches together a constellation of data sources into a concise menu of API calls that mobile clients make to initialize and configure themselves.
Quimby is a REST service layer built on Gogo, a micro-service framework that we in turn built with Node.js, Hapi, Zookeeper, and Redis. Gogo exposes an array of web servers as a single host and lets us isolate tasks into smaller, focused processes, emphasizing scalability and failure recovery. For example, a failure in one micro-service process need not fail a request, because the router can retry the request against another process. Gogo also offers the additional features required to build distributed services with shared state, such as leader election.
- Penny (part of Gogo) - The micro-service router, responsible for pairing a request with a servicer
- Micro-Services (the core of Quimby) - A variety of web servers responsible for servicing requests via Penny
- Zookeeper (part of Gogo) - Micro-service registry, leader election transport
- Redis (part of Gogo) - Shared process state and caching
As the micro-services (in this case, Hapi servers) spin up, they register themselves with Zookeeper using our gogo-gadget library. These services may enter elections for leadership, which allows us to reduce load on upstream systems. For any shared state or caching required by the micro-service processes, we use Redis. When a leader is elected, the leader writes to Redis while the followers read. Finally, Penny subscribes to Zookeeper changes, which notify it as micro-service processes come and go, so it can maintain an up-to-date routing table.
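The registration, election, and routing-table flow described above can be sketched with an in-memory stand-in. This is only an illustration under stated assumptions: the `Registry` class, `refreshIfLeader`, the endpoint paths, and the addresses are all hypothetical, and none of this is the Zookeeper, Redis, or gogo-gadget API. The "lowest sequence number wins" election rule is borrowed from the common Zookeeper recipe and may differ from what Gogo actually does.

```javascript
// In-memory stand-in for the service registry (hypothetical, not a real client API).
class Registry {
  constructor() {
    this.nextSeq = 0;
    this.members = new Map(); // id -> { id, endpoint, address, seq }
    this.watchers = [];
  }

  // A service process announces which endpoint it serves and where it listens.
  register(endpoint, address) {
    const id = `${endpoint}#${this.nextSeq}`;
    this.members.set(id, { id, endpoint, address, seq: this.nextSeq });
    this.nextSeq += 1;
    this.notify();
    return id;
  }

  // Called when a process exits or its registration is lost.
  deregister(id) {
    if (this.members.delete(id)) this.notify();
  }

  // Assumed election rule: the live member with the lowest sequence number leads.
  leaderOf(endpoint) {
    const candidates = [...this.members.values()]
      .filter((m) => m.endpoint === endpoint)
      .sort((x, y) => x.seq - y.seq);
    return candidates.length > 0 ? candidates[0].id : null;
  }

  // Subscribers receive the current membership now and after every change.
  watch(callback) {
    this.watchers.push(callback);
    callback([...this.members.values()]);
  }

  notify() {
    const snapshot = [...this.members.values()];
    for (const watcher of this.watchers) watcher(snapshot);
  }
}

// Router side: rebuild endpoint -> [addresses] whenever membership changes.
function buildRoutingTable(members) {
  const table = new Map();
  for (const m of members) {
    if (!table.has(m.endpoint)) table.set(m.endpoint, []);
    table.get(m.endpoint).push(m.address);
  }
  return table;
}

const registry = new Registry();
let routingTable = new Map();
registry.watch((members) => { routingTable = buildRoutingTable(members); });

// Shared cache stand-in: only the current leader writes, followers only read.
const sharedCache = new Map();
function refreshIfLeader(id, endpoint, fetchUpstream) {
  if (registry.leaderOf(endpoint) !== id) return false; // follower: skip the upstream call
  sharedCache.set(endpoint, fetchUpstream());
  return true;
}

const a = registry.register('/config', '10.0.0.1:8001');
const b = registry.register('/config', '10.0.0.2:8001');
registry.register('/cms', '10.0.0.3:8002');

const followerWrote = refreshIfLeader(b, '/config', () => ({ version: 1 }));
const leaderWrote = refreshIfLeader(a, '/config', () => ({ version: 1 }));

// When the leader goes away, leadership passes on and the routing table shrinks.
registry.deregister(a);
console.log(registry.leaderOf('/config') === b, routingTable.get('/config'));
```

Only the leader calls the upstream fetch, which is how an election can cut upstream load: N processes share one refresh instead of each performing their own.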
When a request for a particular endpoint arrives at Penny, Penny uses the routing table it maintains to find a micro-service capable of servicing the request. Penny opens the connection and streams it through to the requestor. If an error occurs while connecting to the micro-service, Penny can try alternate processes until it finds one able to service the request.
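The retry behavior can be sketched as a loop over the candidate processes for an endpoint. This is a minimal sketch, not Penny's implementation: the `route` function, the `routes` table, the addresses, and the injected `connect` function are all assumptions made for illustration, and a simulated transport stands in for real network connections and streaming.

```javascript
// Hypothetical routing table: endpoint -> candidate service addresses.
const routes = new Map([
  ['/config', ['10.0.0.1:8001', '10.0.0.2:8001', '10.0.0.3:8001']],
]);

// Try each candidate in turn and return the first successful response.
// `connect` is an injected async function (address, request) -> response
// that rejects when the connection fails.
async function route(endpoint, request, connect) {
  const candidates = routes.get(endpoint) || [];
  const errors = [];
  for (const address of candidates) {
    try {
      return await connect(address, request);
    } catch (err) {
      // Remember why this process failed, then move on to the next one.
      errors.push(`${address}: ${err.message}`);
    }
  }
  throw new Error(`no service available for ${endpoint} (${errors.join('; ')})`);
}

// Simulated transport: the first process is down, the others respond.
const down = new Set(['10.0.0.1:8001']);
async function fakeConnect(address, request) {
  if (down.has(address)) throw new Error('ECONNREFUSED');
  return { servedBy: address, body: `response to ${request.path}` };
}

const pending = route('/config', { path: '/config' }, fakeConnect);
pending.then((res) => console.log('served by', res.servedBy));
```

Injecting `connect` keeps the failover logic separate from the transport, so the same loop can be exercised against a fake in tests and a real HTTP client elsewhere.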
Our primary goals in pursuing this architecture for Quimby were:
- Scalability
- Fine-grained resource control
- Fault tolerance
By separating business logic along process boundaries, we can dramatically increase capacity in one high-demand service without also adding capacity to other, less-used services. By running fleets of processes fronted by a micro-service router, we gained better control over, and improved recovery from, process failures.