Created October 29, 2015 14:23

ASAPP DevOps and Systems Engineering Challenge
==============================================

v0.1

Welcome to your challenge project!

You have two timeline options. If you live outside NY and would have to fly in for your on-site interview, we strongly prefer that you take Option 1. If coming in to the office is easy for you, then whichever you prefer is great.

Option 1: Code at home, half-day at ASAPP
- Finish your implementation and write up answers to the follow-up questions at home on your own time.
- Send us your results within 5 days. If you need more time, please let us know.
- We estimate that the challenge should take somewhere between 1.5 and 10 hours depending on your experience and speed.

Option 2: Answer questions at home, full-day at ASAPP
- Think through the challenge at home, make a rough plan of your approach, and write up answers to all the follow-up questions.
- Send us your results within 3 days. If you need more time, please let us know.
- We estimate this should take somewhere between 0.5 and 3 hours depending on your experience and speed.

Please bring your laptop if you visit the ASAPP office; working in a familiar dev environment is always more efficient!

Motivation
----------

ASAPP's server infrastructure must deliver real-time communication to companies with hundreds of millions of customers while maintaining virtually zero downtime. We persist chat logs more or less forever, deploy new code multiple times per day, and must meet multiple industry compliance standards in the coming year (HIPAA, PCI, etc.). To tackle these challenges and many more, we rely on our team members' ability to reason about and design such systems, both conceptually and concretely, from a bird's-eye perspective as well as in great detail.

Your challenge is to design and implement zero-downtime deployment for three simple HTTP servers (Go, Python and Lua), and then discuss a few questions around systems engineering, InnoDB schemas, downtime monitoring, and logging.

If we mutually agree to proceed, your work will form the basis for continued discussions and interviews.

Misc thoughts and recommendations before we go into details
-----------------------------------------------------------

- Use the tools and languages that you're most familiar with.
- We really value legible code and well-organized file directories.
- Opt for using open-source libraries rather than reinventing any wheels.
- It's a plus if your results come with a version control commit history.
- Have fun! If you don't think this project sounds like fun, then working at ASAPP may not be your cup of tea :)

Challenge requirements
----------------------

1. Put together three basic HTTP servers: one in Go, one in Python and one in Lua.
   - Keep them as simple as possible; for example, responding "Hi!" to every request is sufficient.
   - Use any HTTP libraries you want, e.g. net/http for Go, werkzeug for Python and xavante for Lua.
   - To simulate long-lived connections, you may want to put a random `sleep` in there.
   - This is basically just the setup for the challenge, so write as little code as possible :)
2. Write deployment script(s) for the servers.
   - It should be invoked with something like `./deploy-servers go-server lua-server python-server`.
   - It should be able to deploy one or multiple servers.
   - It should be able to deploy the servers to multiple machines.
   - You can assume that there is a script that lists machine hostnames for a given server name, e.g. `./get-hostnames go-server`.
   - You can also assume that all required server dependencies are installed on the target machines.
   - When deploying a new server version, open connections to the old server version must not be interrupted.
   - There must always be an unchanging way to reach the most recent version of any given server
     (e.g. have :8000 always go to the newest go-server version, :8001 to lua-server, and :8002 to python-server).
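
For requirement 1, the Python server can be about this small. This sketch swaps in the standard library's `http.server` instead of werkzeug so that it has no dependencies; `HiHandler`, `build_server` and the 2-second default for `max_sleep` are illustrative choices, not part of the challenge.

```python
import random
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HiHandler(BaseHTTPRequestHandler):
    """Answers every GET or POST with "Hi!" after a random pause."""

    def do_GET(self):
        # Random sleep to mimic slow, longer-lived requests.
        time.sleep(random.uniform(0, self.server.max_sleep))
        body = b"Hi!"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    do_POST = do_GET  # same answer regardless of method


def build_server(port, max_sleep=2.0, host="127.0.0.1"):
    """Build (but do not start) a threaded server on host:port.

    max_sleep is an illustrative upper bound in seconds for the simulated work.
    """
    server = ThreadingHTTPServer((host, port), HiHandler)
    server.max_sleep = max_sleep  # read by HiHandler via self.server
    return server
```

Calling `build_server(8002).serve_forever()` would serve it on 127.0.0.1:8002. One detail that matters for zero-downtime work: `ThreadingHTTPServer` runs handlers in daemon threads, so if the process simply exits, in-flight requests (including ones still in their random sleep) are cut off, and draining has to be handled explicitly.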
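
For requirement 2, one way to satisfy both the stable-port rule and the no-interruption rule is a small TCP proxy on each host that owns the public port (for example :8000 for go-server) and forwards to whichever internal port the newest version listens on. The deploy script would then, for each host returned by `./get-hostnames`, start the new version on a fresh port, health-check it, and re-point the proxy. The sketch below shows only the proxy half, using asyncio; `SwitchableProxy` and `switch_to` are hypothetical names, and how the deploy script reaches the proxy (a control socket, a signal) is left open.

```python
import asyncio


class SwitchableProxy:
    """Hypothetical front proxy: a fixed public port forwarding to a movable backend port.

    Switching the backend affects only connections accepted afterwards; connections
    already being piped keep talking to the backend they were opened against.
    """

    def __init__(self, backend_port, backend_host="127.0.0.1"):
        self.backend_host = backend_host
        self.backend_port = backend_port

    def switch_to(self, backend_port):
        # The deploy script would call this once the new version passes a health check.
        self.backend_port = backend_port

    @staticmethod
    async def _pipe(reader, writer):
        # Copy bytes in one direction until EOF, then close the other side entirely
        # (no TCP half-close handling, which keeps the sketch short).
        try:
            while data := await reader.read(65536):
                writer.write(data)
                await writer.drain()
        except ConnectionError:
            pass  # peer went away; nothing left to forward
        finally:
            writer.close()

    async def _handle(self, client_reader, client_writer):
        # Read the target once, at accept time, so later switches leave this connection alone.
        host, port = self.backend_host, self.backend_port
        try:
            up_reader, up_writer = await asyncio.open_connection(host, port)
        except OSError:
            client_writer.close()  # backend unreachable: drop this client
            return
        await asyncio.gather(
            self._pipe(client_reader, up_writer),
            self._pipe(up_reader, client_writer),
        )

    async def serve(self, port, host="127.0.0.1"):
        """Start listening on the stable front port and return the asyncio Server."""
        return await asyncio.start_server(self._handle, host, port)
```

In practice an open-source proxy such as HAProxy or nginx usually plays this role, since both can reload their configuration without dropping established connections, which also fits the advice above about not reinventing wheels.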

Follow-up questions
-------------------

Please take the time to write answers to these questions. Think through them deeply, but keep your answers short yet comprehensive where possible.

Our goal is to get a sense of how deep and broad your understanding of systems like ours is, and how effectively you can communicate about them. Don't worry if you don't have all the answers off the top of your head: we're also very much looking for your ability to reason about these sorts of problems, and to design and evaluate possible answers.

1. A real-time communication server tends to have lots of connections that can remain open for hours or even days. At some point after deploying a new server version, the old version has to terminate. How would you do this? Write up short descriptions of two or three methods, and indicate which you prefer and why.
2. InnoDB clusters data in its primary key B+Tree. As a result, "natural primary key" tables and "auto-increment primary key" tables have different characteristics. In a few words, how would you describe the differences? Also, give at least one example of when you would use natural keys over auto-incrementing ones.
3. We strive for virtually zero downtime, but shit does happen. If/when disaster strikes, we have to respond immediately. Please take the time to describe how you would structure downtime alerts and human response protocols at a company like ASAPP in order to sleep soundly at night.
4. Logs can be fantastic, and logs can be a headache. We have servers written in multiple languages, with multiple versions of each running in parallel on multiple machines that can be torn down and spun up at any time. In addition, PCI compliance requires long-lived access logs of all production services. How would you structure logging for this whole system?
5. How do you feel about being the point person and in-house expert on compliance (PCI, HIPAA, etc.)?
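
For follow-up question 1, a common method is to stop accepting new connections on the old version, give existing ones a grace period to finish, and then force-close whatever remains. A minimal asyncio sketch of that pattern follows; `ConnectionTracker` and `drain_and_stop` are hypothetical names, and a real grace period would be measured in hours rather than the fraction of a second a quick test would use.

```python
import asyncio


class ConnectionTracker:
    """Remembers the writers of open connections so a server can drain them.

    Create it inside a running event loop: on Python 3.9 and earlier,
    asyncio.Event binds to the loop that is current at construction.
    """

    def __init__(self):
        self.open = set()
        self._empty = asyncio.Event()
        self._empty.set()  # no connections yet

    def add(self, writer):
        self.open.add(writer)
        self._empty.clear()

    def discard(self, writer):
        self.open.discard(writer)
        if not self.open:
            self._empty.set()

    async def wait_empty(self):
        await self._empty.wait()


async def drain_and_stop(server, tracker, grace_seconds):
    """Stop accepting, wait up to grace_seconds for clients to leave, then force-close the rest.

    Returns how many connections had to be closed forcibly.
    """
    server.close()  # closes the listening socket; established connections stay open
    try:
        await asyncio.wait_for(tracker.wait_empty(), timeout=grace_seconds)
        return 0
    except asyncio.TimeoutError:
        stragglers = list(tracker.open)
        for writer in stragglers:
            writer.close()
        return len(stragglers)
```

Pairing this with an application-level "please reconnect" message would let clients move to the new version before the deadline, so fewer connections are cut at the end.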
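
For follow-up question 2, the effect of clustering shows up even in a toy model of the leaf level of a clustered index, where rows live in fixed-size pages kept in key order. This is not InnoDB's actual split algorithm (real pages are 16 KB by default and InnoDB has its own split heuristics); the 4-row capacity and the rule that a full last page simply opens a new page are simplifying assumptions, and `build_index` and `fill_factor` are hypothetical helpers.

```python
import bisect

PAGE_CAPACITY = 4  # rows per leaf page; deliberately tiny for illustration


def build_index(keys, capacity=PAGE_CAPACITY):
    """Insert keys one at a time into a toy clustered index.

    Returns (pages, splits): the leaf pages in key order and how many times a
    full page had to be split in two.
    """
    pages, splits = [], 0
    for key in keys:
        if not pages:
            pages.append([key])
            continue
        # Find the page whose key range should hold this key.
        firsts = [page[0] for page in pages]
        i = max(bisect.bisect_right(firsts, key) - 1, 0)
        page = pages[i]
        if len(page) < capacity:
            bisect.insort(page, key)
        elif i == len(pages) - 1 and key > page[-1]:
            # New largest key (the auto-increment case): start a fresh page, no split.
            pages.append([key])
        else:
            # Full page in the middle of the key range: split it roughly in half.
            mid = len(page) // 2
            left, right = page[:mid], page[mid:]
            bisect.insort(left if key < right[0] else right, key)
            pages[i:i + 1] = [left, right]
            splits += 1
    return pages, splits


def fill_factor(pages, capacity=PAGE_CAPACITY):
    """Fraction of page slots actually holding rows."""
    return sum(len(page) for page in pages) / (len(pages) * capacity)
```

Feeding it `range(100)` (auto-increment order) fills all 25 pages and never splits, while the same keys in shuffled order, as with emails or UUIDs used as natural keys, cause mid-index splits and typically leave pages partly empty. The trade-off runs the other way for reads: rows that share a natural-key prefix, such as `(customer_id, created_at)`, sit next to each other in the clustered index, so range scans over them touch few pages.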
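
For follow-up question 3, one common building block is an alert rule that pages only after several consecutive failed health checks (so a single blip does not wake anyone) and escalates to a secondary on-call if the page is not acknowledged in time. The class below is a hypothetical sketch of that state machine; the thresholds and the action names (`page-primary`, `page-secondary`, `resolve`) are assumptions, and in practice services such as PagerDuty or Opsgenie provide escalation policies so this logic need not be hand-written.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HealthAlert:
    """Turns a stream of health-check results into paging actions."""

    threshold: int = 3             # consecutive failures before paging
    escalate_after: float = 300.0  # seconds to wait for an acknowledgement
    failures: int = 0
    paged_at: Optional[float] = None
    acknowledged: bool = False
    escalated: bool = False

    def acknowledge(self) -> None:
        """Called when the primary on-call responds to the page."""
        self.acknowledged = True

    def observe(self, healthy: bool, now: float) -> List[str]:
        """Record one probe result taken at time `now`; return the actions to take."""
        if healthy:
            self.failures = 0
            if self.paged_at is None:
                return []
            # Incident over: reset state and notify whoever was paged.
            self.paged_at, self.acknowledged, self.escalated = None, False, False
            return ["resolve"]
        self.failures += 1
        if self.paged_at is None:
            if self.failures >= self.threshold:
                self.paged_at = now
                return ["page-primary"]
            return []
        if (not self.acknowledged and not self.escalated
                and now - self.paged_at >= self.escalate_after):
            self.escalated = True
            return ["page-secondary"]
        return []
```

The human side of the protocol (on-call rotations, runbooks, post-mortems) sits around a rule like this; the code only decides who gets woken up and when.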
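
For follow-up question 4, one convention that makes logs from many languages, versions and short-lived machines searchable in one place is for every server to write one JSON object per line with the same core fields, which a shipper on each host can forward to central, retained storage before the machine is torn down. Below is a Python sketch using only the standard `logging` module; the field names (`ts`, `level`, `msg`, `service`, `version`, `host`) and the `fields` extra key are assumptions, and the Go and Lua servers would emit the same shape with their own libraries.

```python
import json
import logging
import socket
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Formats each record as one JSON line tagged with service, version and host."""

    def __init__(self, service, version, host=None):
        super().__init__()
        self.static = {"service": service, "version": version,
                       "host": host or socket.gethostname()}

    def format(self, record):
        entry = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "msg": record.getMessage(),
            **self.static,
        }
        # Structured fields passed via extra={"fields": {...}}.
        entry.update(getattr(record, "fields", {}))
        if record.exc_info:
            entry["exc"] = self.formatException(record.exc_info)
        return json.dumps(entry, sort_keys=True)


def make_logger(service, version, stream=None):
    """Return a logger writing JSON lines to stream (stderr by default).

    Call once per process; calling again would attach a second handler.
    """
    logger = logging.getLogger(f"{service}-{version}")
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service, version))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

With the version baked into every line, logs from old and new deployments running side by side can be told apart without knowing which machine served which version.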