Braveno Architecture

Design Considerations

When designing software that deals with money, special consideration must be given to correctness and security. It is imperative that we build deterministic systems which don't lose data. We do this for two reasons: regulation may require it, and we simply don't know how valuable the data will be in the future. That is why event sourcing will be core to Braveno's design. Event sourcing is a very effective approach to building highly available, deterministic systems. It is common in traditional finance and, unfortunately, much less so in the virtual currency space.

The fundamental idea of event sourcing is to persist the sequence of events that led to the current state of the application. Since events are things that happened in the past (facts which cannot be changed), they can be cached, copied and distributed without worry. With event sourcing, you know exactly when, how and, most importantly, why data changed.
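As a minimal sketch of the idea, the current state can be derived by folding over an append-only log of immutable events. The event types (`Deposited`, `Withdrawn`), their fields and the in-memory tuple used as a log are assumptions made for illustration, not Braveno's actual schema or storage.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable


@dataclass(frozen=True)
class Deposited:
    account: str
    amount_cents: int
    reason: str  # the "why" travels with the event


@dataclass(frozen=True)
class Withdrawn:
    account: str
    amount_cents: int
    reason: str


def apply_event(balance_cents: int, event) -> int:
    """Pure transition: previous state plus one event gives the next state."""
    if isinstance(event, Deposited):
        return balance_cents + event.amount_cents
    if isinstance(event, Withdrawn):
        return balance_cents - event.amount_cents
    raise TypeError(f"unknown event: {event!r}")


def replay(events: Iterable) -> int:
    """Rebuild a balance by folding over the event history from the start."""
    return reduce(apply_event, events, 0)


# Hypothetical log: entries are only ever appended, never updated in place.
event_log = (
    Deposited("acct-1", 10_000, "initial funding"),
    Withdrawn("acct-1", 2_500, "customer withdrawal"),
    Deposited("acct-1", 500, "fee refund"),
)

current_balance = replay(event_log)        # 8000
balance_after_two = replay(event_log[:2])  # 7500
```

Because `replay` is deterministic, replaying any prefix of the log reproduces the state as it was at that point, which is what makes the history auditable.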

Don’t Lose Data

Don't lose data, because we simply don't know how valuable it will be in the future. We might be required by regulation to provide a verifiable audit trail, or we might want to draw new insights from the data later. When all you store is the current state in a database, you lose the original intent and auditability of the action as soon as you modify the data in place. This is a lossy system; we don't want a lossy system.

Solve for problems, not features

Favour solving problems instead of developing features. By solving problems as simply as possible, you will converge on the "perfect" codebase. A "perfect" codebase is one in which each change solves a problem at minimal cost. This moves the conversation from "What features do my clients want?" to "What problems am I solving for my clients?". This is a powerful concept.

Build deterministic systems

non-determinism = parallel processing + shared mutable state

Developing correct software is hard enough, and it's almost impossible when you have global shared mutable state. Favour a functional style when programming and avoid shared mutable state at all costs. Use message passing of immutable values between processes instead of shared mutable state. Writing code this way eliminates a whole class of concurrency bugs, such as race conditions, because they simply aren't possible anymore.
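A small sketch of what an immutable value looks like in practice, assuming Python frozen dataclasses as the mechanism. The `Order` type and `fill` function are hypothetical names used only for illustration.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Order:
    order_id: str
    quantity: int
    status: str = "open"


def fill(order: Order) -> Order:
    """Return a new Order; the value passed in is left untouched."""
    return replace(order, status="filled")


original = Order("ord-1", quantity=5)
filled = fill(original)

try:
    original.status = "cancelled"  # in-place mutation is rejected
    mutation_allowed = True
except FrozenInstanceError:
    mutation_allowed = False
```

A value like `original` can be handed to any number of readers without coordination, because no holder of the value is able to change it underneath the others.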

Non-determinism can also come from the build process: any stable release must have its dependencies pinned to a static version.
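One way to enforce pinning is a check that runs before a release is cut. The sketch below assumes dependencies are listed in a pip-style requirements format, which the document does not specify; the function name and the sample input are hypothetical, and the pattern handles only simple `name==version` lines.

```python
import re
from typing import List

# Matches "name==1.2.3", optionally with extras such as "name[extra]==1.2.3".
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,-]+\])?==[A-Za-z0-9._!+-]+$")


def unpinned_requirements(requirements_text: str) -> List[str]:
    """Return every requirement line that is not pinned to an exact version."""
    offenders = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if line and not PINNED.match(line):
            offenders.append(line)
    return offenders


# Hypothetical input: one pinned entry, one version range, one bare name.
example = """
requests==2.31.0
# loose ranges can resolve differently from one build to the next
urllib3>=1.26
click
"""

offenders = unpinned_requirements(example)  # ["urllib3>=1.26", "click"]
```

A release script could refuse to proceed whenever `offenders` is non-empty.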

Don't break backwards compatibility

Once an API goes public, we will support that version indefinitely. Over time, constraints and requirements will change, and this will cause us to reevaluate our code. We will handle API changes by deprecating the old API and creating a new, distinct API alongside it.
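A sketch of how an old and a new API might coexist, assuming a function-level API and Python's standard `warnings` module. The function names, the response shape and the `_BALANCES` store are hypothetical.

```python
import warnings

# Hypothetical in-memory store standing in for real account data.
_BALANCES = {"acct-1": 12_345}


def get_balance_v2(account_id: str) -> dict:
    """New, distinct API: integer cents plus an explicit currency."""
    return {
        "account_id": account_id,
        "amount_cents": _BALANCES[account_id],
        "currency": "USD",
    }


def get_balance_v1(account_id: str) -> float:
    """Original public API: deprecated, but its behaviour is kept as-is."""
    warnings.warn(
        "get_balance_v1 is deprecated; use get_balance_v2",
        DeprecationWarning,
        stacklevel=2,
    )
    return get_balance_v2(account_id)["amount_cents"] / 100
```

Existing callers of `get_balance_v1` keep getting the same return type they always did, while the warning points them at the replacement.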

Telemetry as a first class citizen

Without metrics or telemetry, how do we know whether something is a problem if we can't quantify it? Answer: we can't, so we should instrument our processes with telemetry from the beginning.
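As a rough sketch of instrumenting from the start, a decorator can record call counts, error counts and latency for each operation. The in-memory dictionaries stand in for a real metrics backend, and the metric name and `place_order` function are hypothetical.

```python
import time
from collections import defaultdict
from functools import wraps

# In-memory stand-ins for a real metrics backend (an assumption of this sketch).
call_counts = defaultdict(int)
error_counts = defaultdict(int)
latencies_s = defaultdict(list)


def instrumented(name):
    """Decorator that records calls, errors and latency under `name`."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            call_counts[name] += 1
            try:
                return func(*args, **kwargs)
            except Exception:
                error_counts[name] += 1
                raise
            finally:
                # Runs on success and on failure, so every call is timed.
                latencies_s[name].append(time.perf_counter() - start)
        return wrapper
    return decorator


@instrumented("orders.place")
def place_order(quantity: int) -> str:
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    return "accepted"
```

After a few calls to `place_order`, the counters show how often the operation ran, how often it failed and how long each call took, which is the raw material for deciding whether something is actually a problem.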

Apply the Single Writer Principle

The single biggest hindrance to scalability in a highly available system (assuming data structures and algorithms are good) is multiple writers contending for a shared resource (e.g. I/O), not to mention the code complexity that arises from managing that contention. To avoid this contention and the associated queuing effects, all state should be owned by a single writer for mutation purposes.

You can read more about the Single Writer Principle here.

Prefer unshared state

Encapsulating state by keeping it private (local to the thread) and unshared is ideal. Even with the Single Writer Principle, updates need to happen atomically and in order. Keeping state private and local dramatically reduces complexity by the simple fact that access to it is not concurrent. In lieu of global shared state, you can communicate changes by message passing between threads. This leaves you with a system that has no contention on state and therefore scales far more easily.
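The two principles above can be sketched together: one thread owns the state as a local variable, and every other thread communicates with it only by sending immutable messages. The `Credit` message, the function names and the use of `queue.Queue` as the channel are assumptions for illustration; note that the queue itself is the only shared structure.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Credit:
    amount_cents: int


STOP = object()  # sentinel telling the writer to shut down


def ledger_writer(inbox: queue.Queue, done: queue.Queue) -> None:
    """The only code that mutates the balance, which is local to this thread."""
    balance_cents = 0
    while True:
        message = inbox.get()
        if message is STOP:
            break
        balance_cents += message.amount_cents
    done.put(balance_cents)  # publish the final state as a message


def client(inbox: queue.Queue, count: int) -> None:
    """Clients never touch the balance; they only send immutable messages."""
    for _ in range(count):
        inbox.put(Credit(1))


def run(clients: int = 4, credits_each: int = 1000) -> int:
    inbox: queue.Queue = queue.Queue()
    done: queue.Queue = queue.Queue()
    writer = threading.Thread(target=ledger_writer, args=(inbox, done))
    writer.start()
    senders = [
        threading.Thread(target=client, args=(inbox, credits_each))
        for _ in range(clients)
    ]
    for sender in senders:
        sender.start()
    for sender in senders:
        sender.join()
    inbox.put(STOP)  # sent only after every client has finished
    writer.join()
    return done.get()
```

Calling `run()` returns 4000 regardless of how the client threads are scheduled, because the increments are applied one at a time by the single writer and no lock around the balance is needed.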

Cloud Native Architecture

Design to run in the cloud from the beginning.

Murphy's Law

Anything that can go wrong, will go wrong.

Assume things will fail at the worst possible time and design for it. By forcing application developers to think about the worst case, you get more robust systems that are less fragile and much more resilient to failure. Software isn't just about the happy path.
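One small, concrete way to design for failure is to treat every remote call as something that may fail and to decide up front what happens when it does. The sketch below retries transient failures a bounded number of times and then surfaces the error. The `TransientError` type, the function names and the backoff parameters are hypothetical, and retrying is only safe when the operation is idempotent.

```python
import time


class TransientError(Exception):
    """Hypothetical error type for failures that are worth retrying."""


def call_with_retries(operation, attempts=3, base_delay_s=0.0, sleep=time.sleep):
    """Run `operation`, retrying transient failures with exponential backoff.

    The last error is re-raised once `attempts` is exhausted, so a failure
    is never silently swallowed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                raise
            sleep(base_delay_s * 2 ** (attempt - 1))


def flaky_factory(failures_before_success):
    """Build a test double that fails a fixed number of times, then succeeds."""
    remaining = [failures_before_success]

    def operation():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise TransientError("simulated outage")
        return "ok"

    return operation
```

With the defaults, `call_with_retries(flaky_factory(2))` succeeds on the third attempt, while `call_with_retries(flaky_factory(3))` raises `TransientError`. Injecting failures with a double like `flaky_factory` lets the unhappy path be tested as routinely as the happy one.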

Architecture Design Principles

Isolated execution environments

Think of bulkheads and compartments in hull design. Any one compartment can be compromised, but that still doesn’t take down the ship.

AWS is responsible for secure infrastructure and services. We are responsible for:
- AMI, Operating System
- Applications
- Data in transit
- Data at rest
- Data in store
- Credentials
- Policies and configuration

trevorbernard/Engineering.md