Whether you're preparing for a System Design Interview or you simply want to understand how systems work beneath the surface, we hope this repository will help you achieve that.
This classification has seven major elements. They are: platform and management, education and reference, home and entertainment, content and communication, operations and professional, product manufacturing and service delivery, and line of business.
Platform and management: Desktop and network infrastructure and management software that allows users to control the computer operating environment, hardware components and peripherals and infrastructure services and security.[4]
Education and reference: Educational software that does not contain resources, such as training or help files for a specific application.[4]
Home and entertainment: Applications designed primarily for use in or for the home, or for entertainment.[4]
Content and communications: Common applications for productivity, content creation, and communications. These typically include office productivity suites, multimedia players, file viewers, Web browsers, and collaboration tools.[4]
Operations and professional: Applications designed for business uses such as enterprise resource management, customer relations management, supply chain and manufacturing tasks, application development, information management and access, and tasks performed by both business and technical equipment.[4]
Product manufacturing and service delivery: Help users create products or deliver services in specific industries. Categories in this section are used by the North American Industry Classification System (NAICS).
Mermaid is a Markdown-inspired tool that renders text into diagrams. For example, Mermaid can render flow charts, sequence diagrams, pie charts and more. For more information, see the Mermaid documentation.
To create a Mermaid diagram, add Mermaid syntax inside a fenced code block with the mermaid language identifier. For more information about creating code blocks, see "Creating and highlighting code blocks."
graph TD
A[Christmas] -->|Get money| B(Go shopping)
B --> C{Let me think}
C -->|One| D[Laptop]
C -->|Two| E[iPhone]
C -->|Three| F[fa:fa-car Car]
Sequence diagram
sequenceDiagram
Alice->>+John: Hello John, how are you?
Alice->>+John: John, can you hear me?
John-->>-Alice: Hi Alice, I can hear you!
John-->>-Alice: I feel great!
Class diagram
classDiagram
Animal <|-- Duck
Animal <|-- Fish
Animal <|-- Zebra
Animal : +int age
Animal : +String gender
Animal: +isMammal()
Animal: +mate()
class Duck{
+String beakColor
+swim()
+quack()
}
class Fish{
-int sizeInFeet
-canEat()
}
class Zebra{
+bool is_wild
+run()
}
The C4 model is an "abstraction-first" approach to diagramming software architecture, based upon abstractions that reflect how software architects and developers think about and build software. The small set of abstractions and diagram types makes the C4 model easy to learn and use.
Level 1: System context diagram
Level 1, a system context diagram, shows the software system you are building and how it fits into the world in terms of the people who use it and the other software systems it interacts with. Here is an example of a system context diagram that describes an Internet banking system that you may be building:
Personal customers of the bank use the Internet banking system to view information about their bank accounts and to make payments. The Internet banking system uses the bank's existing mainframe banking system to do this, and uses the bank's existing e-mail system to send e-mail to customers. Colour coding in the diagram indicates which software systems already exist (the grey boxes) and those to be built (blue).
Level 2: Container diagram
Level 2, a container diagram, zooms into the software system, and shows the containers (applications, data stores, microservices, etc.) that make up that software system. Technology decisions are also a key part of this diagram. Below is a sample container diagram for the Internet banking system. It shows that the Internet banking system (the dashed box) is made up of five containers: a server-side web application, a client-side single-page application, a mobile app, a server-side API application, and a database.
The web application is a Java/Spring MVC web application that simply serves static content (HTML, CSS, and JavaScript), including the content that makes up the single-page application. The single-page application is an Angular application that runs in the customer's web browser, providing all of the Internet banking features. Alternatively, customers can use the cross-platform Xamarin mobile app to access a subset of the Internet banking functionality. Both the single-page application and mobile app use a JSON/HTTPS API, which another Java/Spring MVC application running on the server side provides. The API application gets user information from the database (a relational-database schema). The API application also communicates with the existing mainframe banking system, using a proprietary XML/HTTPS interface, to get information about bank accounts or make transactions. The API application also uses the existing e-mail system if it needs to send e-mail to customers.
Level 3: Component diagram
Level 3, a component diagram, zooms into an individual container to show the components inside it. These components should map to real abstractions (e.g., a grouping of code) in your codebase. Here is a sample component diagram for the fictional Internet banking system that shows some (rather than all) of the components within the API application.
Two Spring MVC Rest Controllers provide access points for the JSON/HTTPS API, with each controller subsequently using other components to access data from the database and mainframe banking system.
Level 4: Code
Finally, if you really want or need to, you can zoom into an individual component to show how that component is implemented. This is a sample (and partial) UML class diagram for the fictional Internet banking system, showing the code elements (interfaces and classes) that make up the MainframeBankingSystemFacade component.
It shows that the component is made up of a number of classes, with the implementation details directly reflecting the code. I wouldn't necessarily recommend creating diagrams at this level of detail, especially when you can obtain them on demand from most IDEs.
Structurizr is a web-based rendering tool designed to help software development teams create software architecture diagrams and documentation. It can render diagrams that are interactive, animatable, and embeddable. Structurizr can also publish Markdown/AsciiDoc documentation and architecture decision records (ADRs). Structurizr is available in a number of versions.
In Structurizr, each ADR has an ID, title, date and status (e.g. Proposed, Accepted, Superseded, etc), along with unstructured content written using Markdown or AsciiDoc. ADRs can either be created manually, or imported from tools like adr-tools.
In addition to the usual Markdown/AsciiDoc syntax for including images, you can embed live versions of the C4 model diagrams from your workspace into your documentation.
views {
    systemContext financialRiskSystem "Context" "An example System Context diagram for the Financial Risk System architecture kata." {
        include *
        autoLayout
    }
}
You can now click through the decisions, and press the Space key to open the quick navigation feature. Click the little graph button underneath the heading, and the visualisation will open.
Salt is a subproject of PlantUML that can help you design a graphical interface, whether you call the result a website wireframe, page schematic, or screen blueprint.
The goal of this tool is to support discussion of simple, sample windows.
You can use the archimate keyword to define an element. Stereotype can optionally specify an additional icon. Some colors (Business, Application, Motivation, Strategy, Technology, Physical, Implementation) are also available.
What to Measure: Using SLIs
Once you agree that 100% is the wrong number, how do you determine the right number? And what are you measuring, anyway? Here, service level indicators come into play: an SLI is an indicator of the level of service that you are providing.
While many numbers can function as an SLI, we generally recommend treating the SLI as the ratio of two numbers: the number of good events divided by the total number of events. For example:
Number of successful HTTP requests / total HTTP requests (success rate)
Number of gRPC calls that completed successfully in < 100 ms / total gRPC requests
Number of search results that used the entire corpus / total number of search results, including those that degraded gracefully
Number of “stock check count” requests from product searches that used stock data fresher than 10 minutes / total number of stock check requests
Number of “good user minutes” according to some extended list of criteria for that metric / total number of user minutes
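The "good events divided by total events" ratio can be sketched in a few lines. This is only an illustration: the `Sli` class, the helper names, the rule that HTTP status codes below 500 count as good, and the choice to report 1.0 when there are no events are all assumptions of this sketch, not definitions from the text above. The latency helper also assumes every latency passed in belongs to a call that completed successfully.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sli:
    """A service level indicator expressed as good events over total events."""
    good_events: int
    total_events: int

    def ratio(self) -> float:
        # With no events there is nothing to judge; reporting 1.0 here is an
        # assumption of this sketch, not a rule from the text.
        if self.total_events == 0:
            return 1.0
        return self.good_events / self.total_events


def success_rate(status_codes: list[int]) -> Sli:
    """Count HTTP responses with a status below 500 as good (illustrative choice)."""
    good = sum(1 for code in status_codes if code < 500)
    return Sli(good_events=good, total_events=len(status_codes))


def fast_call_rate(latencies_ms: list[float], threshold_ms: float = 100.0) -> Sli:
    """Count calls that finished in under threshold_ms as good."""
    good = sum(1 for latency in latencies_ms if latency < threshold_ms)
    return Sli(good_events=good, total_events=len(latencies_ms))
```

For example, `success_rate([200, 200, 500, 404]).ratio()` evaluates to 0.75, because three of the four responses count as good under the rule assumed here.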
Types of components
The easiest way to get started with setting SLIs is to abstract your system into a few common types of components. You can then use our list of suggested SLIs for each component to choose the ones most relevant to your service:
Request-driven
The user creates some type of event and expects a response. For example, this could be an HTTP service where the user interacts with a browser or an API for a mobile application.
Pipeline
A system that takes records as input, mutates them, and places the output somewhere else. This might be a simple process that runs on a single instance in real time, or a multistage batch process that takes many hours.
Simple laws for building cost-aware, sustainable, and modern architectures.
LAW I. Make Cost a Non-functional Requirement
When designing, developing, and operating systems, consider cost implications early and continuously in order to balance features, time-to-market, and efficiency.
LAW II. Systems that Last Align Cost to Business
Architect systems that align with the business model's profit levers to achieve economies of scale as revenue permits. Unrestrained growth without profitability erodes value.
LAW III. Architecting is a Series of Trade-offs
Every design decision comes with trade-offs. It's crucial to regularly re-evaluate technical and business trade-offs, and invest in resources aligned to business needs.
LAW IV. Unobserved Systems Lead to Unknown Costs
Though monitoring systems require upfront investment, they enable organizations to pinpoint wasteful practices, streamline workflows, and strategically allocate resources to priorities.
LAW V. Cost Aware Architectures Implement Cost Controls
With robust monitoring in place, you can take action in areas where you have identified opportunities for improvement. By implementing granular controls, you can optimize for both cost and user experience.
LAW VI. Cost Optimization is Incremental
The pursuit of cost efficiency is an ongoing journey. Monitor your systems to understand patterns and trim inefficiencies. Continual optimization requires revisiting systems to find further improvements.
LAW VII. Unchallenged Success Leads to Assumptions
Continuously question what's worked in the past. Revisit methods and tools despite previous successes. As Grace Hopper famously stated, one of the most dangerous phrases in English is: "we've always done it this way".
Scalability refers to a system's ability to perform and operate as the number of users or requests increases. It is achievable with horizontal or vertical scaling of the machine or attaching AutoScalingGroup capabilities. Here are three areas to consider when architecting scalability into your system:
Traffic pattern: Understand the system's traffic pattern. Spawning as many machines as possible is not cost-efficient, because most of them end up underutilized. Here are three sample patterns:
Diurnal: Traffic increases in the morning and decreases in the evening for a particular region.
Global/regional: Heavy usage of the application in a particular region.
Thundering herd: Many users request resources, but only a few machines are available to serve the burst of traffic. This could occur during peak times or in densely populated areas.
Elasticity: This relates to the ability to quickly spawn a few machines to handle the burst of traffic and gracefully shrink when the demand is reduced.
Latency: This is the system's ability to serve a request as quickly as possible. This also includes optimizing the algorithms and using edge computing to replicate the system near users to reduce the round-trip time of a request.
Availability
Availability is measured as a percentage of uptime and defines the proportion of time that a system is functional and working. Availability is affected by system errors, infrastructure problems, malicious attacks, and system load. Things to consider include:
Deployment stamps: Deploy multiple independent copies of application components, including data stores.
Geodes: Deploy backend services into a set of geographical nodes, each of which can service any client request in any region.
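Because availability is a percentage of uptime, an availability target translates directly into a downtime budget. The arithmetic is sketched below; the function names and the 365-day year are assumptions made for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # non-leap year, an assumption of this sketch


def availability_percent(uptime_seconds: float, total_seconds: float) -> float:
    """Availability as the percentage of the period the system was working."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return 100.0 * uptime_seconds / total_seconds


def allowed_downtime_seconds(target_percent: float,
                             period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Downtime budget implied by an availability target over a period."""
    return period_seconds * (1.0 - target_percent / 100.0)
```

Under these assumptions, a 99.9% target leaves a budget of about 31,536 seconds per year, which is roughly 8.76 hours.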
Extensibility
Extensibility measures the ability to extend a system and the effort required to implement the extension. The extension can occur by adding new functionality or modifying existing functionality. The principle provides enhancements without impairing current system functions. When architecting extensibility, consider:
Modularity and reusability: Reusability, together with extensibility, allows technology to be transferred to another project with less development and maintenance time, as well as enhanced reliability and consistency.
Pluggability: This is the ability to easily plug in other components, for example with microkernel architecture.
Consistency
Consistency guarantees that every read returns the most recent write. This means that after an operation executes, the data is consistent across all the nodes, and thus all clients see the same data at the same time, no matter which node they connect to. Consistency improves the data's freshness.
Resiliency
Resiliency means that a system can gracefully handle and recover from accidental and malicious failures. Detecting failures and recovering quickly and efficiently is necessary to maintain resiliency. The primary factors to consider when architecting for resiliency are:
Recoverability: This is the preparatory processes and functionality that enable services to return to an initial functioning state after an unintended change. Unintended changes include soft or hard deletion or misconfiguration of applications.
Disaster recovery: Disaster recovery (DR) consists of best practices designed to prevent or minimize data loss and business disruption resulting from catastrophic events—everything from equipment failures and localized power outages to cyberattacks, civil emergencies, criminal or military attacks, and natural disasters.
Following are some DR design patterns you might implement to build resiliency into your architecture:
Bulkhead: This pattern isolates elements of an application into pools so that if one fails, the others will continue to function.
Circuit breaker: This pattern handles faults that might take a variable amount of time to fix when connecting to a remote service or resource.
Leader election: This pattern coordinates the actions performed by a collection of collaborating task instances in a distributed application by electing one instance as the leader that assumes responsibility for managing the other instances.
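Of the patterns above, the circuit breaker is the easiest to show in a few lines. The following is a minimal sketch, not a production implementation: the class name, the three-state model (closed, open, half-open), the failure threshold, and the injectable clock are all choices made for this illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"  # allow one trial call through
        return "open"

    def call(self, func: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("circuit is open; failing fast")
        try:
            result = func()
        except Exception:
            self._failures += 1
            # A failed trial call in half-open state, or too many failures
            # while closed, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

While the circuit is open, callers fail immediately instead of waiting on a remote service that is unlikely to answer, which gives that service time to recover.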
Usability
Usability is a system's capacity to enable users to perform tasks safely, effectively, and efficiently while enjoying the experience. It is the degree to which specified consumers can use software to achieve quantified objectives with effectiveness, efficiency, and satisfaction in a quantified context of use. Related factors include:
Accessibility: Make the software available to people with the broadest range of characteristics and capabilities, including users with deafness, blindness, colorblindness, and more.
Learnability: Make the software easy for users to learn.
API contract: Internal teams need to understand the API contracts to help them plug into any system.
Observability
Observability is the ability to collect data about program execution, modules' internal states, and communication between components. To improve observability, use various logging and tracing techniques and tools, including the following:
Logging: There are different types of logs generated within each request, such as event logs, transaction logs, message logs, and server logs.
Alerts and monitoring: Prepare monitoring dashboards, create service-level indicators (SLIs), and set up critical alerts.
Tiered levels of support: Set up on-call support processes for each support tier. Level 1 (L1) support includes interacting with customers. L2 support manages the tickets escalated by L1 and helps troubleshoot. L3 is the last line of support and usually comprises a development team that addresses the technical issues.
Security
Security is the degree to which the software protects information and data so that people, other products, or systems have data access appropriate to their types and levels of authorization. This family of characteristics includes the following five attributes:
Confidentiality: Data is accessible only to those authorized to have access.
Integrity: The software prevents unauthorized access to or modification of software or information.
Nonrepudiation: Prove whether actions or events have taken place.
Accountability: Trace user actions.
Authenticity: Prove the user's identity.
Additional security requirements include:
Auditability: Audit trails track system activity so that when a security breach occurs, you can determine the mechanism and extent of the breach. Storing audit trails remotely, where they can only be appended, can keep intruders from covering their tracks.
Legality: This involves adherence to laws or other industry requirements.
Compliance: Adherence to data protection laws and compliance frameworks such as GDPR, CCPA, SOC 2, PIPL, or FedRAMP.
Privacy: Ability to hide transactions from internal company employees, such as encrypting transactions so that even database administrators and network architects cannot see them.
Authentication: Security requirements ensure users are who they say they are.
Authorization: Security requirements ensure users can access only certain functions within the application (by use case, subsystem, web page, business rule, field level, and so forth).
Durability
Durability relates to software's serviceability and ability to meet users' needs for a relatively long time. Things to consider include:
Replication: Share information to ensure consistency between redundant resources to improve reliability, fault-tolerance, or accessibility.
Fault tolerance: This enables a system to continue operating correctly in the event of one or more faults within some of its components.
Archivability: This manages whether the data needs to be archived or deleted after a period of time. For example, customer accounts will be deleted after three months or marked as obsolete and archived in a secondary database for future access.
Agility
Agile is a software development method that enables a team to respond to changes quickly. Software development is all about modification, so agility is a key non-functional requirement (NFR). Key factors include:
Maintainability: How easy is it to apply changes and enhance the system? Maintainability represents the degree to which developers can effectively and efficiently modify the software to improve, correct, or adapt it to changes in the environment and requirements.
Testability: How easily can developers and others test the software?
Ease of development: Can developers modify the software without introducing defects or degrading existing product quality?
Deployability: This is the time it takes to get code into production.
Installability: How easy is system installation on all necessary platforms?
Upgradeability: How quick and easy is it to upgrade from a previous version of an application or solution to a newer version on servers and clients?
Portability: Does the system need to run on more than one platform?
Configurability: How easily can end users change aspects of the software's configuration (through usable interfaces)?
Compatibility: How well can a product, system, or component exchange information with other products, systems, or components and perform its required functions while sharing the same hardware or software environment?
20190321 DEV308.Twelve-Factor (12 Factor) App Methodology and Modern Applications, EN, [video], ★★★
Typically, an application is composed of multiple components, with each component supporting UI, business logic, and database functions. The core principles to follow when designing a microservices-based codebase are single responsibility, high cohesion, and loose coupling. Each service has a single purpose and includes all the functions to carry out that single purpose.
Modern distributed applications have needs around lifecycle, networking, binding, and state management that cloud-native platforms must provide.
Kubernetes has great support around lifecycle management but relies on other platforms using the sidecar and operator concepts to satisfy the networking, binding, and state management primitives.
Future distributed systems on Kubernetes will be composed of multiple runtimes where the business logic forms the core of the application, and sidecar “mecha” components offer powerful out-of-the-box distributed primitives.
This decoupled mecha architecture offers the benefits of cohesive units of business logic and improves day-2 operations, such as patching, upgrades, and long-term maintainability.
20200714 Gartner’s Advice on How to Choose an Event Broker | Solace, EN, ★★★★★
There are three basic types of event brokers:
queue-oriented (like Solace PubSub+, RabbitMQ, Azure Service Bus, etc.),
Gartner does an adequate job describing the basic principles of queue-based brokers like RabbitMQ, ActiveMQ, Solace PubSub+ and others: the pub-sub mechanism is typically based on creating a queue for each consumer (or shared consumer group) and a routing mechanism that delivers published messages to the appropriate queues.
log-oriented (like Apache Kafka or Amazon Kinesis), and
Gartner describes log-oriented brokers as based on the concept of an append-only log of messages. Neither consumers nor the broker remove messages when they are processed. Instead, the log is retained and messages are purged as they age or as the log reaches a pre-determined size limit. This allows for what is called “message replay”.
subscription-oriented (such as Amazon EventBridge and Azure Event Grid).
Subscription-based brokers were born out of the need to support cloud-native function platform as a service and serverless architectures.
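The difference between log-oriented brokers and the other types is easiest to see in code. The toy class below is a hypothetical in-memory model, not the API of Kafka or Kinesis. Reading never removes a message, retention is enforced by a size limit, and a consumer can replay by reading again from an earlier offset.

```python
class AppendOnlyLog:
    """Toy log-oriented broker: consuming never removes messages."""

    def __init__(self, max_messages: int) -> None:
        self.max_messages = max_messages
        self._messages: list[str] = []
        self._base_offset = 0  # offset of the oldest retained message

    def publish(self, message: str) -> int:
        """Append a message and return the offset it was stored at."""
        self._messages.append(message)
        offset = self._base_offset + len(self._messages) - 1
        # Retention by size: purge the oldest entries once the limit is exceeded.
        overflow = len(self._messages) - self.max_messages
        if overflow > 0:
            del self._messages[:overflow]
            self._base_offset += overflow
        return offset

    def read_from(self, offset: int) -> list[str]:
        """Return every retained message at or after the given offset."""
        start = max(offset, self._base_offset) - self._base_offset
        return self._messages[start:]
```

Each consumer only needs to remember its own offset. Calling `read_from` twice with the same offset returns the same messages, as long as they have not been purged by the retention limit.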
Creational design patterns are all about class instantiation. They can be further divided into class-creation patterns and object-creation patterns. While class-creation patterns use inheritance effectively in the instantiation process, object-creation patterns use delegation effectively to get the job done.
Structural design patterns are all about class and object composition. Structural class patterns use inheritance to compose interfaces. Structural object patterns define ways to compose objects to obtain new functionality.
Behavioral design patterns are all about communication between objects; they are the patterns most specifically concerned with how objects interact.
An object that acts as a Gateway (466) to a database table. One instance handles all the rows in the table.
discusses the Data Access Object pattern, which is a Table Data Gateway. They show returning a collection of Data Transfer Objects (401) on the query methods. It’s not clear whether they see this pattern as always being table based; the intent and discussion seems to imply either Table Data Gateway or Row Data Gateway (152).
I’ve used a different name, partly because I see this pattern as a particular usage of the more general Gateway (466) concept and I want the pattern name to reflect that. Also, the term Data Access Object and its abbreviation DAO has its own particular meaning within the Microsoft world.
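A Table Data Gateway can be sketched with the standard-library `sqlite3` module. The `person` table, its columns, and the method names are hypothetical, and the gateway creates the table itself only so that the example is self-contained. What matters is that a single gateway instance holds all the SQL for every row in the table.

```python
import sqlite3


class PersonGateway:
    """Table Data Gateway sketch: one object fronts every row of `person`."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._conn = connection
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS person (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def insert(self, name: str) -> int:
        cursor = self._conn.execute("INSERT INTO person (name) VALUES (?)", (name,))
        return cursor.lastrowid

    def find(self, person_id: int):
        # Returns an (id, name) tuple, or None when no row matches.
        return self._conn.execute(
            "SELECT id, name FROM person WHERE id = ?", (person_id,)
        ).fetchone()

    def find_by_name(self, name: str) -> list:
        return self._conn.execute(
            "SELECT id, name FROM person WHERE name = ? ORDER BY id", (name,)
        ).fetchall()

    def update(self, person_id: int, name: str) -> None:
        self._conn.execute("UPDATE person SET name = ? WHERE id = ?", (name, person_id))

    def delete(self, person_id: int) -> None:
        self._conn.execute("DELETE FROM person WHERE id = ?", (person_id,))
```

Callers work with plain tuples and never see SQL. A Row Data Gateway, by contrast, would use one object per row.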
PO (Persistent Object; also called bean or entity): An object that maps one-to-one to a data structure of the persistence layer (usually a table in a relational database).
DO (Domain Object): A tangible or intangible business entity abstracted from the real world.
DTO (Data Transfer Object): An object that carries data between processes or application layers.
VO (View Object): An object used by the display layer. Its purpose is to encapsulate all the data of a specified page (or component).
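As a rough illustration of how these object types differ, the sketch below passes one hypothetical customer record through three of them. The class names and fields are invented for this example. The domain object is omitted for brevity; it would sit between the PO and the DTO and carry the business rules.

```python
from dataclasses import dataclass


@dataclass
class CustomerPO:
    """Persistent object: mirrors one row of a hypothetical customer table."""
    customer_id: int
    first_name: str
    last_name: str
    password_hash: str


@dataclass
class CustomerDTO:
    """Data transfer object: only the fields that cross the layer boundary."""
    customer_id: int
    first_name: str
    last_name: str


@dataclass
class CustomerVO:
    """View object: data shaped for one page or component."""
    display_name: str


def to_dto(po: CustomerPO) -> CustomerDTO:
    # Drop persistence-only fields such as the password hash.
    return CustomerDTO(po.customer_id, po.first_name, po.last_name)


def to_vo(dto: CustomerDTO) -> CustomerVO:
    return CustomerVO(display_name=f"{dto.first_name} {dto.last_name}")
```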
On the Amazon Web Services (AWS) Cloud, you can use AWS Secrets Manager to rotate, manage, and retrieve database credentials throughout their lifecycle. Users and applications retrieve secrets with a call to the Secrets Manager API, removing the need to hardcode sensitive information in plaintext.
If you’re using containers for microservice workloads, you can securely store credentials in AWS Secrets Manager. To separate out configuration from code, these credentials are commonly injected into the container. However, it's important to rotate your credentials periodically and automatically. It’s also important to support the ability to refresh credentials after revocation. At the same time, applications require the ability to rotate credentials while reducing any potential downstream availability impact.
This pattern describes how to rotate your secrets that are secured with AWS Secrets Manager within your containers without requiring your containers to restart. In addition, this pattern reduces the number of credential lookups to Secrets Manager by using the Secrets Manager client-side caching component. When you use the client-side caching component to refresh the credentials within the application, the container doesn't need to be restarted to fetch a rotated credential.
This approach works for Amazon Elastic Kubernetes Service (Amazon EKS) and Amazon Elastic Container Service (Amazon ECS).
Two scenarios are covered. In the single-user scenario, the database credential is refreshed on secret rotation by detecting the expired credential. The credential cache is instructed to refresh the secret, and then the application re-establishes the database connection. The client-side caching component caches the credential within the application and helps avoid reaching out to Secrets Manager for each credential lookup. The credential is rotated within the application without the need to force the credential refresh by restarting the container.
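The single-user flow (serve the credential from a cache, detect a rejected credential, refresh, reconnect) can be sketched without any AWS dependency. Everything below is hypothetical and is not the Secrets Manager caching client API. `fetch_secret` stands in for the Secrets Manager lookup, `connect` stands in for the database driver, and `AuthError` stands in for whatever error the driver raises when a credential is rejected.

```python
import time
from typing import Callable, Optional


class AuthError(Exception):
    """Hypothetical error a database driver raises for a rejected credential."""


class CredentialCache:
    """Hypothetical stand-in for a client-side secret cache (not the AWS API)."""

    def __init__(self, fetch_secret: Callable[[], str], ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_secret = fetch_secret
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        # Serve from memory unless nothing is cached yet or the entry is too old.
        expired = self._clock() - self._fetched_at >= self._ttl
        if self._value is None or expired:
            self.refresh()
        return self._value

    def refresh(self) -> None:
        self._value = self._fetch_secret()
        self._fetched_at = self._clock()


def connect_with_refresh(cache: CredentialCache,
                         connect: Callable[[str], object]) -> object:
    try:
        return connect(cache.get())
    except AuthError:
        # The cached credential was probably rotated: refresh once and retry.
        cache.refresh()
        return connect(cache.get())
```

Because the refresh happens inside the running process, the container keeps running through a rotation, and lookups reach the secret store only on a cache miss, an expiry, or a rejected credential.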
---
title: Order example
---
erDiagram
CUSTOMER {
int customer_id PK "IDENTITY"
string name
string sector
}
ORDER {
int order_id PK "IDENTITY"
int customer_id FK "refer to CUSTOMER.customer_id"
string deliveryAddress
}
LINE-ITEM {
int product_id PK "IDENTITY"
int order_id FK "refer to ORDER.order_id"
int quantity
float pricePerUnit
}
CUSTOMER ||--o{ ORDER : places
ORDER ||--|{ LINE-ITEM : contains
TO-BE
Add one nullable UUID (PK) column to the existing tables which will be used as the PK.
Update the values of UUID (PK) column based on the existing PK column.
(Optional) Add one or more nullable UUID (FK) columns to the existing tables which will be used to refer to the PK in other tables.
(Optional) Update the values of UUID (FK) columns based on the existing FK columns.
---
title: Order example
---
erDiagram
CUSTOMER {
int customer_id PK "IDENTITY"
string customer_uuid UK "# based on customer_id"
string name
string sector
}
ORDER {
int order_id PK "IDENTITY"
string order_uuid UK "# based on order_id"
int customer_id FK "refer to CUSTOMER.customer_id"
string customer_uuid "# based on customer_id"
string deliveryAddress
}
LINE-ITEM {
int product_id PK "IDENTITY"
string product_uuid UK "# based on product_id"
int order_id FK "refer to ORDER.order_id"
string order_uuid "# based on order_id"
int quantity
float pricePerUnit
}
CUSTOMER ||--o{ ORDER : places
ORDER ||--|{ LINE-ITEM : contains
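The first four steps (add nullable UUID columns, then back-fill them) can be sketched against SQLite for the CUSTOMER and ORDER tables. This is an illustration under several assumptions. The text does not say how a UUID is derived from an existing key, so the sketch uses `uuid.uuid5` with a made-up namespace. The lowercase table names are also assumed. SQLite cannot redefine a primary key in place, so the later constraint steps are not shown.

```python
import sqlite3
import uuid

# Hypothetical namespace so the same integer key always maps to the same UUID.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.invalid")


def add_uuid_keys(conn: sqlite3.Connection) -> None:
    # Step 1: add a nullable UUID column that will later become the PK.
    conn.execute("ALTER TABLE customer ADD COLUMN customer_uuid TEXT")
    # Step 2: populate it from the existing integer PK.
    for (customer_id,) in conn.execute("SELECT customer_id FROM customer").fetchall():
        new_uuid = str(uuid.uuid5(NAMESPACE, f"customer:{customer_id}"))
        conn.execute(
            "UPDATE customer SET customer_uuid = ? WHERE customer_id = ?",
            (new_uuid, customer_id),
        )
    # Step 3: add a nullable UUID column that will later become the FK.
    conn.execute('ALTER TABLE "order" ADD COLUMN customer_uuid TEXT')
    # Step 4: populate it by following the existing integer FK.
    conn.execute(
        'UPDATE "order" SET customer_uuid = ('
        "SELECT c.customer_uuid FROM customer AS c "
        'WHERE c.customer_id = "order".customer_id)'
    )
```

Both columns start out nullable so that existing rows stay valid until the back-fill finishes. Uniqueness and key constraints are added only afterwards.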
Delete the existing PRIMARY KEY constraint and then re-create it with the new definition. See Modify Primary Keys for more details.
Recreate the FK constraints on the UUID (FK) columns, which refer to the new UUID (PK) in other tables.
---
title: Order example
---
erDiagram
CUSTOMER {
int customer_id UK "IDENTITY"
string customer_uuid PK "# based on customer_id"
string name
string sector
}
ORDER {
int order_id UK "IDENTITY"
string order_uuid PK "# based on order_id"
int customer_id "refer to CUSTOMER.customer_id"
string customer_uuid FK "# refer to CUSTOMER.customer_uuid"
string deliveryAddress
}
LINE-ITEM {
int product_id UK "IDENTITY"
string product_uuid PK "# based on product_id"
int order_id "refer to ORDER.order_id"
string order_uuid FK "# refer to ORDER.order_uuid"
int quantity
float pricePerUnit
}
CUSTOMER ||--o{ ORDER : places
ORDER ||--|{ LINE-ITEM : contains
AS-IS
TO-BE
Add one nullable UUID (PK) column to the existing tables which will be used as the PK.
Update the values of UUID (PK) column based on the existing PK column.
(Optional) Add one or more nullable UUID (FK) columns to the existing tables which will be used to refer to the PK in other tables.
(Optional) Update the values of UUID (FK) columns based on the existing FK columns.
Drop the FK constraints of the existing tables. See Delete foreign key relationships for more details.
Delete the existing PRIMARY KEY constraint and then re-create it with the new definition. See Modify Primary Keys for more details.
Recreate the FK constraints on the UUID (FK) columns, which refer to the new UUID (PK) in other tables.