- why
- not much traffic
- database / application server / front-end all deployed in one box
- low budget
- easy to maintain
- if this is a commercial software system, this is very bad
- single point of failure
- if the server fails, everything goes down
- separate out the database (a bit better) - two servers now, yet still few servers to maintain
- but each is still a single point of failure - doesn't help from a resiliency standpoint
- vertical scaling - throw more hardware at it (RAM / CPU / disk capacity, etc.) - but there's still a limit
- again, not something you should rely on in production
- instead of a single server, run the application on multiple servers, with a load balancer on top that distributes the load across those instances
- since the load balancer is aware of offline servers, it routes traffic only to live ones - no downtime from the customer's perspective
- can further think about where those servers are deployed geographically (which zone, etc.) to optimize latency
- horizontal scaling works well with stateless services (a request doesn't depend on any previous request)
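A rough sketch of the load-balancing idea above (hypothetical names; in practice a real load balancer like nginx or HAProxy does this for you) - round-robin over a pool, skipping servers known to be offline:

```python
import itertools

class LoadBalancer:
    """Round-robin over a fixed pool, skipping servers marked offline."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(servers)
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        self.healthy.add(server)

    def route(self):
        # try each server at most once per call; skip the offline ones,
        # so a dead instance causes no visible downtime for customers
        for _ in range(len(self.servers)):
            s = next(self._cycle)
            if s in self.healthy:
                return s
        raise RuntimeError("no healthy servers left")
```

Because routing consults the health set on every request, marking a server back up via `mark_up` immediately puts it back into rotation.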
- Cold standby - just take periodic backups of the DB; if the live DB goes down, the backup can serve temporarily until the live DB is fixed. Recent data may be lost depending on backup frequency. It's cheap - but not the answer an interviewer is looking for.
- Warm standby - the DB is continuously replicated to a standby instance, so failover is faster and less data is lost.
- Hot standby - instead of replicating, the application writes the data to all the DB instances simultaneously, so any instance can take over immediately.
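A minimal sketch of the hot-standby write path described above - plain dicts stand in for DB connections, which is an assumption for illustration:

```python
def write_all(replicas, key, value):
    # hot standby: the application writes to every DB instance at once,
    # so any replica can serve reads immediately if the primary dies
    for db in replicas:
        db[key] = value

# two in-memory "databases" standing in for real instances
primary, standby = {}, {}
write_all([primary, standby], "user:1", {"name": "Ada"})
```

The trade-off: every write now costs one round trip per replica, and the application must decide what to do if one of those writes fails mid-way.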
pros
- sharding the DB (partitioning the data across DB servers)
- each shard has its own backup DB
- which DB a write goes to depends on the sharding function (a hashing function on the key)
- since data is partitioned across shards, combining it can be challenging - so when designing the data model, keep it simple and avoid heavy joins; a simple key/value lookup is better
cons
- tough to do joins across shards
- resharding (adding shards means redistributing existing data)
- hotspots - the "celebrity problem", where one shard gets a disproportionate share of the traffic
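A sketch of a sharding function as described above - hypothetical; real systems typically use consistent hashing so that resharding moves less data:

```python
import hashlib

def shard_for(key: str, num_shards: int = 4) -> int:
    # stable hash: the same key always lands on the same shard,
    # unlike Python's built-in hash(), which is randomized per process
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards
```

Note the resharding pain point: changing `num_shards` remaps most existing keys, which is exactly why naive modulo sharding makes growing the cluster expensive.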
- many NoSQL DBs support a subset of SQL operations or expose an SQL-like query API
- they still work best with simple key/value lookups - that's what scales; a join can be faked with a second key/value lookup, etc.
- a formal schema may not be needed
- ex: MongoDB, DynamoDB, Cassandra, HBase
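The "fake join" idea above in miniature - two key/value lookups instead of a relational join (toy dicts stand in for the NoSQL store, which is an assumption for illustration):

```python
# each "table" is a plain key/value store
users = {"u1": {"name": "Ada", "order_ids": ["o1", "o2"]}}
orders = {"o1": {"item": "book"}, "o2": {"item": "pen"}}

def orders_for_user(user_id):
    # lookup 1: fetch the user record, which embeds its order keys
    user = users[user_id]
    # lookup 2: fetch each order by key - no server-side join needed
    return [orders[oid] for oid in user["order_ids"]]
```

This only stays cheap because every lookup is by primary key; embedding the order IDs in the user record is the denormalization that makes it possible.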
- the concept: just throw unstructured data (CSV, text, JSON) into a big storage bucket like Amazon S3
- a common approach for big data / unstructured data
- it's redundant, because the storage service keeps multiple copies (across availability zones, and possibly regions)
- a cloud service can provide a way to query this data (building some kind of schema over the unstructured files; the application itself can add a caching layer to improve performance, etc.)
- Amazon Athena (serverless)
- Amazon Redshift (distributed data warehouse)
- still think about partitioning this data to improve query performance (ex: if you query by date, partition by date)
- ex: folder structure
- year
- month
- date
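That folder structure shows up directly in the object keys. A small helper that builds such a key (the Hive-style `year=/month=/day=` naming is a common convention that query engines like Athena can prune on; the filename here is hypothetical):

```python
from datetime import date

def partition_key(d: date, filename: str) -> str:
    # date-partitioned prefix: a query filtered to one day only has to
    # scan the objects under that day's prefix, not the whole bucket
    return f"year={d.year}/month={d.month:02d}/day={d.day:02d}/{filename}"
```
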
- can only provide 2 of the 3 (CAP theorem)
- Availability - a single point of failure hurts this (ex: MongoDB has a primary/router sitting in front of the other instances)
- Partition tolerance - the system keeps working when the network splits; goes hand in hand with horizontal scaling
- Consistency - if I write data, it should be immediately readable
- caching: keep an in-memory copy of the data that is frequently needed; works best when the application demands more reads than writes
- you may have SLAs requiring a particular API to respond under x ms (high performance / high availability), because hitting disk is expensive
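A read-through cache sketch of the idea above - the `loader` callable stands in for the expensive disk/DB read, which is an assumption for illustration:

```python
class ReadThroughCache:
    """Keep an in-memory copy of frequently read values."""

    def __init__(self, loader):
        self.loader = loader   # called only on a cache miss
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.store:
            self.hits += 1          # served from memory: fast path
            return self.store[key]
        self.misses += 1            # fall through to the slow store
        value = self.store[key] = self.loader(key)
        return value
```

With a read-heavy workload most calls hit the fast path, which is what makes a sub-x-ms SLA achievable; the missing pieces in a real cache (eviction, invalidation on writes) are exactly where consistency gets traded away.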