@roshane
Last active June 8, 2022 12:04
system design interview short notes

horizontal vs vertical scaling

single server design

  • why
    • not much traffic
    • database / application server / front end all deployed on one box
    • low budget
    • easy to maintain
  • for a commercial software system this is very bad
    • single point of failure
    • if the server fails, everything goes down

how to make this better

  • separate out the database (a bit better) - 2 servers, still few servers to maintain
  • but each of these is still a single point of failure - from a resiliency standpoint it does not help
  • vertical scaling - throw more hardware at it (RAM, CPU, disk capacity, etc.) - there is still an upper limit
  • but again, not something you should do in production

horizontal scaling

  • instead of a single server, several servers run the application, with a load balancer in front that distributes the load across those instances
  • since the load balancer knows which servers are offline, it routes traffic only to the live ones - no downtime from the customer's perspective
  • can further think about where those servers are deployed geographically (which region, which zone, etc.) to optimize latency
  • horizontal scaling works well with stateless services (a request doesn't depend on the previous request)
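
A minimal sketch of the load balancer idea above, assuming a simple round-robin policy and health status that is set by hand. The class name, method names, and server names are made up for illustration; real load balancers run their own health checks.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy load balancer: rotates across servers and skips offline ones."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)  # a real LB updates this via health checks
        self._ring = cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        """Return the next live server, or raise if none are left."""
        # One full pass over the ring is enough to find a live server.
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])  # hypothetical server names
lb.mark_down("app-2")                                 # simulate one server failing
picks = [lb.pick() for _ in range(4)]
print(picks)  # ['app-1', 'app-3', 'app-1', 'app-3']
```

Because the servers are stateless, any live server can take any request, which is why skipping app-2 is invisible to the customer.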

DB failover strategies

  • Cold standby - a backup DB restored from periodic backups; if the live DB goes down it can temporarily serve until we fix the live DB. Data written since the last backup may be lost, depending on backup frequency. It's cheap - but not the answer an interviewer is looking for
  • Warm standby - the live DB is continuously replicated to a standby that is ready to take over
  • Hot standby - instead of replicating, the application writes the data to all the DB instances simultaneously
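
A rough sketch of the hot standby idea, assuming the application owns the dual write. `InMemoryDB` and `HotStandbyClient` are hypothetical stand-ins, not a real driver API. This version fails the write if any instance is down; a real system needs a policy for partial write failures.

```python
class InMemoryDB:
    """Stand-in for a real database connection (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}
        self.online = True

    def write(self, key, value):
        if not self.online:
            raise ConnectionError(f"{self.name} is down")
        self.rows[key] = value

    def read(self, key):
        if not self.online:
            raise ConnectionError(f"{self.name} is down")
        return self.rows[key]


class HotStandbyClient:
    """Writes go to every instance; reads fall through to the first live one."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    def write(self, key, value):
        # Every instance receives the write, so each holds the full data set.
        for db in self.replicas:
            db.write(key, value)

    def read(self, key):
        for db in self.replicas:
            try:
                return db.read(key)
            except ConnectionError:
                continue  # this instance is down, try the next one
        raise ConnectionError("all replicas are down")


primary, standby = InMemoryDB("primary"), InMemoryDB("standby")
client = HotStandbyClient([primary, standby])
client.write("user:1", "alice")

primary.online = False           # simulate the live DB failing
value = client.read("user:1")    # served by the standby, no data lost
print(value)  # alice
```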

DB horizontal scaling - typically NoSQL databases

  • pros

    • sharding the DB (partitioning the data across DB servers)
    • each shard has its own backup DB
    • which DB a write goes to depends on the sharding function (a hash function)
    • since data is partitioned across shards, combining data can be challenging - so when designing the data keep it simple and avoid heavy joins; a simple K,V lookup is better
  • cons

    • tough to do joins across shards
    • resharding
    • hotspots - celebrity problem
  • notes

    • many NoSQL DBs support SQL-like operations and expose an SQL-like query API (ex: Cassandra's CQL, DynamoDB's PartiQL)
    • still works best with simple key/value lookups - that is what scales; or fake a join with a second k/v lookup query, etc.
    • a formal schema may not be needed
    • ex: MongoDB, DynamoDB, Cassandra, HBase
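
A minimal sketch of a sharding function and of the "fake join by a second k/v lookup", assuming plain hash-modulo sharding over in-memory dicts. Shard names, key formats, and the `put`/`get` helpers are made up for illustration.

```python
import hashlib

SHARDS = ["db-0", "db-1", "db-2", "db-3"]  # hypothetical shard names


def shard_for(key: str, shards=SHARDS) -> str:
    """Pick a shard from a stable hash of the key."""
    # hashlib is used instead of the built-in hash(), which is randomized
    # per process for strings and so would route keys inconsistently.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# One dict per shard stands in for one DB server per shard.
stores = {name: {} for name in SHARDS}


def put(key, value):
    stores[shard_for(key)][key] = value


def get(key):
    return stores[shard_for(key)].get(key)


put("user:42", {"name": "alice"})
put("order:1001", {"user_key": "user:42", "total": 30})

# "Fake join": first lookup fetches the order, second lookup fetches its user.
order = get("order:1001")
user = get(order["user_key"])
print(order["total"], user["name"])  # 30 alice
```

With hash-modulo, changing the number of shards changes the index for most keys, which is why resharding is listed as a pain point.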

Data lakes

  • concept is to just throw raw unstructured or semi-structured data (CSV, text, JSON) into a big storage bucket like Amazon S3
  • common application for big data - unstructured data
  • it's redundant, because the storage keeps multiple copies (S3 stores copies across availability zones within a region; cross-region replication is optional)
  • a cloud service can provide a way to query this data (it builds some kind of schema over the unstructured data; the application itself can add a caching layer to improve performance, etc.)
    • Amazon Athena (serverless)
    • Amazon Redshift (distributed data warehouse)
  • still think about partitioning this data to improve query performance (ex: if you query by date, partition by date)
    • ex: folder structure
      • year
        • month
          • date
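
A small sketch of the year/month/date folder structure above, assuming Hive-style `key=value` folder names (a layout Athena can use for partitions). The dataset name and helper names are made up for illustration.

```python
from datetime import date, timedelta


def partition_prefix(dataset: str, day: date) -> str:
    """Build the folder prefix under which one day's files would be stored."""
    return f"{dataset}/year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"


def prefixes_between(dataset: str, start: date, end: date) -> list:
    """List the prefixes a date-range query has to scan (inclusive range)."""
    days = (end - start).days
    return [partition_prefix(dataset, start + timedelta(days=i)) for i in range(days + 1)]


print(partition_prefix("events", date(2022, 6, 8)))
# events/year=2022/month=06/day=08/

# A two-day query touches two folders instead of the whole bucket.
print(prefixes_between("events", date(2022, 6, 30), date(2022, 7, 1)))
```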

ACID compliance

image

CAP Theorem

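  • can only guarantee 2 of the 3
    • Availability - every request gets a response even when some nodes fail; a single point of failure hurts this (ex: a MongoDB replica set has one primary that takes all writes, so writes pause while a new primary is elected)
    • Partition-tolerance - the system keeps working when the network between nodes is split; any DB scaled horizontally across servers has to deal with this
    • Consistency - if I write data, any read that follows should immediately return it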

Caching

(keep an in-memory copy of data that is frequently needed) - fits applications that demand more reads than writes

  • you may have SLAs requiring a particular API to respond in under x ms (high performance / high availability), because hitting the disk is expensive
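
A minimal sketch of an in-memory read cache, assuming a cache-aside pattern with a time-to-live (one common choice, not the only one). `TTLCache` and `slow_db_read` are made-up names; the loader stands in for the expensive disk or DB read.

```python
import time


class TTLCache:
    """Cache-aside: serve from memory, fall back to the loader on a miss."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader      # the slow read, e.g. a DB query
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._entries.get(key)
        now = self._clock()
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        # Missing or expired: load from the source and remember the result.
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key):
        """Call after a write so readers don't see stale data."""
        self._entries.pop(key, None)


def slow_db_read(key):
    # Hypothetical stand-in for a disk-backed database read.
    return f"row-for-{key}"


cache = TTLCache(slow_db_read, ttl_seconds=30)
cache.get("user:42")   # miss: goes to the "database"
cache.get("user:42")   # hit: served from memory
print(cache.hits, cache.misses)  # 1 1
```

The more reads there are per write, the more requests are answered from memory, which is why caching pays off for read-heavy applications.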
