
@soyart
Last active September 15, 2021 16:39
Answers for technical questions

1. When visiting a website, the following happens, in order:

  1. DNS lookup

  2. TCP connection open

  3. TLS handshake

  4. HTTP request and response

  5. HTML parsing

  6. JavaScript parsing

  7. API calls (AJAX)

2. What is container technology and its benefits

Container technology, e.g. Docker, provides OS-level virtualization in packages called containers. Containers can therefore eliminate cross-OS incompatibilities. They greatly reduce development and deployment friction, allowing developers to work on different host OSes (e.g. macOS vs Windows vs Linux) BUT inside the same container environment as production.

For example, if the production environment is Ubuntu docker image, then the developers who are working on either Windows or Mac machines can develop, run, and test their works with the same Ubuntu docker image environment.

Other than its benefits in software development, Docker, when used properly, can also maximize a machine's resources. Instead of setting up a separate VM for each task, which leaves wasted capacity, Docker can pack many containers onto the same host. One computer can run a few VMs, and those VMs in turn run many containers, saving cost and time when configuring these systems (because Docker configuration can be automated and replicated easily).

3. What is caching and its benefits?

Caching is a technique that stores information on faster storage (e.g. CPU cache or RAM, versus the much slower disk storage) for better I/O performance, e.g. faster access times or query performance.

There's a catch, however. With certain cache strategies like write-behind, the cache (which is volatile: if we lose power, we lose its contents) acknowledges writes before they are committed to non-volatile storage, so we risk losing data during that window.

Also, faster memory like CPU cache and RAM always costs more than slower, non-volatile storage like SSDs and hard disks. This remains true even with today's cloud infrastructure, like AWS ElastiCache. This means we must carefully determine an optimal amount of cache that performs well without being too expensive.

One of the tools used for caching is Redis, which is an in-memory key-value database and cache that is usually used today in front of the actual database because it is very fast.
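
As a minimal illustration of the idea (not Redis itself), here is a read-through cache sketch in Go, where loadFromDB is a hypothetical stand-in for a slow database query:

```go
package main

import (
	"fmt"
	"sync"
)

// Cache is a minimal read-through cache: on a miss it loads from the
// slower backing store and remembers the result for later reads.
type Cache struct {
	mu    sync.Mutex
	data  map[string]string
	loads int // counts trips to the slow store, for demonstration
}

func NewCache() *Cache { return &Cache{data: make(map[string]string)} }

// loadFromDB is a hypothetical stand-in for a slow database query.
func (c *Cache) loadFromDB(key string) string {
	c.loads++
	return "value-for-" + key
}

func (c *Cache) Get(key string) string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.data[key]; ok {
		return v // cache hit: no trip to the slow store
	}
	v := c.loadFromDB(key) // cache miss: read through to the store
	c.data[key] = v
	return v
}

func main() {
	c := NewCache()
	fmt.Println(c.Get("user:1")) // miss: value-for-user:1
	fmt.Println(c.Get("user:1")) // hit:  value-for-user:1
	fmt.Println(c.loads)         // 1: the second Get never touched the store
}
```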

4. Write a function (in any language) that reverses the element order of an array of strings, without using any built-in functions or libraries.

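
A Go sketch of such a function, using only index swaps (Go's len built-in is unavoidable for determining slice length; no library functions are used):

```go
package main

import "fmt"

// reverseStrings reverses the element order in place by swapping
// elements from both ends toward the middle.
func reverseStrings(s []string) []string {
	for i, j := 0, len(s)-1; i < j; i, j = i+1, j-1 {
		s[i], s[j] = s[j], s[i]
	}
	return s
}

func main() {
	fmt.Println(reverseStrings([]string{"a", "b", "c", "d"})) // [d c b a]
}
```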

5. In which cases should we choose MongoDB over Postgres, and in which cases should we choose Postgres over MongoDB?

When to use MongoDB

We should choose MongoDB if the data we're dealing with needs extra flexibility (i.e. unstructured data), or if it doesn't fit in a table, or if it is not very relational. This is because MongoDB is essentially a document database, and it doesn't resemble traditional SQL tables at all. Vanilla MongoDB doesn't even have foreign keys or relation-like linking mechanisms.

In MongoDB, we can also store data without first defining a schema, which means the shape of the data can change at any time (in SQL this requires an explicit schema migration). And because it is not a table database, we can easily store unstructured data, at the cost of not being able to do things like the JOINs that enable very complex queries in a traditional SQL database.

In terms of performance, compared to SQL databases, MongoDB is generally faster when writing new data, because there are no structures that slow down write operations, but it is generally slower at queries (reads), because there is no strictly defined structure for the engine to exploit. So if the use case is write-heavy, MongoDB may be a better choice.

MongoDB also handles many concurrent connections better than traditional SQL databases. Moreover, it can be scaled out to multiple servers more easily, because there are no tables to keep in sync across machines. In my understanding, it is also less expensive to run and maintain. Many people pair Redis with MongoDB to improve read performance via caching.

When to use Postgres

We should choose Postgres if the data is highly relational and tabular, if the use case is very read-heavy, or if it is data inherited from legacy RDBMS systems. Postgres also has extra features like a JSON document store (the jsonb column type), queried in a way similar to MongoDB, so we can have the best of both worlds when we mainly need a SQL database but also want to store document-like (JSON) data.

And because Postgres has been around for a long time, we may be forced to use Postgres or any other SQL databases when we have legacy code that can't talk to MongoDB.

6. GraphQL vs RESTful API

GraphQL is a data query language for APIs, developed and open-sourced by Facebook. It lets clients define the structure of the requested data (client-driven), allowing more flexibility and other potential benefits compared with RESTful APIs.

REST servers expose multiple endpoints (URL-driven) for clients to make requests to, while a GraphQL server exposes just one URL for POST requests, where the body of the POST request is the query (query-driven). Both GraphQL and REST usually exchange JSON, although other formats (e.g. a file) can also be used.

When to use GraphQL

If the use case is a general-purpose public API whose usage we can't predict, or one that will be used differently by different consumers, then we should use GraphQL, because each client can request exactly the fields it needs instead of processing extra information it won't use.

When to use REST

If we are building a very specific API, like one serving stock prices, then it may be better to use REST. If the API has only a single consumer (e.g. our own web page), REST may also be the better choice.

7. How database indexing affects query and insertion performance

Database indexing may increase query (read) performance because we can look up data in a sorted structure instead of scanning every database row, but it may decrease insert (write) performance, because the index must be updated on every insert. These trade-offs have to be taken into consideration when deciding what to index.
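
The trade-off can be illustrated in Go by treating an index as a sorted slice (a simplified stand-in for a real B-tree index): lookups become binary search, but every insert pays extra work to keep the structure sorted:

```go
package main

import (
	"fmt"
	"sort"
)

// linearFind models an unindexed scan: every row is examined.
func linearFind(rows []string, key string) bool {
	for _, r := range rows {
		if r == key {
			return true
		}
	}
	return false
}

// indexedInsert keeps the slice sorted on every write: this is the
// extra cost an index imposes on inserts.
func indexedInsert(index []string, key string) []string {
	i := sort.SearchStrings(index, key) // binary-search insertion point
	index = append(index, "")
	copy(index[i+1:], index[i:])
	index[i] = key
	return index
}

// indexedFind is an O(log n) lookup, the payoff on reads.
func indexedFind(index []string, key string) bool {
	i := sort.SearchStrings(index, key)
	return i < len(index) && index[i] == key
}

func main() {
	var index []string
	for _, k := range []string{"carol", "alice", "bob"} {
		index = indexedInsert(index, k)
	}
	fmt.Println(index)                     // [alice bob carol]
	fmt.Println(indexedFind(index, "bob")) // true
	fmt.Println(linearFind(index, "dave")) // false
}
```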

8. How to store passwords in databases

To securely store passwords in a database, we must first hash the plaintext passwords before storing them. But just using a fast hashing algorithm like SHA is not enough, because attackers may have prepared a rainbow table of precomputed hashes for common passwords. We must combine our own randomness with each plaintext password before hashing it.

To protect the hashed passwords against rainbow-table attacks, we introduce a one-time random value per password (a salt), which produces a different hash every time even when the input passwords are identical. Bcrypt and PBKDF2 are examples of password-hashing algorithms that build in a salt (and that are also deliberately slow, which hinders brute force).
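
A minimal Go sketch of salting (for illustration only; in practice use bcrypt or PBKDF2 as noted above, since SHA-256 alone is too fast to resist brute force even when salted):

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// hashPassword illustrates salting only. Production code should use a
// deliberately slow algorithm such as bcrypt or PBKDF2 instead.
func hashPassword(password string, salt []byte) [32]byte {
	return sha256.Sum256(append(salt, []byte(password)...))
}

// newSalt returns 16 bytes of crypto-grade randomness.
func newSalt() []byte {
	salt := make([]byte, 16)
	if _, err := rand.Read(salt); err != nil {
		panic(err)
	}
	return salt
}

func main() {
	s1, s2 := newSalt(), newSalt()
	h1 := hashPassword("hunter2", s1)
	h2 := hashPassword("hunter2", s2)
	// Identical passwords, different salts => different stored hashes,
	// so one precomputed rainbow table cannot cover all users.
	fmt.Println(h1 == h2) // false
}
```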

9. Default values in Go for different data types

  • int64: 0

  • string: Empty string

  • bool: false

  • *bool: nil

  • struct { a bool; b int64; c *string }: {false 0 <nil>} (each field takes its own zero value)

  • interface{}: nil

  • []string: nil (a nil slice, which prints as [] and has length 0)
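
These defaults can be verified with a short program:

```go
package main

import "fmt"

type s struct {
	a bool
	b int64
	c *string
}

func main() {
	var (
		i  int64
		st string
		b  bool
		bp *bool
		sv s
		e  interface{}
		sl []string
	)
	fmt.Println(i, st == "", b, bp == nil)    // 0 true false true
	fmt.Printf("%+v\n", sv)                   // {a:false b:0 c:<nil>}
	fmt.Println(e == nil, sl == nil, len(sl)) // true true 0
}
```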

10. Write a program that encrypts and decrypts a string with RSA-OAEP and SHA-256. Use the public/private key pair from the environment variables RSA_PUB_KEY and RSA_PRIV_KEY. The program will have functions encrypt(plaintext) ciphertext and decrypt(ciphertext) plaintext.


If the message is very long, how do we encrypt it with asymmetric key encryption?

An RSA key pair can only encrypt a message of limited length (bounded by the key size and padding). If we really want asymmetric encryption (due to key-distribution concerns, or any other concerns), we can mix symmetric encryption (e.g. AES) into our scheme.

  1. We first use a symmetric key, for example s_key, to encrypt the large plaintext symmetrically.

  2. Then we asymmetrically encrypt s_key with the receiver's public key.

  3. We send the symmetrically encrypted ciphertext and the asymmetrically encrypted s_key to the receiver.

  4. When the receiver gets the data (ciphertext and encrypted s_key), they use their private key to asymmetrically decrypt the encrypted s_key.

  5. The receiver can then use the decrypted s_key to symmetrically decrypt the whole message.

For the AES mode, I recommend AES-256-GCM (which includes message authentication), or AES-256-CTR for very large messages.

GCM implementations typically need the entire message in memory during encryption, so GCM may not be suitable for encrypting/decrypting very large files on devices with limited memory, while CTR encrypts/decrypts the message sequentially in chunks, ruling out the memory-consumption concern. However, CTR has no message authentication (unless combined with a separate MAC). We must carefully weigh the trade-offs between these two AES modes.
