https://www.mongodb.com/compare/mongodb-mysql
- relational
- tables and rows, 1 row consists of columns
- referential integrity (checks that every foreign key reference is valid; e.g. a Student table uses course_id as a foreign key mapping to the Course table. If a Student row somehow stores a course_id value that does not exist in the id column of the Course table, we say this is an invalid reference.)
- use SQL for data access.
- JOIN is required to retrieve data from multiple tables.
- DB schema and data model (think of it as a design pattern) need to be defined ahead of time. This is a rigid approach that ensures data safety, but in exchange for flexibility (e.g. changing a data type => a schema migration occurs).
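The referential-integrity check described above can be seen in action with a runnable sketch. SQLite stands in for MySQL here, and the Student/Course tables mirror the example; all names are illustrative:

```python
import sqlite3

# SQLite as a stand-in for a relational DB; Student/Course mirror the example above.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires enabling FK checks explicitly
conn.execute("CREATE TABLE Course (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE Student (
        id INTEGER PRIMARY KEY,
        name TEXT,
        course_id INTEGER REFERENCES Course(id)
    )
""")
conn.execute("INSERT INTO Course VALUES (1, 'Databases')")
conn.execute("INSERT INTO Student VALUES (1, 'An', 1)")   # valid reference

rejected = False
try:
    # course_id 999 does not exist in Course.id -> invalid reference
    conn.execute("INSERT INTO Student VALUES (2, 'Binh', 999)")
except sqlite3.IntegrityError:
    rejected = True  # the DB refuses the dangling foreign key
```

The invalid row never makes it into the table; the database, not the application, enforces the rule.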
- non-relational.
- collections and documents. Data format is BSON - binary JSON.
- 1 document consists of a series of key/value pairs of various types, supporting array and nested document (denormalized data model).
- The structure of key/value pairs in 1 Document does not need to match the other Documents inside the same Collection. (e.g. 1 document can add 'like_anime: true' while another document adds 'like_son_tung_mtp: true')
- Therefore, Documents are self-describing; the Collection does not do much to tell the user what is inside a Document.
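The "self-describing documents, no shared schema" idea can be shown with plain Python dicts standing in for BSON documents. The collection and field names are invented for illustration:

```python
# Two documents in the same hypothetical "users" collection. Each document
# carries its own structure, so no shared schema is enforced by the collection.
users = [
    {"_id": 1, "name": "An", "like_anime": True,
     "hobbies": ["manga", "guitar"]},                        # array field
    {"_id": 2, "name": "Binh", "like_son_tung_mtp": True,
     "address": {"city": "Hanoi", "district": "Cau Giay"}},  # nested document
]

# Documents are self-describing: each one lists its own fields.
schemas = [sorted(doc.keys()) for doc in users]
```

The two documents live side by side with different fields, one using an array and the other a nested document, which is exactly the denormalized model described above.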
MySQL: its rigidity makes it very suitable for banking transactions, airport flights, ...
MongoDB: its flexibility and horizontal scalability, at the cost of safety features like referential integrity, make it very suitable for unstructured, dynamic data creation in Social Networking Sites, ...
- very flexible: no enforced schema on a Collection
- supports denormalized data types like arrays and nested documents
- Document format is BSON, friendly to JS developers in particular.
- stores documents of varying schemas, including unstructured data. This data does not need the safety guarantees of structured data stored in a MySQL RDBMS.
*E.g. a web application that does not depend on structured schemas can easily serve unstructured, semi-structured, or even structured data, all from the same Collection.*
MongoDB is not fully ACID-compliant. Not everything MongoDB does is ACID.
Single-document transactions are ACID by default; the DB engineer does not need to write custom code to get ACID behavior.
Multi-document transactions are not ACID by default, but you can implement ACID behavior with custom code, since MongoDB provides a Transactions API. You can scale horizontally in a distributed environment while staying ACID-compliant.
CAUTION: IT IS NOT EASY TO WRITE CODE THAT HANDLES ACID-COMPLIANT TRANSACTIONS, AND IT IS HARDER STILL WHEN DATA IS DISTRIBUTED ACROSS MULTIPLE NODES.
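What a multi-document transaction must guarantee is all-or-nothing: either every document update applies, or none do. A minimal conceptual sketch of that guarantee, assuming an invented in-memory `accounts` collection and `transfer()` helper (in real MongoDB you would use the Transactions API with a session instead):

```python
import copy

# Two "documents" in a hypothetical accounts collection; names are invented.
accounts = {"alice": {"balance": 100}, "bob": {"balance": 50}}

def transfer(db, src, dst, amount):
    """All-or-nothing update across two documents, simulated with a snapshot."""
    snapshot = copy.deepcopy(db)          # remember state for rollback
    try:
        db[src]["balance"] -= amount
        if db[src]["balance"] < 0:
            raise ValueError("insufficient funds")
        db[dst]["balance"] += amount
    except Exception:
        db.clear()
        db.update(snapshot)               # roll back BOTH document updates
        raise

transfer(accounts, "alice", "bob", 30)    # succeeds: both documents change
try:
    transfer(accounts, "alice", "bob", 1000)  # fails: neither document changes
except ValueError:
    pass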
- requires understanding the principles of normalization, referential integrity, and relational database design.
- best suited for applications that require very complex but rigid data structures and DB schemas, even if that rigor applies to only 1 out of the numerous tables in the system.
E.g. a banking application that requires very strong referential integrity, transactional guarantees, and fast, exact point-in-time data integrity. That is why when we transfer or withdraw money online through our bank app, it is fast and accurate.
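The banking guarantee above is exactly what a SQL transaction provides. A runnable sketch using SQLite as a stand-in for MySQL, with an invented `accounts` table; the `with conn:` block opens a transaction that commits on success and rolls back on error:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"  # no account may go negative
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        with conn:  # transaction: commit on success, rollback on exception
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
    except sqlite3.IntegrityError:
        pass  # CHECK constraint fired -> the whole transfer was rolled back

transfer(conn, "alice", "bob", 30)    # commits: both rows updated together
transfer(conn, "alice", "bob", 1000)  # violates CHECK -> nothing changes
```

The failed transfer leaves both balances untouched: the database enforces atomicity, so the app stays "fast and accurate" without hand-written rollback code.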
https://stackoverflow.com/questions/8729779/why-nosql-is-better-at-scaling-out-than-rdbms
- extremely easy to scale.
- easily configure a sharded cluster containing multiple nodes, each node holding responsibility for a portion of the database (a.k.a. a shard, a partition).
- easily configure a replica set containing multiple nodes, usually 1 read-and-write Primary node and multiple read-only Secondary nodes, each node holding the same data.
- the two can be combined: each shard of a sharded cluster can itself be deployed as a replica set.
Sharded cluster pros:
- horizontal scale, add more nodes to hold more newly created data.
- increase read and write operation performance, since these operations are distributed as well.
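The distribution idea behind a sharded cluster can be sketched in a few lines: hash a shard key and map it to one of N shards, so documents (and the reads/writes that target them) spread across nodes. This is a simplification (real MongoDB assigns ranges of hashed key values to chunks), and the shard names are invented:

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]

def shard_for(shard_key: str) -> str:
    """Deterministically map a shard key to one shard (hashed sharding sketch)."""
    digest = hashlib.md5(shard_key.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Where each user document would live; same key always routes to the same shard.
placement = {user_id: shard_for(user_id) for user_id in ["u1", "u2", "u3", "u4"]}
```

Because the mapping is deterministic, the router can send each operation straight to the one shard that owns the key, which is why both reads and writes scale out.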
Replica set pros:
- horizontal scale, add more nodes to copy more existing data. This enables high data availability, both in terms of disaster recovery and geographical availability (replica machines are not placed near each other; they can be scattered all over the world).
- increase read operation performance, since read operations can be distributed across many nodes at once.
- the Primary node is responsible for handling write operations; all Secondary nodes mirror the Primary.
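The replica-set routing rule above (writes to the Primary only, reads spread over Secondaries) can be sketched as a tiny router; node names are invented:

```python
import itertools

PRIMARY = "primary"
SECONDARIES = ["secondary1", "secondary2"]
_read_cycle = itertools.cycle(SECONDARIES)  # round-robin over read-only nodes

def route(op: str) -> str:
    """Send writes to the Primary; spread reads across Secondaries."""
    if op == "write":
        return PRIMARY
    return next(_read_cycle)

routed = [route(op) for op in ["write", "read", "read", "read"]]
```

Every write lands on the single Primary (which the Secondaries then mirror), while consecutive reads hit different Secondaries, which is where the read-scaling benefit comes from.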
Vertical scalability: add resources to an existing server, but it has an upper limit, and it is more expensive than horizontally scaling with multiple cheap servers.
Adding read replicas for Master-Slave replication: add read-only Slave copies that mirror a read-and-write Master; the number of Slaves is 5 at max.
=> Causes issues for applications that are write-heavy or that write regularly, since Slaves tend to lag behind the Master.
Master-Master replication: exists, but implementations are limited, unlike MongoDB's built-in replication.
- NoSQL, but JOINs are supported via the `$lookup` aggregation stage. They are less needed, though, since related data can be denormalized and nested/embedded in each other, keeping most of the data in 1 Document. This can make the Document quite large in size, however.
- optimized for write performance, prioritizing speed over transaction safety when making multi-document transactions.
- optimized for INSERT/UPDATE a large number of records.
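The shape of a `$lookup` stage, written here as a Python dict so it can be inspected without a running MongoDB; the collection and field names are invented for illustration (with pymongo you would pass this list to an `aggregate()` call):

```python
# A $lookup stage joins each document in the current collection with matching
# documents from another collection, placing the matches in an array field.
pipeline = [
    {"$lookup": {
        "from": "customers",          # the "right" collection to join with
        "localField": "customer_id",  # field in the current (orders) documents
        "foreignField": "_id",        # field in the customers documents
        "as": "customer_docs",        # joined documents land in this array field
    }}
]
```

Unlike a SQL JOIN that produces flat rows, `$lookup` embeds the matches as a nested array, which fits the denormalized document model described above.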
- optimized for JOINS across multiple tables that have been appropriately indexed.
- unoptimized for write operations, data needs to be inserted row by row.
- optimized for SELECT a large number of records
MongoDB: THE WINNER!
- schemaless design of Documents => easy to build and evolve alongside the application; application changes do not impact the database much.
- multiple ways to update the schema of Documents inside a Collection (creating new fields using aggregation pipeline, update nested array field, ...)
- document structure maps naturally to objects in modern OO languages, the true power of JSON.
- removes the complex ORM layer that RDBMS-backed apps rely on.
- Adds overhead and unnecessary responsibility to developers, slowing them down: the objects they write in code have to be adapted to the relational structure, and this is not an easy task. Yes, they could use an ORM, but the general rule of thumb is that an ORM will never be good for optimization and performance, since any abstraction layer between app and DB comes down to more processing, more storage, and more time.
- Not resilient to business requirement changes: even if the mapping is done manually (without an ORM), the app evolves over time and database schemas have to change to match new business requirements. Lucky? Just add more tables and columns. Unlucky? Remap and redesign the schema.
- App goes down while evolving the schema: the table is locked until existing data is copied into the new schema. During the evolution process, the application remains inactive. (I also call this schema migration.)
- Breaking helper components: business requirements change => app development => database changes => schema migration needed. Migrating the schema can break triggers, materialized views, stored procedures, stored functions, ...
Both are good enough for this => DRAW