https://viblo.asia/p/oltp-va-olap-co-gi-khac-nhau-maGK786BZj2#_mot-vi-du-oltp-thuc-te-3
https://dba.stackexchange.com/questions/4622/when-should-you-denormalize
T in OLTP stands for Transactional, meaning process/operation that made a small change inside a large database system.
E.g.
- An ATM that withdraws your money, simply subtract your balance, substract the cash in ATM, and you receive cash.
- Texting to other people: add text message to your data model and text message to their data model.
- A comment submission: add comment to the blog post, user's dashboard and user's notification can reference (anchoring) to those comments.
Now, these transaction are easily to messup in many ways, especially when OLAP is characterized by a high volume of small transactions.
In order to not messing them up, we have to set constraints when making transaction, these constraints are called: ACID
- A: atomicity: meaning a transaction, that requires multiple operation (transaction is not the lowest unit, operation is) to be executed fully (not a single operation failed or omitted).
- C: consistency: meaning state before transaction and after transaction must resolve to a common constraint. E.g. ATM withdrawal operation make sure that the sum of you_had_money + balance_had_money before and after are equal.
- I: isolation: meaning multiple transaction that modify/change the same data concurrently must resolve to the same state if doing it sequentially.
- D: durability: meaning the transaction must be successful even if system failure occurs while making that transaction.
OLTP often:
- write mostly, small and fast.
- massive number of user at a single point of time.
- real-time, 24/7 active.
- works with small set of up-to-date data.
- use normalized data model, provided by SQL-based databases like RDBMS, results in hundreds of small table in exchange for low cost of storage, and easy ACID-compliant.
- use cases: ERP, CRM, multi-media, news, web blogs, ...
Note: Not all database supports all these constraint, some database make you write custom transactional API, requires knowledge in OS in general and that DB in specifc.
A in OLAP means Analytical, it is for data analyzing.
The ultimate objective of OLAP is to research the data, look the data in a various point of views, extracts something from those views, finally make some decisions
These point of views tend to be ad-hoc (can not be pre-anticipated, on-demand requests), therefore they are very expensive.
OLAP often:
- read mostly, but very expensive, tend to be big and slow, but not too slow. Query engine required.
- the number of changes is very little, but the size is very large, usually are done anually or monthly in batch query.
- data versioning => massive amount of data that has accumulated over a long period of time. Keep in mind that OLAP tend to neglect this.
- small number of user, mostly data analysts
- not real-time, used once per month, once per year.
- can use both denormilized data model like Star schema and normalized data model like Snowflake schema. In case of Star schema, redundant date are permitted, they need to be controlled during ETL (extract, transform, load) process. Of course you could use Snowflake. Put in mind the pros and cons.
- 1- Star is denormalized, Snowflake is normalized
- 2- Star requires more storage for redundant data since Star is denormalized.
- 3- Star modify operation is costly since only a small change would need to keep in sync with its copy.
- 4- Star is easier to use and read since it is single-nested, Snowflake is multi-nested.
- 5- Star DOES NOT NEED TO JOIN, Snowflake DOES NEED TO JOIN
- 6- Star is more difficult to troubleshoot, since it does not have the 'single point of reference' normalized data feature that Snowflake has.
One of many reasons people use RDBMS is the cost of storage in 2000s, they're very expensive (In 2007, an USB gig with the size of 512MB costs 1.5M VND). But storage now is getting cheaper and cheaper, along with hardware and software performance specifically for handling denormalized data is increasing, OLAP oftens use denormalized data model.
Human lacks patience now! Performance is the most important criteria now!