- Product feeds are ingested into a Kafka topic by several systems.
- A product feed carries all the inventory data about a product (quantity, price, name, brand, color, description, etc.).
- We have to store them in a DB.
- We have to expose one API that returns product details based on product id, color, category, etc.
- During ingestion we have to do some processing, transformation, etc.
Product feed data examples:
{ "product":"..", "name":"..", "color":"..", "description":"..", "price":.., "size":".." }
{ "product":"..", "name":"..", "color":"..", "description":"..", "price":.., "approvals":"", "size":".." }
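The notes do not say what the ingestion-time processing looks like. As a minimal sketch, assuming each message is a JSON string shaped like the examples above and that "product" is the required product id, a transformation step might clean a record as follows. The function name `normalize_feed` and every rule in it are hypothetical, chosen only for illustration.

```python
import json
from typing import Any


def normalize_feed(raw: str) -> dict[str, Any]:
    """Parse one feed message and return a cleaned record.

    Assumption: "product" is the product id and is required; every
    other field is optional. These rules are illustrative only.
    """
    record = json.loads(raw)
    if not isinstance(record, dict):
        raise ValueError("feed record must be a JSON object")
    product = record.get("product")
    if product is None or not str(product).strip():
        raise ValueError("feed record needs a non-empty 'product'")

    cleaned: dict[str, Any] = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
            if value == "":
                continue  # drop empty optional fields such as "approvals": ""
        cleaned[key] = value

    if "price" in cleaned:
        cleaned["price"] = float(cleaned["price"])  # accept 19.99 or "19.99"
    if "color" in cleaned:
        cleaned["color"] = str(cleaned["color"]).lower()  # consistent color filtering
    return cleaned
```

A record that fails validation raises `ValueError`, which the consumer could treat as a processing failure and route to the retry path.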
Q: How can the system handle a huge scale of data?
A: Implement it as a distributed system:
- Clustered backend application processes
- Clustered database (master-slave)
- Partitioned / sharded database
- Partitioned queues with concurrency handling enabled
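Partitioning and sharding both depend on routing a given product to the same place every time. The sketch below shows one way to do that with a stable hash of the product id. It is an assumption for illustration, not Kafka's built-in partitioner, and the name `partition_for` is hypothetical.

```python
import hashlib


def partition_for(product_id: str, num_partitions: int) -> int:
    """Map a product id to a partition (or shard) index in a stable way.

    Uses SHA-256 rather than Python's built-in hash(), because hash() of a
    string changes between interpreter runs and would break routing.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    digest = hashlib.sha256(product_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Because all messages for one product id land in the same partition, a single consumer sees that product's updates in order, which is what makes per-entity concurrency handling tractable.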
Q: How can the system be designed to be fault tolerant and robust? (If processing of any request fails for any reason, it should be reprocessed.)
A: Implement the following:
- A dead letter queue that receives failed messages, which are retried in batches every 2 hours. Retries are made idempotent using an ETag checksum and/or the last-modified timestamp.
- Manual commits: commit messages quickly for creates and deletes, but synchronously for updates to the same entity.
- Built-in retries set to 0, since retry is implemented separately.
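The idempotency check for dead letter retries can be sketched as below, assuming each message carries a last-modified timestamp and that a content hash plays the role of the ETag. `ProductStore` is a hypothetical in-memory stand-in for the real database, and `retry_dead_letters` stands in for the body of the 2-hourly batch job.

```python
import hashlib
import json
from dataclasses import dataclass, field


def checksum(record: dict) -> str:
    """Content hash used like an ETag; key order does not affect it."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class ProductStore:
    """In-memory stand-in for the DB (hypothetical)."""
    # product id -> (last_modified, checksum, record)
    rows: dict = field(default_factory=dict)

    def upsert_idempotent(self, record: dict, last_modified: int) -> bool:
        """Apply the record unless it is a duplicate or older than the stored one."""
        key = record["product"]
        new_sum = checksum(record)
        current = self.rows.get(key)
        if current is not None:
            stored_ts, stored_sum, _ = current
            if new_sum == stored_sum or last_modified < stored_ts:
                return False  # duplicate delivery or stale retry: skip safely
        self.rows[key] = (last_modified, new_sum, record)
        return True


def retry_dead_letters(store: ProductStore, dead_letters: list) -> list:
    """Replay one batch of (record, last_modified) pairs.

    Returns the pairs that failed again so they can stay in the queue.
    """
    still_failing = []
    for record, last_modified in dead_letters:
        try:
            store.upsert_idempotent(record, last_modified)
        except Exception:
            still_failing.append((record, last_modified))
    return still_failing
```

With this check in place, replaying a message that was already applied, or one that a newer update has since superseded, leaves the stored record unchanged.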
Q: How should the data be stored, in what kind of DB, and why?
A: Any NoSQL database, because the workload is read heavy, there is no need for relational features, and the database should support full-text search. Hence Elasticsearch (ES) would be ideal.
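To illustrate why full-text search matters for the read API, the sketch below builds an Elasticsearch Query DSL request body that combines free-text matching with exact filters. The field names (`name`, `description`, `color`, `category`) are assumptions, and the `term` filters assume those fields are mapped as keywords. The helper `build_product_query` is hypothetical and only builds the body; it does not call a client.

```python
from typing import Optional


def build_product_query(
    text: Optional[str] = None,
    color: Optional[str] = None,
    category: Optional[str] = None,
) -> dict:
    """Build a bool query: full-text match in `must`, exact matches in `filter`."""
    filters = []
    if color:
        filters.append({"term": {"color": color}})
    if category:
        filters.append({"term": {"category": category}})

    if text:
        must = [{"multi_match": {"query": text, "fields": ["name", "description"]}}]
    else:
        must = [{"match_all": {}}]  # no search text: return everything that passes the filters

    return {"query": {"bool": {"must": must, "filter": filters}}}
```

Clauses under `filter` do not contribute to relevance scoring, so exact attributes such as color are a natural fit there while the text match drives ranking.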
Q: What are the high-level components?
A: Queue -> Consumer -> DB -> Dead letter queue (on failure). Dead letter queue -> Cron job -> DB. Redis as a cache.
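The notes do not say how the Redis cache is used. One common option is a cache-aside read path, sketched below with plain dictionaries standing in for both Redis and the DB. The class `ProductReader` and its method names are hypothetical.

```python
from typing import Optional


class ProductReader:
    """Cache-aside lookup: try the cache first, fall back to the DB."""

    def __init__(self, db: dict):
        self.db = db        # stand-in for the product DB
        self.cache = {}     # stand-in for Redis
        self.db_reads = 0   # counts cache misses, for illustration

    def get(self, product_id: str) -> Optional[dict]:
        if product_id in self.cache:
            return self.cache[product_id]
        self.db_reads += 1
        record = self.db.get(product_id)
        if record is not None:
            self.cache[product_id] = record  # populate the cache on a miss
        return record

    def on_product_written(self, product_id: str, record: dict) -> None:
        """Called by the consumer after a write: update the DB, evict the stale cache entry."""
        self.db[product_id] = record
        self.cache.pop(product_id, None)
```

Evicting on write rather than updating the cache in place keeps the consumer simple; the next read repopulates the entry from the DB.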