
@wilmoore
Last active December 26, 2023 18:29
Software Engineering :: Database :: Redis :: Solutions :: Cache


⪼ Made with 💜 by Polyglot.

A cache provides a shortcut to access "hot data", improving performance.

A typical cache architecture has three layers:

  1. Application Cache: This sits inside the application's memory and is usually a hashmap holding frequently accessed data like user profiles. The cache size is small and data is lost when the app restarts.
  2. Second Level Cache: This is often an in-process or out-of-process cache such as EhCache. It requires configuring an eviction policy, such as LRU, LFU, or TTL-based eviction, for automatic cache invalidation. The cache is local to each server.
  3. Distributed Cache: This is usually Redis, deployed on separate servers from the application servers. Redis supports different eviction policies to control what data stays in the cache. The cache can be sharded across multiple servers for horizontal scalability. The cache is shared across multiple apps. Redis offers persistence, replication for high availability, and a rich set of data structures.
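The layered lookup described above can be sketched as follows. This is a minimal illustration, not a production implementation: plain dicts stand in for the application (L1) cache and the distributed Redis layer, and `db_load` is a hypothetical database read.

```python
l1_cache = {}            # per-process application cache (hashmap)
distributed_cache = {}   # dict standing in for a shared Redis deployment

def db_load(key):
    """Hypothetical read from the database (the source of truth)."""
    return f"row-for-{key}"

def get(key):
    if key in l1_cache:                 # 1. hottest data, fastest path
        return l1_cache[key]
    if key in distributed_cache:        # 2. shared across app servers
        value = distributed_cache[key]
        l1_cache[key] = value           # promote into the local cache
        return value
    value = db_load(key)                # 3. cache miss: hit the database
    distributed_cache[key] = value      # populate both layers on the way out
    l1_cache[key] = value
    return value
```

Note that each layer's miss falls through to the next, and values are promoted upward so subsequent reads hit the fastest layer.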

Redis lets you cache different data types, such as strings for a user's full name and hashes for a user profile. However, the database remains the complete source of truth and holds the full set of data, while Redis caches the hot subsets.
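A minimal sketch of those two cached shapes, using a plain dict in place of Redis so the example is self-contained. With the redis-py client, the equivalent operations would be `SET`/`GET` for the string and `HSET`/`HGETALL` for the hash; the key names and profile fields here are made up for illustration.

```python
# Dict standing in for Redis.
cache = {}

# String value -- like: SET user:42:name "Ada Lovelace"
cache["user:42:name"] = "Ada Lovelace"

# Hash value (field -> value map) -- like: HSET user:42:profile ...
cache["user:42:profile"] = {
    "name": "Ada Lovelace",
    "email": "ada@example.com",   # hypothetical fields
}

# Reading back -- like: GET / HGETALL
full_name = cache["user:42:name"]
profile = cache["user:42:profile"]
```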

Based on the Pareto principle, around 20% of data tends to make up 80% of accesses. So caching the hottest 20% of data in Redis can improve performance for a majority of requests. This 80/20 rule of thumb can guide what data is cached versus stored solely in the database.

The cache hierarchy allows managing different data sizes/access patterns efficiently. The first level cache holds a small volume of very hot data. The second level cache holds more data, still frequently accessed. The distributed Redis cache can hold large datasets by sharding across servers.
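Sharding the distributed layer can be sketched with a naive hash-modulo scheme over hypothetical node addresses. Real deployments typically use Redis Cluster's hash slots or consistent hashing instead, since a plain modulo reshuffles most keys whenever a node is added or removed.

```python
import hashlib

# Hypothetical cache node addresses.
NODES = ["cache-0:6379", "cache-1:6379", "cache-2:6379"]

def node_for(key: str) -> str:
    """Deterministically map a key to one shard."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]
```

Because the mapping is deterministic, every app server routes a given key to the same node, which is what makes the cache shareable across servers.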

Using Redis as a cache improves performance but introduces complexity around cache coherence. Because multiple copies of the same data can exist, read/write strategies need careful design. Typically one data store is designated as the "source of truth" and writes go there first. The application can implement lazy loading (cache-aside) and write-through patterns to keep the Redis cache updated; read-through is another application-level caching pattern that Redis readily supports.
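The read/write strategies mentioned above can be sketched as follows, again with plain dicts standing in for Redis and for the database designated as the source of truth.

```python
# Dict stand-ins: `db` is the source of truth, `cache` plays Redis.
cache, db = {}, {}

def read_cache_aside(key):
    """Lazy loading: on a cache miss, load from the DB and populate the cache."""
    if key in cache:
        return cache[key]
    value = db.get(key)
    if value is not None:
        cache[key] = value
    return value

def write_through(key, value):
    """Write to the source of truth first, then update the cache."""
    db[key] = value
    cache[key] = value
```

Write-through keeps the cache consistent with the database on every write, at the cost of writing to both stores; cache-aside only pays the database-read cost on the first miss.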

Caching is a classic time vs space tradeoff - we duplicate data across the system to gain speed. Interested readers can check our previous issues on caching best practices.

