A cache is a temporary storage layer that keeps frequently accessed data in a faster location so future requests can be served more quickly. Caches trade extra memory or storage for speed, reducing the need to recompute results or fetch them from slower systems. They appear at many levels, from hardware caches on the CPU itself to application-level caches like Redis. When a cache holds the requested data, it is called a cache hit, and the response can be served immediately. When the data is missing, called a cache miss, the system must fetch it from the original source, store it in the cache, and then return it. Well-tuned caches can dramatically reduce latency and load on backends.
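To make the hit/miss flow concrete, here is a minimal cache-aside sketch in Python. The dictionary standing in for the cache and the `fetch_from_source` function are illustrative placeholders, not any particular library's API.

```python
# Minimal cache-aside sketch: a dict stands in for the cache store,
# and fetch_from_source is a placeholder for the slow backend.
cache = {}

def fetch_from_source(key):
    # Placeholder: in practice this would query a database or call an API.
    return f"value-for-{key}"

def get(key):
    if key in cache:                    # cache hit: serve immediately
        return cache[key]
    value = fetch_from_source(key)      # cache miss: go to the source
    cache[key] = value                  # populate the cache for next time
    return value
```

On the first call, `get("user:42")` misses and pays the full cost of the fetch; subsequent calls hit the dictionary and skip the backend entirely.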
Common pitfalls
Caching is powerful but introduces complexity because cached data can become stale or inconsistent. Designers must decide how long items stay in the cache, how they are invalidated, and what happens when capacity is full. Poor cache keys can cause unexpected collisions or prevent reuse across similar requests. Distributed caches must handle replication and network latency, and they can fail like any other service. It is also easy to mask underlying performance issues behind a cache instead of fixing root causes. When working with AI on performance topics, it helps to specify where caches exist, what they store, and how they are invalidated.
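One way to make the expiry and capacity decisions concrete is a cache that combines a per-item time-to-live with least-recently-used eviction. The sketch below is an assumption-laden illustration: the class name, default TTL, and capacity are arbitrary choices, and real systems often delegate this to a library or external service instead.

```python
import time
from collections import OrderedDict

class TTLLRUCache:
    """Illustrative cache with per-item expiry (TTL) and LRU eviction."""

    def __init__(self, capacity=128, ttl_seconds=60.0):
        self.capacity = capacity
        self.ttl = ttl_seconds
        self._items = OrderedDict()  # key -> (value, expiry_time)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None                   # miss: key was never cached or was evicted
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self._items[key]          # stale: invalidate on read
            return None
        self._items.move_to_end(key)      # mark as recently used
        return value

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = (value, time.monotonic() + self.ttl)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used item
```

Even this small sketch surfaces the design questions from the paragraph above: the TTL bounds staleness, eviction answers the capacity question, and the choice of `key` determines whether similar requests can share an entry or collide.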