Summary of "Highload 2. Caching"
Core concepts
- Cache pyramid: multiple storage layers trade speed against cost and capacity, from smallest/fastest to largest/slowest: CPU registers → in-memory cache → local disk → network/database. Faster layers are more expensive per byte and hold less data.
- Heterogeneous storage: data may live on different devices/servers; failures or network hops make access slower and less predictable.
- Cache is advisory and ephemeral: caches can evict data, crash, or be restarted (Chaos Monkey example). Design assuming cached items may disappear at any time.
What to cache (types and tradeoffs)
- Raw data
- Smallest storage cost.
- Requires heavy per-request processing/formatting → slower responses.
- Processed / aggregated data
- Fewer CPU cycles per request.
- Lower bandwidth/space cost than raw data.
- Good when many users need the same processed result.
- Pre-rendered HTML blocks (fragments)
- Fastest for web pages.
- Useful for frequently reused UI blocks (weather widget, header, footer).
- Whole pages vs blocks
- Whole-page caching is simplest for non-personalized content.
- Block caching enables reuse across pages but adds composition complexity.
- Personalization
- Personalized elements (user name, recent messages) complicate caching.
- Use multi-level caching (global blocks + per-user fragments) or dynamic composition.
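The multi-level idea above can be sketched as two caches composed at render time: shared blocks cached once for all users, personalized fragments cached per user. A minimal illustration; all names (`render_page`, the fragment contents) are hypothetical, not from the talk.

```python
# Multi-level fragment caching sketch: shared blocks are cached once
# globally, personalized fragments per user. All names are illustrative.
shared_cache = {}   # global blocks (header, weather widget, ...)
user_cache = {}     # keyed by (user_id, fragment_name)

def cached(cache, key, compute):
    """Return cache[key], computing and storing it on a miss."""
    if key not in cache:
        cache[key] = compute()
    return cache[key]

def render_page(user_id):
    header = cached(shared_cache, "header", lambda: "<header>Site</header>")
    weather = cached(shared_cache, "weather", lambda: "<div>Weather: +5</div>")
    greeting = cached(user_cache, (user_id, "greeting"),
                      lambda: f"<p>Hello, user {user_id}</p>")
    return "\n".join([header, greeting, weather])
```

The shared blocks are computed once no matter how many users request the page; only the small personalized fragment multiplies with the user count.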
Performance model and metrics
- Hit ratio (cache hits vs misses) directly affects average latency and resource use.
- Example: if cache access = 10 ms and expensive compute = 100 ms, average latency = hit_rate * 10 ms + (1 - hit_rate) * 100 ms.
- Session behavior matters: the first request in a session may be a miss (expensive), while subsequent page views can hit the cache. Estimate pages-per-session to decide caching benefit.
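The latency model and the per-session estimate above fit in two one-line functions. A sketch assuming the talk's figures (10 ms cache access, 100 ms compute) and an idealized session where only the first view misses:

```python
def expected_latency(hit_rate, cache_ms=10.0, compute_ms=100.0):
    """Average latency for the simple hit/miss model in the text."""
    return hit_rate * cache_ms + (1 - hit_rate) * compute_ms

def session_hit_rate(pages_per_session):
    """Idealized per-session model: the first view misses, the rest hit."""
    return (pages_per_session - 1) / pages_per_session
```

For example, a 5-page session gives a hit rate of 0.8 and an average of 28 ms per request instead of 100 ms without the cache.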
Cache placement and client-side caching
- Server-side caching is the common approach.
- Client/browser caching can offload load for static assets (images, logos), but central control and forced revocation are harder.
- Use headers, JavaScript, and cookies judiciously.
- Dynamic/personalized data is usually cached server-side; client-side caching is fine for static assets.
Key-value cache model and TTL
- Typical cache store: key + value + TTL. Establish and follow key naming conventions across services.
- Cache entries can include metadata (e.g., value + last-update timestamp) to implement soft-expire semantics or avoid race conditions.
- TTL is set on write; caches may still evict entries earlier — do not assume absolute guarantees.
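Storing the write timestamp alongside the value, as suggested above, makes soft-expire semantics possible: a "stale" entry is still readable while a rebuild runs. A minimal sketch (the class name and the fresh/stale/miss states are illustrative):

```python
import time

class SoftTTLCache:
    """Store value + write timestamp; distinguish soft and hard expiry.
    A soft-expired entry can still be served while a background refresh
    runs; a hard-expired entry is treated as a miss."""
    def __init__(self, soft_ttl, hard_ttl):
        self.soft_ttl = soft_ttl
        self.hard_ttl = hard_ttl
        self._data = {}

    def set(self, key, value, now=None):
        self._data[key] = (value, time.time() if now is None else now)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        if key not in self._data:
            return None, "miss"
        value, written = self._data[key]
        age = now - written
        if age > self.hard_ttl:
            del self._data[key]
            return None, "miss"
        if age > self.soft_ttl:
            return value, "stale"   # usable, but schedule a refresh
        return value, "fresh"
```

The caller decides what "stale" means: serve it and enqueue a recompute, or treat it as a miss for critical data.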
Eviction and failure modes
- Eviction strategies: LRU (least recently used), LFU (least frequently used), FIFO (first-in-first-out), etc.
- Common failure causes: memory limits, process kills, restarts, eviction policies.
- Always design assuming misses and evictions will occur; treat cache as an optimization layer, not a source of truth.
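Of the eviction strategies listed, LRU is the most common and fits in a few lines using an ordered map. A minimal single-process sketch, not a production implementation:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: on overflow, evict the least recently used key."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None                     # miss: caller recomputes
        self._data.move_to_end(key)         # mark as recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used
```

Note that `get` returning `None` is exactly the miss/eviction case the bullet above warns about: callers must always be able to recompute.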
Cold-start / warm-up
- Cold-start problem: empty cache at deploy or restart can cause a flood of expensive recomputations (thundering herd).
- Warm-up strategies: pre-warm the cache by requesting the most popular pages at startup (e.g., call the top 100 pages) so the cache is populated before users arrive.
- Distributed cache vs single-server cache: distributed caches add complexity. Many projects fit on a single well-provisioned server early on — prefer the simplest solution that meets your needs.
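The warm-up strategy above amounts to iterating over a known list of popular pages and rendering each one into the cache before traffic arrives. A sketch under the assumption that a `render` function and a `top_pages` list exist (both names are hypothetical):

```python
def warm_up(cache, top_pages, render):
    """Populate the cache with the most popular pages before users arrive."""
    for page in top_pages:
        if page not in cache:
            cache[page] = render(page)   # expensive render, done once

# Illustrative usage: record which pages actually got rendered.
cache = {}
rendered = []
warm_up(cache, ["/", "/news", "/weather"],
        lambda p: rendered.append(p) or f"<html>{p}</html>")
```

In practice the same effect is often achieved by simply issuing HTTP requests for the top N URLs from a startup script.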
Stampede / thundering herd / race conditions
- When many requests discover a stale or missing cache key at once, they can trigger parallel recomputations, causing overload and higher latencies.
- Mitigations:
- Mutual exclusion / locking (only one worker recomputes a key).
- “In-progress” flags to signal a recompute is underway.
- Soft TTLs and stale-while-revalidate: serve stale data while recomputing in background.
- Small protective time windows to avoid many concurrent recalculations.
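The mutual-exclusion mitigation above can be sketched with a single lock: the first caller to find a key missing computes it, and anyone who acquires the lock afterwards finds the value already set. A simplified single-process version; a distributed setup would need a shared lock or in-progress flag instead.

```python
import threading

cache = {}
_lock = threading.Lock()

def get_or_compute(key, compute):
    """At most one caller recomputes a missing key (mutual exclusion).
    Double-checked: re-test the cache after acquiring the lock."""
    if key in cache:                # fast path, no lock needed
        return cache[key]
    with _lock:
        if key in cache:            # filled while we waited for the lock
            return cache[key]
        cache[key] = compute()
        return cache[key]

# Illustrative usage: the expensive render runs only once for two lookups.
calls = []
first = get_or_compute("page:/", lambda: calls.append("render") or "<html>/</html>")
second = get_or_compute("page:/", lambda: calls.append("render") or "<html>/</html>")
```

Combining this with soft TTLs gives stale-while-revalidate: lock holders recompute while everyone else serves the stale value.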
Invalidation strategies (coherence)
Three basic approaches when underlying data changes:
- Synchronous overwrite / write-through / invalidate on write
- Update or invalidate cache at the time of the data change.
- Fast for reads but can be expensive and complex if many dependent keys exist.
- Immediate eviction + background recompute
- Invalidate quickly on write and trigger asynchronous tasks to rebuild cache entries.
- Serve stale and recompute in background
- Soft-expire entries and let a single worker recompute in-flight while others serve stale responses.
Notes:
- Determining which cache keys depend on a changed data item (dependency/coherence tree) is a hard problem. Solutions include editorial interfaces that track relationships, tagging systems, or accepting looser consistency.
- For mission-critical systems (banking), strong freshness guarantees are required — avoid serving stale data for critical fields or use strict write/commit semantics.
Practical engineering patterns and recommendations
- Choose what to cache based on:
- Read frequency
- Personalization needs
- Update frequency
- Cost to compute
- Acceptable staleness
- Multi-layer approach: global caches for non-personal data; per-user caches for personalization where needed.
- Use metadata in values (e.g., last update time, soft TTL) to implement safe soft-expiry and avoid cache stampedes.
- Pre-warm most-requested pages at startup to reduce cold-start pain.
- Prefer simple designs early — avoid distributed cache complexity unless scale requires it.
- Treat cache as an optimization layer, not a source of truth. Build systems to work correctly without cache.
- Invalidation options:
- Synchronous invalidation if freshness is essential.
- If synchronous invalidation is too heavy, invalidate and queue background recomputation, or serve stale responses temporarily while rebuilding.
Examples from the talk
- News site / main page: main page dominates traffic; pre-warming top pages can quickly warm the cache and reduce load.
- Banking/payment systems: cannot serve stale balances; must use strict write and commit semantics for critical data while caching non-critical views.
- Session/page-view modeling: calculate expected savings using pages-per-session and hit ratios.
Tutorial / guide takeaways
- The talk is practical and tutorial-like: it explains how to decide what to cache, how to measure benefit (latency vs hit rate), how to avoid common pitfalls (cold start, stampede, eviction), and which engineering solutions to apply (pre-warm, soft-expire, background recompute).
- Recommended workflow:
- Analyze access and update patterns.
- Pick caching granularity (raw / processed / HTML fragment).
- Choose TTL and eviction policy.
- Implement locking/soft-expire to avoid stampede.
- Implement an invalidation strategy (synchronous or background) appropriate to consistency requirements.
- Monitor behavior and iterate.
Main speakers and referenced systems
- Presenter: unnamed Highload conference speaker (lecture-style talk on caching).
- Referenced systems/tools/terms: Yandex, Elbrus (hardware reference), Chaos Monkey (failure testing), common cache concepts (LRU, LFU, FIFO).