Module cache_metrics

Module cache_metrics 

Source
Expand description

Cache-specific observability metrics for LLM inference KV caches.

Defines the 12 metric names from the design document Section 7.5 and provides a CacheMetrics struct that aggregates all cache-related counters, gauges, and histograms. The CacheMetrics::global() singleton can be used by KvCacheManager and other cache infrastructure to emit metrics.

Structs§

CacheMetrics
Aggregated cache observability metrics.

Constants§

ALL_CACHE_METRIC_NAMES
All 12 cache metric names as a slice, for validation.
CACHE_ATTACH_FAILURE_TOTAL
Counter: total attach failures, labeled by reason.
CACHE_DECODE_FAR_FROM_CACHE
Counter: decode requests placed far from cache (low cache_locality score).
CACHE_DECODE_LATENCY_US
Histogram: steady-state per-token decode latency in microseconds.
CACHE_FIRST_TOKEN_LATENCY_US
Histogram: time to first token in microseconds.
CACHE_FORK_TOTAL
Counter: total cache forks.
CACHE_HIT_TOTAL
Counter: total cache hits, labeled by cache_class.
CACHE_MISS_TOTAL
Counter: total cache misses, labeled by cache_class.
CACHE_PREFILL_LATENCY_US
Histogram: prefill latency in microseconds.
CACHE_RECLAIM_TOTAL
Counter: total cache reclaims, labeled by cause (expired, revoked, evicted).
CACHE_RESIDENT_BYTES
Gauge: resident bytes, labeled by tier (VRAM, DRAM, CXL).
CACHE_SPILL_BYTES_TOTAL
Counter: total bytes spilled between tiers, labeled by (src_tier, dst_tier).
CACHE_WARMUP_BYTES_TOTAL
Counter: total bytes warmed between tiers, labeled by (src_tier, dst_tier).