pub struct CacheMetrics {
pub prefill_latency: MetricHistogram,
pub first_token_latency: MetricHistogram,
pub decode_latency: MetricHistogram,
pub hit_total: MetricCounter,
pub miss_total: MetricCounter,
pub resident_bytes: MetricGauge,
pub spill_bytes_total: MetricCounter,
pub warmup_bytes_total: MetricCounter,
pub attach_failure_total: MetricCounter,
pub fork_total: MetricCounter,
pub reclaim_total: MetricCounter,
pub decode_far_from_cache: MetricCounter,
}
Aggregated cache observability metrics.
Provides counters, gauges, and histograms for the 12 cache metrics defined
in the design document Section 7.5. Access the process-wide singleton via
CacheMetrics::global().
For labeled metrics (e.g. hit_total by cache_class, resident_bytes by tier), per-label tracking is left to the caller — the counters here provide the aggregate emission point. Callers should use the metric name constants along with labels when exporting to Prometheus or OTLP.
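Because per-label tracking is left to the caller, one possible arrangement is a caller-side wrapper that bumps the aggregate counter and a per-label map in the same call, so the two views cannot drift apart. This is a minimal sketch under assumptions: `MetricCounter` is a hypothetical atomic stand-in for the crate's type, `LabeledHits` and the `"prefix"`/`"session"` class names are invented for illustration, and a real exporter would attach the label to the documented metric name constant.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

/// Hypothetical stand-in for the crate's aggregate `MetricCounter`.
#[derive(Default)]
pub struct MetricCounter(AtomicU64);

impl MetricCounter {
    pub fn inc(&self) {
        self.0.fetch_add(1, Ordering::Relaxed);
    }
    pub fn get(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Hypothetical caller-side wrapper: keeps per-`cache_class` counts next to
/// the aggregate counter, updating both on every hit.
#[derive(Default)]
pub struct LabeledHits {
    pub aggregate: MetricCounter,
    by_class: Mutex<HashMap<String, u64>>,
}

impl LabeledHits {
    pub fn record_hit(&self, cache_class: &str) {
        self.aggregate.inc();
        let mut map = self.by_class.lock().unwrap();
        *map.entry(cache_class.to_string()).or_insert(0) += 1;
    }

    /// Sorted (label, value) pairs, e.g. for emitting one labeled series per class.
    pub fn snapshot(&self) -> Vec<(String, u64)> {
        let map = self.by_class.lock().unwrap();
        let mut out: Vec<(String, u64)> = map.iter().map(|(k, v)| (k.clone(), *v)).collect();
        out.sort();
        out
    }
}
```

The aggregate stays cheap (one atomic add) while the labeled map pays for a lock; a caller with a fixed, small set of classes could replace the map with one atomic per class.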
Fields
prefill_latency: MetricHistogram
Histogram: prefill latency in microseconds.
first_token_latency: MetricHistogram
Histogram: time to first token in microseconds.
decode_latency: MetricHistogram
Histogram: steady-state per-token decode latency in microseconds.
hit_total: MetricCounter
Counter: total cache hits (across all cache classes).
miss_total: MetricCounter
Counter: total cache misses (across all cache classes).
resident_bytes: MetricGauge
Gauge: total resident bytes (across all tiers).
spill_bytes_total: MetricCounter
Counter: total bytes spilled between tiers.
warmup_bytes_total: MetricCounter
Counter: total bytes warmed between tiers.
attach_failure_total: MetricCounter
Counter: total attach failures.
fork_total: MetricCounter
Counter: total cache forks.
reclaim_total: MetricCounter
Counter: total cache reclaims (expired + revoked + evicted).
decode_far_from_cache: MetricCounter
Counter: decode requests placed far from cache.
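The fields use three metric kinds: monotonic counters, a gauge that can move both ways, and latency histograms in microseconds. The page does not show how `MetricCounter`, `MetricGauge`, and `MetricHistogram` are implemented, so the following is only a hypothetical sketch of plausible lock-free stand-ins built on atomics; the method names (`add`, `get`, `observe`, `bucket_counts`) and the fixed-bucket layout are assumptions, not the crate's API.

```rust
use std::sync::atomic::{AtomicI64, AtomicU64, Ordering};

/// Hypothetical monotonic counter; the crate's real `MetricCounter` may differ.
#[derive(Default)]
pub struct MetricCounter(AtomicU64);

impl MetricCounter {
    pub fn add(&self, n: u64) {
        self.0.fetch_add(n, Ordering::Relaxed);
    }
    pub fn get(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Hypothetical gauge: a signed value that can move up and down.
#[derive(Default)]
pub struct MetricGauge(AtomicI64);

impl MetricGauge {
    pub fn add(&self, delta: i64) {
        self.0.fetch_add(delta, Ordering::Relaxed);
    }
    pub fn get(&self) -> i64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Hypothetical histogram with fixed upper bucket bounds in microseconds.
pub struct MetricHistogram {
    bounds_us: Vec<u64>,     // ascending inclusive upper bounds
    buckets: Vec<AtomicU64>, // one slot per bound plus a final overflow (+Inf) slot
    sum_us: AtomicU64,
    count: AtomicU64,
}

impl MetricHistogram {
    pub fn new(bounds_us: Vec<u64>) -> Self {
        let buckets = (0..=bounds_us.len()).map(|_| AtomicU64::new(0)).collect();
        Self { bounds_us, buckets, sum_us: AtomicU64::new(0), count: AtomicU64::new(0) }
    }

    pub fn observe(&self, value_us: u64) {
        // First bucket whose upper bound is >= the value, else the overflow slot.
        let idx = self
            .bounds_us
            .iter()
            .position(|&b| value_us <= b)
            .unwrap_or(self.bounds_us.len());
        self.buckets[idx].fetch_add(1, Ordering::Relaxed);
        self.sum_us.fetch_add(value_us, Ordering::Relaxed);
        self.count.fetch_add(1, Ordering::Relaxed);
    }

    /// Per-bucket (non-cumulative) counts; a Prometheus exporter would accumulate them.
    pub fn bucket_counts(&self) -> Vec<u64> {
        self.buckets.iter().map(|b| b.load(Ordering::Relaxed)).collect()
    }
    pub fn count(&self) -> u64 {
        self.count.load(Ordering::Relaxed)
    }
    pub fn sum_us(&self) -> u64 {
        self.sum_us.load(Ordering::Relaxed)
    }
}
```

`Ordering::Relaxed` is enough here because each metric is an independent tally; nothing else is synchronized through these values.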
Implementations
impl CacheMetrics
pub fn global() -> &'static CacheMetrics
Access the global cache metrics instance.
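A `&'static` return type suggests a lazily initialized process-wide instance. One common way to build such a singleton in std Rust is `std::sync::OnceLock`; the sketch below assumes that approach and uses a trimmed, hypothetical `CacheMetrics` with a single atomic field, so it shows the pattern rather than the crate's actual initialization.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::OnceLock;

/// Hypothetical, trimmed-down stand-in with only one field.
#[derive(Default)]
pub struct CacheMetrics {
    pub hit_total: AtomicU64,
}

impl CacheMetrics {
    /// Initializes the instance on first call; every later call
    /// (from any thread) returns the same `&'static` reference.
    pub fn global() -> &'static CacheMetrics {
        static GLOBAL: OnceLock<CacheMetrics> = OnceLock::new();
        GLOBAL.get_or_init(CacheMetrics::default)
    }

    pub fn record_cache_hit(&self) {
        self.hit_total.fetch_add(1, Ordering::Relaxed);
    }
}
```

Call sites then need no handle threading, for example `CacheMetrics::global().record_cache_hit();`.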
pub fn record_cache_created(&self, logical_bytes: u64)
Record a cache creation: increments resident_bytes by logical_bytes.
pub fn record_cache_hit(&self)
Record a cache hit: increments hit_total. The method takes no cache-class argument; per-class labeling is left to the caller.
pub fn record_cache_miss(&self)
Record a cache miss: increments miss_total. The method takes no cache-class argument; per-class labeling is left to the caller.
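Because hit_total and miss_total are plain aggregate counters, a hit ratio has to be derived from the pair. The sketch below assumes a lookup path that records exactly one of hit or miss per lookup; `HitMiss`, `hit_ratio`, and `lookup` are hypothetical names, and the `HashMap` stands in for whatever cache index the real code consults.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};

#[derive(Default)]
pub struct HitMiss {
    pub hit_total: AtomicU64,
    pub miss_total: AtomicU64,
}

impl HitMiss {
    pub fn record_cache_hit(&self) {
        self.hit_total.fetch_add(1, Ordering::Relaxed);
    }
    pub fn record_cache_miss(&self) {
        self.miss_total.fetch_add(1, Ordering::Relaxed);
    }

    /// Hypothetical helper: hits / (hits + misses), or None before any lookup.
    /// The two loads are not one atomic snapshot, so under concurrency the
    /// ratio is approximate.
    pub fn hit_ratio(&self) -> Option<f64> {
        let hits = self.hit_total.load(Ordering::Relaxed);
        let misses = self.miss_total.load(Ordering::Relaxed);
        let total = hits + misses;
        if total == 0 {
            None
        } else {
            Some(hits as f64 / total as f64)
        }
    }
}

/// Hypothetical lookup path: every lookup records exactly one hit or miss.
pub fn lookup<'a>(metrics: &HitMiss, cache: &'a HashMap<u64, String>, key: u64) -> Option<&'a String> {
    let found = cache.get(&key);
    if found.is_some() {
        metrics.record_cache_hit();
    } else {
        metrics.record_cache_miss();
    }
    found
}
```

When the counters are exported to Prometheus, the ratio is more commonly computed at query time from the rates of the two series than inside the process.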
pub fn record_attach_failure(&self)
Record a cache attach failure: increments attach_failure_total.
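One way to make sure every failed attach is counted exactly once is to route attach calls through a small wrapper that inspects the `Result`. This is a hypothetical sketch: `AttachMetrics` and `attach_with_metrics` are invented names, and the attach operation is modeled as an arbitrary closure returning `Result<T, E>`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

#[derive(Default)]
pub struct AttachMetrics {
    pub attach_failure_total: AtomicU64,
}

impl AttachMetrics {
    pub fn record_attach_failure(&self) {
        self.attach_failure_total.fetch_add(1, Ordering::Relaxed);
    }
}

/// Hypothetical wrapper: runs an attach operation, counts a failure if it
/// returns Err, and passes the original Result through unchanged.
pub fn attach_with_metrics<T, E>(
    metrics: &AttachMetrics,
    attach: impl FnOnce() -> Result<T, E>,
) -> Result<T, E> {
    let result = attach();
    if result.is_err() {
        metrics.record_attach_failure();
    }
    result
}
```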
pub fn record_cache_spill(&self, bytes_moved: u64)
Record a cache spill: increments spill_bytes_total by bytes_moved.
pub fn record_cache_warmup(&self, bytes_moved: u64)
Record a cache warmup: increments warmup_bytes_total by bytes_moved.
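The page says only that spill and warmup move bytes "between tiers". The sketch below assumes that a spill moves data to a slower tier and a warmup moves it to a faster one, and it uses a hypothetical three-tier `Tier` enum (`Gpu`, `Host`, `Disk`) ordered fastest to slowest. Since resident_bytes counts bytes across all tiers and neither method touches it, a tier move changes only the traffic counters, not the resident total.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical tiers; declaration order means Gpu < Host < Disk (fastest first).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    Gpu,
    Host,
    Disk,
}

#[derive(Default)]
pub struct TierMetrics {
    pub spill_bytes_total: AtomicU64,
    pub warmup_bytes_total: AtomicU64,
}

impl TierMetrics {
    pub fn record_cache_spill(&self, bytes_moved: u64) {
        self.spill_bytes_total.fetch_add(bytes_moved, Ordering::Relaxed);
    }
    pub fn record_cache_warmup(&self, bytes_moved: u64) {
        self.warmup_bytes_total.fetch_add(bytes_moved, Ordering::Relaxed);
    }

    /// Assumption: moving toward a slower tier is a spill, toward a faster
    /// tier is a warmup, and a same-tier move records nothing.
    pub fn record_move(&self, from: Tier, to: Tier, bytes: u64) {
        match to.cmp(&from) {
            std::cmp::Ordering::Greater => self.record_cache_spill(bytes),
            std::cmp::Ordering::Less => self.record_cache_warmup(bytes),
            std::cmp::Ordering::Equal => {}
        }
    }
}
```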
pub fn record_cache_reclaimed(&self, logical_bytes: u64)
Record a cache destruction or eviction: decrements resident_bytes by logical_bytes and increments reclaim_total.
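resident_bytes stays accurate only if every `record_cache_created(n)` is eventually matched by exactly one `record_cache_reclaimed(n)` with the same byte count. A Rust-idiomatic way to enforce that pairing is an RAII handle that records creation in its constructor and reclamation in `Drop`. This is a hypothetical sketch: `LifecycleMetrics` and `TrackedCache` are invented names, and the gauge is modeled as an `AtomicI64`.

```rust
use std::sync::atomic::{AtomicI64, AtomicU64, Ordering};

#[derive(Default)]
pub struct LifecycleMetrics {
    pub resident_bytes: AtomicI64,
    pub reclaim_total: AtomicU64,
}

impl LifecycleMetrics {
    // Note: `as i64` would wrap for sizes above i64::MAX; acceptable for a sketch.
    pub fn record_cache_created(&self, logical_bytes: u64) {
        self.resident_bytes.fetch_add(logical_bytes as i64, Ordering::Relaxed);
    }
    pub fn record_cache_reclaimed(&self, logical_bytes: u64) {
        self.resident_bytes.fetch_sub(logical_bytes as i64, Ordering::Relaxed);
        self.reclaim_total.fetch_add(1, Ordering::Relaxed);
    }
}

/// Hypothetical RAII handle: records creation when built and reclamation
/// when dropped, so each created byte is subtracted exactly once.
pub struct TrackedCache<'m> {
    metrics: &'m LifecycleMetrics,
    logical_bytes: u64,
}

impl<'m> TrackedCache<'m> {
    pub fn new(metrics: &'m LifecycleMetrics, logical_bytes: u64) -> Self {
        metrics.record_cache_created(logical_bytes);
        Self { metrics, logical_bytes }
    }
}

impl Drop for TrackedCache<'_> {
    fn drop(&mut self) {
        self.metrics.record_cache_reclaimed(self.logical_bytes);
    }
}
```

Whether a cache expires, is revoked, or is evicted, dropping its handle produces the single reclaim event that reclaim_total counts.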
pub fn record_cache_forked(&self, logical_bytes: u64)
Record a cache fork: increments fork_total and increments resident_bytes by logical_bytes.
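Because a fork adds its full logical_bytes to resident_bytes, forking a cache of N bytes raises the resident total by N, the same as creating a new cache of that size. The sketch below shows that bookkeeping with a hypothetical `ForkMetrics` stand-in. If the underlying storage shares pages between parent and fork (an assumption this page neither confirms nor rules out), resident_bytes as recorded here would overstate physical memory use.

```rust
use std::sync::atomic::{AtomicI64, AtomicU64, Ordering};

#[derive(Default)]
pub struct ForkMetrics {
    pub resident_bytes: AtomicI64,
    pub fork_total: AtomicU64,
}

impl ForkMetrics {
    pub fn record_cache_created(&self, logical_bytes: u64) {
        self.resident_bytes.fetch_add(logical_bytes as i64, Ordering::Relaxed);
    }

    /// A fork counts once in fork_total and adds its full logical size
    /// to resident_bytes.
    pub fn record_cache_forked(&self, logical_bytes: u64) {
        self.fork_total.fetch_add(1, Ordering::Relaxed);
        self.resident_bytes.fetch_add(logical_bytes as i64, Ordering::Relaxed);
    }
}
```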
pub fn record_prefill_latency(&self, latency_us: u64)
Record a prefill latency observation, in microseconds, into prefill_latency.
pub fn record_first_token_latency(&self, latency_us: u64)
Record a first-token latency observation, in microseconds, into first_token_latency.
pub fn record_decode_latency(&self, latency_us: u64)
Record a per-token decode latency observation, in microseconds, into decode_latency.
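The latency methods take whole microseconds as `u64`, while `std::time::Duration::as_micros` returns `u128`, so callers need a conversion. The sketch below also shows one way to derive the two generation metrics from token emission times measured from request arrival: the first offset is the time to first token, and each gap between consecutive tokens is one decode observation. `duration_to_us` and `split_latencies` are hypothetical helpers, and treating every gap, including the first, as steady-state decode is an assumption.

```rust
use std::time::Duration;

/// Whole microseconds, saturating at u64::MAX instead of truncating the u128.
pub fn duration_to_us(d: Duration) -> u64 {
    u64::try_from(d.as_micros()).unwrap_or(u64::MAX)
}

/// Hypothetical helper: given token emission times measured from request
/// arrival, returns (time-to-first-token in µs, per-token decode gaps in µs).
/// Returns None when no token was produced.
pub fn split_latencies(token_times: &[Duration]) -> Option<(u64, Vec<u64>)> {
    let first = *token_times.first()?;
    let gaps = token_times
        .windows(2)
        .map(|w| duration_to_us(w[1].saturating_sub(w[0])))
        .collect();
    Some((duration_to_us(first), gaps))
}
```

A caller would pass the first value to `record_first_token_latency` and each gap to `record_decode_latency`.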
pub fn record_decode_far_from_cache(&self)
Record a decode request placed far from its cache: increments decode_far_from_cache.
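The page does not define "far from cache". The sketch below assumes that "far" means the decode worker runs on a different node than the one holding the cache, and that the scheduler falls back to another worker when no co-located one exists. `PlacementMetrics`, `place_decode`, and the `u32` node IDs are hypothetical.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

#[derive(Default)]
pub struct PlacementMetrics {
    pub decode_far_from_cache: AtomicU64,
}

impl PlacementMetrics {
    pub fn record_decode_far_from_cache(&self) {
        self.decode_far_from_cache.fetch_add(1, Ordering::Relaxed);
    }
}

/// Hypothetical placement: prefer a worker on the cache's node; otherwise
/// fall back to the first available worker and count the request as far.
/// Returns None when no worker is available (nothing is recorded then).
pub fn place_decode(metrics: &PlacementMetrics, cache_node: u32, worker_nodes: &[u32]) -> Option<u32> {
    if worker_nodes.contains(&cache_node) {
        return Some(cache_node);
    }
    let fallback = *worker_nodes.first()?;
    metrics.record_decode_far_from_cache();
    Some(fallback)
}
```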