Affinity Request Model
Status: design decision. Commits how typed affinity constraints are expressed on placement requests.
Builds on:
docs/grafos/affinity-taxonomy.md (taxonomy + strength classes) and docs/spec/scheduler-isolation-policy.md (filter→score→adapt pipeline). Scheduler-side implementation lands as a separate wave.
1. Problem
The taxonomy doc defines 5 affinity categories and 3 strength classes, but the request model only has:
- PlacementRequest::affinity_with: Option<NodeId> (soft, single-node)
- PlacementRequest::anti_affinity_with: Option<NodeId> (soft, single-node)
- TaskletAffinity: colocation strength (SameNode/SameRack/Any)
- Service::anti_affinity_services: Vec<ServiceId>
There is no way for a caller to express “required resource affinity with GPU X” or “preferred data affinity with lease Y’s node” or “required anti-affinity from service Z’s failure domain.”
2. Decision: TLV-based affinity entries on the params blob
Follow the existing isolation/exclusivity precedent: carry affinity as
optional TLV entries on the LeaseAllocRequest params blob. Each
affinity entry is one TLV with a structured value encoding category,
strength, and target.
2.1 TLV layout

    tag    = 0x0910 (u16 BE)    TLV_LEASE_AFFINITY
    length = N (u16 BE)
    value  = affinity entry (variable length)

Multiple TLV_LEASE_AFFINITY entries may appear in the same params
blob, one per affinity constraint. This matches how TLV streams work
(scan-for-tag finds all entries, not just the first).
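
As an illustration of the scan-for-tag behaviour, here is a minimal sketch that collects every TLV_LEASE_AFFINITY value from a params blob. It assumes the blob is a plain concatenation of tag/length/value records with no padding; the function name `scan_all` and the `TlvError` type are hypothetical and not taken from the codebase.

```rust
/// Tag value from section 2.1.
const TLV_LEASE_AFFINITY: u16 = 0x0910;

/// Error type for this sketch only (hypothetical).
#[derive(Debug, PartialEq)]
enum TlvError {
    /// The blob ended in the middle of a header or a value.
    Truncated,
}

/// Walk a TLV stream and return the value slice of every entry whose tag
/// equals `wanted`, in stream order. Entries with other tags are skipped.
fn scan_all(params: &[u8], wanted: u16) -> Result<Vec<&[u8]>, TlvError> {
    let mut found = Vec::new();
    let mut rest = params;
    while !rest.is_empty() {
        // Header: tag (u16 BE) followed by length (u16 BE).
        if rest.len() < 4 {
            return Err(TlvError::Truncated);
        }
        let tag = u16::from_be_bytes([rest[0], rest[1]]);
        let len = u16::from_be_bytes([rest[2], rest[3]]) as usize;
        let body = &rest[4..];
        if body.len() < len {
            return Err(TlvError::Truncated);
        }
        if tag == wanted {
            found.push(&body[..len]);
        }
        rest = &body[len..];
    }
    Ok(found)
}
```

Returning all matches rather than the first is what lets a single request carry several affinity constraints.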
2.2 Affinity entry encoding

    +--------+--------+--------+--------+
    | category (u8)                     |
    +--------+--------+--------+--------+
    | strength (u8)                     |
    +--------+--------+--------+--------+
    | target_type (u8)                  |
    +--------+--------+--------+--------+
    | target_len (u16 BE)               |
    +--------+--------+--------+--------+
    | target (target_len bytes)         |
    +--------+--------+--------+--------+

Total entry: 5 + target_len bytes.
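
The entry layout can be sketched as an encode/decode pair. This is an illustration only: `RawAffinityEntry` and `EntryError` are hypothetical names, and the decoder assumes the TLV value holds exactly one entry, so any trailing or missing bytes are treated as an error.

```rust
/// Fixed part of an entry: category, strength, target_type, target_len.
const HEADER_LEN: usize = 5;
/// The enclosing TLV length is a u16, so the whole entry must fit in it.
const MAX_TARGET_LEN: usize = u16::MAX as usize - HEADER_LEN;

/// Undecoded view of one affinity entry (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq)]
struct RawAffinityEntry {
    category: u8,
    strength: u8,
    target_type: u8,
    target: Vec<u8>,
}

#[derive(Debug, PartialEq)]
enum EntryError {
    TooShort,
    LengthMismatch,
    TargetTooLong,
}

impl RawAffinityEntry {
    /// Serialize to the 5 + target_len byte layout.
    fn encode(&self) -> Result<Vec<u8>, EntryError> {
        if self.target.len() > MAX_TARGET_LEN {
            return Err(EntryError::TargetTooLong);
        }
        let target_len = self.target.len() as u16;
        let mut out = Vec::with_capacity(HEADER_LEN + self.target.len());
        out.push(self.category);
        out.push(self.strength);
        out.push(self.target_type);
        out.extend_from_slice(&target_len.to_be_bytes());
        out.extend_from_slice(&self.target);
        Ok(out)
    }

    /// Parse one TLV value. The declared target_len must account for every
    /// remaining byte.
    fn decode(value: &[u8]) -> Result<Self, EntryError> {
        if value.len() < HEADER_LEN {
            return Err(EntryError::TooShort);
        }
        let target_len = u16::from_be_bytes([value[3], value[4]]) as usize;
        if value.len() != HEADER_LEN + target_len {
            return Err(EntryError::LengthMismatch);
        }
        Ok(Self {
            category: value[0],
            strength: value[1],
            target_type: value[2],
            target: value[HEADER_LEN..].to_vec(),
        })
    }
}
```

Semantic checks on the category, strength, and target type bytes are deliberately left out of this step.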
2.3 Category encoding
| Value | Category | v1? | Notes |
|---|---|---|---|
| 0x01 | Resource | Yes | Co-locate with a specific resource_id |
| 0x02 | State | Yes | Co-locate with a lease/data shard |
| 0x03 | Topology | Yes | Anti-affinity from a failure domain |
| 0x04 | Trust | Yes | Require attestation domain match |
| 0x05 | Facility | No | Deferred (thermal/power/cooling) |
| 0x06..0xFE | reserved | — | Fail closed |
2.4 Strength encoding
| Value | Strength | Scheduler stage |
|---|---|---|
| 0x01 | Required | Filter (hard constraint, fail-closed) |
| 0x02 | Preferred | Score (soft ranking boost) |
| 0x03 | Adaptive | Reserved for the future adapt stage |
2.5 Target type encoding
| Value | Target type | target bytes | Used with categories |
|---|---|---|---|
| 0x01 | NodeId | 16 bytes (u128 BE) | Resource, State, Topology |
| 0x02 | ResourceId | 16 bytes (u128 BE) | Resource |
| 0x03 | LeaseId | 16 bytes (u128 BE) | State |
| 0x04 | ServiceId | 16 bytes (u128 BE) | Topology anti-affinity |
| 0x05 | TrustDomain | variable (UTF-8 string) | Trust |
| 0x06 | RackId | 4 bytes (u32 BE) | Topology |
2.6 Anti-affinity
Anti-affinity is not a separate category. It is expressed by combining
the Topology category with the appropriate target and the direction bit
described below. The taxonomy doc §5.3 defines anti-affinity as
“prefer/require placement away from a specific node, rack, or service’s
failure domain.”
To distinguish affinity-toward from affinity-away, add a direction bit to the strength byte:

    strength byte layout:
      bits [0:6] = strength value (Required=0x01, Preferred=0x02, Adaptive=0x03)
      bit  [7]   = anti-affinity flag (0 = toward, 1 = away)

So:
- 0x01 = Required affinity (toward)
- 0x81 = Required anti-affinity (away from)
- 0x02 = Preferred affinity (toward)
- 0x82 = Preferred anti-affinity (away from)
This keeps anti-affinity first-class per the taxonomy doc §13 principle without adding a separate encoding dimension.
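
A minimal sketch of splitting the strength byte into its two parts follows. The `Strength` and `Direction` enums and both function names are hypothetical. The decoder only separates the fields; whether a given strength such as Adaptive is accepted in v1 is a separate policy decision.

```rust
/// Bit 7 of the strength byte: 0 = toward, 1 = away.
const ANTI_FLAG: u8 = 0x80;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Strength {
    Required,
    Preferred,
    Adaptive,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Direction {
    Toward,
    Away,
}

/// Split a strength byte into (strength, direction).
/// Returns None when the low seven bits are not a known strength value.
fn decode_strength(byte: u8) -> Option<(Strength, Direction)> {
    let direction = if byte & ANTI_FLAG != 0 {
        Direction::Away
    } else {
        Direction::Toward
    };
    let strength = match byte & !ANTI_FLAG {
        0x01 => Strength::Required,
        0x02 => Strength::Preferred,
        0x03 => Strength::Adaptive,
        _ => return None,
    };
    Some((strength, direction))
}

/// Inverse of `decode_strength`.
fn encode_strength(strength: Strength, direction: Direction) -> u8 {
    let value = match strength {
        Strength::Required => 0x01,
        Strength::Preferred => 0x02,
        Strength::Adaptive => 0x03,
    };
    match direction {
        Direction::Toward => value,
        Direction::Away => value | ANTI_FLAG,
    }
}
```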
3. Fail-closed rules
- Unknown category byte → reject with LeaseError::InvalidIntent.
- Category not in v1 set (e.g. Facility = 0x05) → reject until that category is implemented in a future phase.
- Unknown strength byte (after masking the anti-affinity bit) → reject.
- Unknown target type for the given category → reject.
- Required affinity that cannot be satisfied → scheduler returns empty placement (no candidates pass the filter). Rejection reason: AffinityRejection::RequiredAffinityUnsatisfiable.
- Preferred affinity with no matching candidates → placement proceeds with reduced score; no rejection. The caller gets a placement but may not get the preferred target.
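
The parse-time rules above could be checked along these lines. This sketch makes several assumptions that the note does not state: every parse-time rejection maps to LeaseError::InvalidIntent (the note names it only for unknown categories), Adaptive is rejected in v1 because it is deferred, and a target whose length or UTF-8 encoding disagrees with the table in 2.5 is also rejected. `LeaseError` here is a local stand-in, and `validate_entry` is a hypothetical name.

```rust
/// Local stand-in for the real error type; only the variant named in the
/// note is modeled.
#[derive(Debug, PartialEq)]
enum LeaseError {
    InvalidIntent,
}

const ANTI_FLAG: u8 = 0x80;

/// Fail-closed validation of one affinity entry's header bytes and target.
fn validate_entry(
    category: u8,
    strength: u8,
    target_type: u8,
    target: &[u8],
) -> Result<(), LeaseError> {
    // v1 categories are 0x01..=0x04; Facility (0x05) and reserved values fail.
    if !(0x01..=0x04).contains(&category) {
        return Err(LeaseError::InvalidIntent);
    }
    // Mask the direction bit; only Required and Preferred pass in this sketch.
    if !matches!(strength & !ANTI_FLAG, 0x01 | 0x02) {
        return Err(LeaseError::InvalidIntent);
    }
    // Category / target-type pairs from the table in 2.5.
    let pair_ok = matches!(
        (category, target_type),
        (0x01, 0x01 | 0x02)            // Resource: NodeId, ResourceId
            | (0x02, 0x01 | 0x03)      // State: NodeId, LeaseId
            | (0x03, 0x01 | 0x04 | 0x06) // Topology: NodeId, ServiceId, RackId
            | (0x04, 0x05)             // Trust: TrustDomain
    );
    if !pair_ok {
        return Err(LeaseError::InvalidIntent);
    }
    // Target size per type (assumption: mismatches are rejected).
    let target_ok = match target_type {
        0x01..=0x04 => target.len() == 16, // u128 BE identifiers
        0x05 => std::str::from_utf8(target).is_ok(), // UTF-8 trust domain
        0x06 => target.len() == 4,         // u32 BE rack id
        _ => false,
    };
    if target_ok {
        Ok(())
    } else {
        Err(LeaseError::InvalidIntent)
    }
}
```

The last two rules in the list concern scheduler outcomes rather than parsing, so they are not covered by this function.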
4. Interaction with existing fields
The existing PlacementRequest fields (affinity_with,
anti_affinity_with, Strategy::Affinity/AntiAffinity) remain as
shorthand for the most common case: preferred node-level affinity.
They are equivalent to:
- affinity_with: Some(NodeId(X)) ≡ one TLV entry with category=Resource, strength=Preferred, target=NodeId(X)
- anti_affinity_with: Some(NodeId(X)) ≡ one TLV entry with category=Topology, strength=Preferred|Anti, target=NodeId(X)
The scheduler should normalize both forms into the same internal representation before filtering/scoring. This preserves backwards compatibility while enabling richer affinity for new callers.
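
One way the normalization step might look, as a sketch: the legacy fields are turned into the same in-memory entries that decoded TLVs would produce. All types here (`LegacyFields`, `NormalizedAffinity`, and the local `NodeId`, `Category`, `Strength`) are hypothetical stand-ins rather than the real scheduler types.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct NodeId(u128);

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Category {
    Resource,
    State,
    Topology,
    Trust,
}

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Strength {
    Required,
    Preferred,
}

/// The two legacy shorthand fields on PlacementRequest.
struct LegacyFields {
    affinity_with: Option<NodeId>,
    anti_affinity_with: Option<NodeId>,
}

/// Internal form shared by legacy fields and decoded TLV entries.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct NormalizedAffinity {
    category: Category,
    strength: Strength,
    /// true = anti-affinity (away from the target).
    away: bool,
    target: NodeId,
}

/// Expand the shorthand fields into their TLV-equivalent entries.
fn normalize_legacy(legacy: &LegacyFields) -> Vec<NormalizedAffinity> {
    let mut out = Vec::new();
    if let Some(node) = legacy.affinity_with {
        out.push(NormalizedAffinity {
            category: Category::Resource,
            strength: Strength::Preferred,
            away: false,
            target: node,
        });
    }
    if let Some(node) = legacy.anti_affinity_with {
        out.push(NormalizedAffinity {
            category: Category::Topology,
            strength: Strength::Preferred,
            away: true,
            target: node,
        });
    }
    out
}
```

Entries decoded from the params blob would be appended to the same vector, so the filter and scorer see a single list.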
TaskletAffinity (SameNode/SameRack/Any) on TaskletLeaseAllocRequest
is orthogonal — it describes CPU+memory colocation strength for
composite leases, not placement affinity. It stays as-is.
5. Minimal v1 surface
Per taxonomy §12, the v1 surface includes:
| Category | Strength | Target types | Example use case |
|---|---|---|---|
| Resource | Required/Preferred | NodeId, ResourceId | “Place near GPU X” |
| State | Required/Preferred | NodeId, LeaseId | “Place on same node as lease Y” |
| Topology | Required/Preferred + Anti | NodeId, RackId, ServiceId | “Not on same rack as service Z” |
| Trust | Required | TrustDomain | “Only on attested nodes in domain D” |
Deferred to later phases:
- Facility category (thermal/power/cooling)
- Adaptive strength (runtime telemetry-driven adjustment)
- Multi-entry scoring weights (e.g. “preferred affinity A is 2x more important than preferred affinity B”)
6. Scheduler integration
The scheduler pipeline already has the right shape:

    filter → score → adapt

The implementation wave adds:
- AffinityRequiredFilter: scans TLV entries with strength=Required, rejects candidates that don’t match. Returns AffinityRejection::RequiredAffinityUnsatisfiable when all candidates are eliminated.
- AffinityPreferredScorer: scans TLV entries with strength=Preferred, adds a weighted boost to candidates that match. Uses a new affinity weight dimension in ScoreWeights alongside existing fit/locality/pressure/etc.
- Adapt stage: reserved for Adaptive strength in a future wave. Initially a no-op.
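
The filter and score stages might be shaped roughly as follows. This is a sketch under stated assumptions, not the scheduler's actual API: `Candidate`, `Target`, and `Constraint` are hypothetical, only node and rack targets are modeled, an "away" constraint is treated as satisfied by any candidate that is not on the target, and the boost is a flat `affinity_weight` per matching preferred entry.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Candidate {
    node: u128,
    rack: u32,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Target {
    Node(u128),
    Rack(u32),
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Constraint {
    required: bool, // true = Required, false = Preferred
    away: bool,     // anti-affinity flag
    target: Target,
}

#[derive(Debug, PartialEq)]
enum AffinityRejection {
    RequiredAffinityUnsatisfiable,
}

impl Constraint {
    /// A toward-constraint wants the candidate on the target; an
    /// away-constraint wants it anywhere else.
    fn satisfied_by(&self, c: &Candidate) -> bool {
        let on_target = match self.target {
            Target::Node(n) => c.node == n,
            Target::Rack(r) => c.rack == r,
        };
        on_target != self.away
    }
}

/// Filter stage: keep candidates that satisfy every Required constraint.
fn affinity_required_filter(
    candidates: &[Candidate],
    constraints: &[Constraint],
) -> Result<Vec<Candidate>, AffinityRejection> {
    let kept: Vec<Candidate> = candidates
        .iter()
        .copied()
        .filter(|c| constraints.iter().filter(|k| k.required).all(|k| k.satisfied_by(c)))
        .collect();
    // Only report a rejection when the filter itself removed everything.
    if kept.is_empty() && !candidates.is_empty() {
        return Err(AffinityRejection::RequiredAffinityUnsatisfiable);
    }
    Ok(kept)
}

/// Score stage: one flat boost per satisfied Preferred constraint.
fn affinity_preferred_score(c: &Candidate, constraints: &[Constraint], affinity_weight: f64) -> f64 {
    let matched = constraints
        .iter()
        .filter(|k| !k.required && k.satisfied_by(c))
        .count();
    affinity_weight * matched as f64
}
```

A candidate that matches no preferred entry simply receives no boost, which corresponds to the "placement proceeds with reduced score" rule.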
7. SDK surface (future)
Once the wire format lands, the SDK should expose something like:

    use grafos_std::cpu::{CpuBuilder, CpuIsolationClass};
    use grafos_std::affinity::{Affinity, AffinityStrength, AffinityTarget};

    let lease = CpuBuilder::new()
        .single_core()
        .isolation(CpuIsolationClass::WholeCore)
        .affinity(Affinity::resource(AffinityStrength::Preferred, AffinityTarget::node(gpu_node_id)))
        .anti_affinity(Affinity::topology(AffinityStrength::Required, AffinityTarget::rack(rack_42)))
        .lease_secs(60)
        .acquire()?;

The SDK surface is out of scope for this design note and is deferred to a separate follow-on.
8. What this note does NOT commit to
- Implementation. TLV parser, scheduler filter/scorer, and SDK knob are all separate follow-on waves.
- Facility category (thermal/power/cooling) — deferred.
- Adaptive strength semantics — reserved; needs telemetry infrastructure that doesn’t exist yet.
- Multi-entry weight tuning — deferred to a scoring-refinement wave once the basic preferred affinity works.
- Cross-entry conflict resolution — e.g. “required affinity with node A” + “required anti-affinity from node A” → immediately rejected as contradictory. The parser should detect and reject these statically.
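
For the contradictory-pair case in the last bullet, a static check could look like the sketch below. It is illustrative only and assumes that two Required entries conflict when they name the same target (same target type and bytes) with opposite directions, regardless of category. `Entry` and `has_required_contradiction` are hypothetical names.

```rust
use std::collections::HashSet;

/// Minimal view of a decoded entry; category is omitted because the check
/// keys on the target alone (an assumption of this sketch).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Entry {
    strength: u8, // raw strength byte, including the anti-affinity bit
    target_type: u8,
    target: Vec<u8>,
}

const ANTI_FLAG: u8 = 0x80;
const REQUIRED: u8 = 0x01;

/// Returns true when some target is both required-toward and required-away.
fn has_required_contradiction(entries: &[Entry]) -> bool {
    // Set of (target_type, target bytes, away?) seen so far among Required entries.
    let mut seen: HashSet<(u8, &[u8], bool)> = HashSet::new();
    for e in entries {
        if e.strength & !ANTI_FLAG != REQUIRED {
            continue; // Preferred entries can never make a request unsatisfiable.
        }
        let away = e.strength & ANTI_FLAG != 0;
        let key = (e.target_type, e.target.as_slice(), away);
        let opposite = (e.target_type, e.target.as_slice(), !away);
        if seen.contains(&opposite) {
            return true;
        }
        seen.insert(key);
    }
    false
}
```

Conflicts that only appear after resolving targets, such as a node and the rack that contains it, cannot be caught this way and would still fall to the filter stage.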
9. Cross-links
- docs/grafos/affinity-taxonomy.md: canonical taxonomy this request model serves
- docs/spec/scheduler-isolation-policy.md: shared filter→score→adapt pipeline
- docs/spec/cpu-isolation-wire-format.md: TLV precedent on LeaseAllocRequest params blob
- crates/grafos-scheduler/src/placement.rs: PlacementRequest struct with existing affinity_with/anti_affinity_with fields
- crates/grafos-scheduler/src/isolation_filter.rs: filter-stage pattern that AffinityRequiredFilter will follow
- crates/fabricbios-core/src/tasklet.rs:31-48: TaskletAffinity (orthogonal, colocation strength)