GPU Exclusivity Wire Format
Status: design decision. Commits the wire shape for per-lease GPU exclusivity class.
Addendum to: docs/spec/resource-isolation-and-exclusivity.md §5.3 and §6.2.
Parallel to: docs/spec/cpu-isolation-wire-format.md. GPU follows the same TLV precedent for the same reasons.
1. Problem
§5.3 names the initial GPU exclusivity vocabulary: shared,
session-exclusive, device-exclusive, future partition-exclusive.
The wire path has no field to carry it. The existing
--gpu-share-mode exclusive|fractional switch is a daemon-wide
configuration; MIG support exposes partitions as separate
resource_ids. Clients cannot currently ask for a specific class
per lease.
2. Decision
Add TLV_LEASE_GPU_EXCLUSIVITY (0x0903) to the existing
LeaseAllocRequest params blob, single-byte class. Mirrors
TLV_LEASE_CPU_ISOLATION (0x0902) and
TLV_LEASE_INTENT_KV_CACHE (0x0901) exactly.
2.1 TLV layout

    +--------+--------+--------+--------+
    | tag=0x0903 (u16 BE)               |
    +--------+--------+--------+--------+
    | len=0x0001 (u16 BE)               |
    +--------+--------+--------+--------+
    | class (u8)                        |
    +--------+--------+--------+--------+

Class encoding:

| Value | Class | Notes |
|---|---|---|
| 0x00 | Shared | Device may multiplex other tenants. |
| 0x01 | SessionExclusive | Exclusive residency for session lifetime. |
| 0x02 | DeviceExclusive | Whole device for lease lifetime. |
| 0x03 | PartitionExclusive | Reserved for future MIG/partition mode. |
| 0x04..0xFE | reserved | Fail closed per §3. |
| 0xFF | reserved sentinel | Never valid on the wire. |
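
The layout and class table above can be sketched in Rust as follows. This is a minimal illustration, not the committed implementation (which this note defers): the type name `GpuExclusivity` and the helpers `from_wire` and `encode_gpu_exclusivity` are hypothetical, and only the tag, length, and class values come from this note.

```rust
/// Tag and length values taken from the TLV layout in this note.
const TLV_LEASE_GPU_EXCLUSIVITY: u16 = 0x0903;
const GPU_EXCLUSIVITY_LEN: u16 = 0x0001;

/// Hypothetical Rust mirror of the class table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum GpuExclusivity {
    Shared = 0x00,
    SessionExclusive = 0x01,
    DeviceExclusive = 0x02,
    PartitionExclusive = 0x03,
}

impl GpuExclusivity {
    /// Decode a class byte. Reserved values (0x04..=0xFE) and the
    /// 0xFF sentinel yield `None`, so the caller can fail closed.
    pub fn from_wire(byte: u8) -> Option<Self> {
        match byte {
            0x00 => Some(Self::Shared),
            0x01 => Some(Self::SessionExclusive),
            0x02 => Some(Self::DeviceExclusive),
            0x03 => Some(Self::PartitionExclusive),
            _ => None,
        }
    }
}

/// Encode the TLV: big-endian tag, big-endian length, one class byte.
pub fn encode_gpu_exclusivity(class: GpuExclusivity) -> [u8; 5] {
    let tag = TLV_LEASE_GPU_EXCLUSIVITY.to_be_bytes();
    let len = GPU_EXCLUSIVITY_LEN.to_be_bytes();
    [tag[0], tag[1], len[0], len[1], class as u8]
}
```

Under these assumptions, a `DeviceExclusive` request occupies five bytes in the params blob: `09 03 00 01 02`.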
2.2 Absent TLV
Absent TLV inherits the daemon-wide default set by
--gpu-share-mode:
- --gpu-share-mode exclusive (current default) → DeviceExclusive
- --gpu-share-mode fractional → Shared

Existing clients see unchanged behavior. This is the zero-migration path, symmetric with the CPU-side treatment.
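
The default resolution described in this section might look like the sketch below. The variant names on `GpuShareMode` and the functions `default_class` and `effective_class` are assumptions made for illustration. The sketch covers only the absent-TLV default and does not model operator policy checks.

```rust
/// Daemon-wide mode set by --gpu-share-mode (variant names assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GpuShareMode {
    Exclusive,
    Fractional,
}

/// Per-lease exclusivity class (hypothetical Rust mirror).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GpuExclusivity {
    Shared,
    SessionExclusive,
    DeviceExclusive,
    PartitionExclusive,
}

/// Class inherited when the TLV is absent.
pub fn default_class(mode: GpuShareMode) -> GpuExclusivity {
    match mode {
        GpuShareMode::Exclusive => GpuExclusivity::DeviceExclusive,
        GpuShareMode::Fractional => GpuExclusivity::Shared,
    }
}

/// `requested` is `None` when the client sent no exclusivity TLV.
/// Policy checks on an explicit request are out of scope here.
pub fn effective_class(
    requested: Option<GpuExclusivity>,
    mode: GpuShareMode,
) -> GpuExclusivity {
    requested.unwrap_or_else(|| default_class(mode))
}
```

Modeling the absent TLV as `None` rather than as a reserved class byte keeps the 0xFF sentinel off the wire, as the class table requires.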
3. Fail-closed rules
Per §6.2:
- Unknown class byte → reject with LeaseError::InvalidIntent.
- Class not supported by the target device → reject with LeaseError::InvalidArgs and a rejection-reason TLV. Examples:
  - PartitionExclusive on a non-MIG device
  - Shared when --gpu-share-mode exclusive is set daemon-wide and the class conflicts with operator policy (see §4 below)
  - SessionExclusive on a runtime that has no v1 GpuSession support
- Malformed TLV length → reject.
- Conflict with targeted resource_id, e.g. client targets a MIG sub-device resource_id and asks for DeviceExclusive → reject. The two are semantically incompatible: you cannot claim the whole device while leasing only a partition.
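
A rough sketch of the first three rules follows, assuming a hypothetical `DeviceCaps` view of the target device and a hypothetical `check_class` entry point. This note does not name an error variant for a malformed length, so `InvalidArgs` is used there purely as a placeholder. The rejection-reason TLV, the operator-policy case, and the resource_id conflict rule are left out.

```rust
/// Only the two variants named in the rules above are modeled.
#[derive(Debug, PartialEq, Eq)]
pub enum LeaseError {
    InvalidIntent,
    InvalidArgs,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GpuExclusivity {
    Shared,
    SessionExclusive,
    DeviceExclusive,
    PartitionExclusive,
}

/// Hypothetical capability view of the targeted device and runtime.
pub struct DeviceCaps {
    pub is_mig: bool,
    pub has_gpu_session: bool,
}

/// Validate the TLV value bytes (everything after tag and length).
pub fn check_class(value: &[u8], caps: &DeviceCaps) -> Result<GpuExclusivity, LeaseError> {
    // Malformed length: the value must be exactly one byte.
    // The variant is an assumption; the note only says "reject".
    if value.len() != 1 {
        return Err(LeaseError::InvalidArgs);
    }
    // Unknown class byte, including the reserved range and 0xFF.
    let class = match value[0] {
        0x00 => GpuExclusivity::Shared,
        0x01 => GpuExclusivity::SessionExclusive,
        0x02 => GpuExclusivity::DeviceExclusive,
        0x03 => GpuExclusivity::PartitionExclusive,
        _ => return Err(LeaseError::InvalidIntent),
    };
    // Known class that the target cannot honor. Accepting
    // PartitionExclusive on a MIG device is an assumption: the note
    // reserves that class for a future mode.
    let supported = match class {
        GpuExclusivity::PartitionExclusive => caps.is_mig,
        GpuExclusivity::SessionExclusive => caps.has_gpu_session,
        GpuExclusivity::Shared | GpuExclusivity::DeviceExclusive => true,
    };
    if !supported {
        return Err(LeaseError::InvalidArgs);
    }
    Ok(class)
}
```

Every branch that does not positively recognize and support the class returns an error, which is the fail-closed property the rules ask for.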
4. Interaction with daemon-wide mode
The daemon-wide --gpu-share-mode sets the default class when the
TLV is absent. It also bounds how loose a requested class may be,
but it is not a ceiling on how tight. Two cases illustrate this:
- Under --gpu-share-mode exclusive: request Shared → rejected, because operator policy forbids sharing.
- Under --gpu-share-mode fractional: request DeviceExclusive → honored if no other lease currently holds the device; rejected otherwise with LeaseError::ResourceBusy (existing variant).

The rule: operator-level daemon mode is a permission envelope, not a default-only hint. Clients cannot escape a tighter daemon mode by asking for a looser class, and they can request a tighter class than the daemon default if the device is currently free.
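
The envelope rule can be sketched as a single admission function. The name `admit`, the `GpuShareMode` variant names, and the `other_lease_holds_device` parameter are hypothetical. Applying the busy check to `DeviceExclusive` under both daemon modes is also an assumption, since this note only spells it out for fractional mode.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum LeaseError {
    InvalidArgs,
    ResourceBusy,
}

/// Daemon-wide mode set by --gpu-share-mode (variant names assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GpuShareMode {
    Exclusive,
    Fractional,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GpuExclusivity {
    Shared,
    SessionExclusive,
    DeviceExclusive,
    PartitionExclusive,
}

/// Resolve the class a lease would get, or the reason it is refused.
pub fn admit(
    mode: GpuShareMode,
    requested: Option<GpuExclusivity>,
    other_lease_holds_device: bool,
) -> Result<GpuExclusivity, LeaseError> {
    let class = match (mode, requested) {
        // A looser class than the operator allows is refused.
        (GpuShareMode::Exclusive, Some(GpuExclusivity::Shared)) => {
            return Err(LeaseError::InvalidArgs)
        }
        // Any other explicit request is taken as-is.
        (_, Some(c)) => c,
        // Absent TLV: inherit the daemon-wide default.
        (GpuShareMode::Exclusive, None) => GpuExclusivity::DeviceExclusive,
        (GpuShareMode::Fractional, None) => GpuExclusivity::Shared,
    };
    // A whole-device claim needs the device to be free right now.
    if class == GpuExclusivity::DeviceExclusive && other_lease_holds_device {
        return Err(LeaseError::ResourceBusy);
    }
    Ok(class)
}
```

For example, under fractional mode `admit(GpuShareMode::Fractional, Some(GpuExclusivity::DeviceExclusive), true)` returns `Err(LeaseError::ResourceBusy)`, while the same request against a free device is honored.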
5. Interaction with v1 GpuSession
SessionExclusive maps onto the v1 GpuSession
persistent-session execution mode. Semantics:
- A SessionExclusive lease is honored for the lifetime of the tasklet’s v1 session: the session holds the GPU state for the duration of its session handle, and no other tenant may observe that state.
- When no v1 session is open, a SessionExclusive lease behaves identically to DeviceExclusive: the device is reserved for the lease holder regardless.
- The practical distinction between SessionExclusive and DeviceExclusive becomes observable only when multiple tenants share the device via fractional mode: SessionExclusive reserves state-isolation only while a session is active; DeviceExclusive reserves the whole device unconditionally.
6. Interaction with MIG
MIG sub-devices are already exposed as separate resource_ids.
This note does not change that. The exclusivity class and the
resource_id answer different questions:
- resource_id → which physical/virtual device the lease targets
- exclusivity class → how that device is shared during the lease

PartitionExclusive is reserved for the future case where the daemon
exposes partitioning as a class attribute rather than as separate
resource_ids. Under today’s MIG behavior, a client targeting a
MIG sub-device resource_id with class DeviceExclusive effectively
claims that partition exclusively — the GPU_MIG inventory flag
already advertises that the target is a partition.
7. Inventory advertisement
Inventory (GET_INVENTORY) advertises the GPU_MIG flag (0x0004)
today. A future revision should add a per-device field advertising
which exclusivity classes the daemon will honor, mirroring the
planned CPU isolation-class inventory. That advertisement is
deferred and out of scope here.
8. Why a new TLV (not the alternatives)
Four options were considered:
- Shared CPU+GPU isolation field (option 1a). Rejected: the value spaces are distinct (CPU: 3 classes, GPU: 4), and conflating them forces every consumer to resource-type-switch before decoding. Separate TLVs are cleaner.
- Dedicated sub-op (option 3). Rejected: same op-code proliferation argument as the CPU-side decision.
- resource_id targeting (option 4). Rejected: cannot express session-exclusive, which is a daemon-mode attribute, not a device attribute. Also couples exclusivity to device identity in a way that breaks on fractional-mode setups.
- TLV (option 2). Chosen for precedent-consistency with the CPU isolation TLV and TLV_LEASE_INTENT_KV_CACHE.
9. What this note does NOT commit to
- Implementation. The TLV parser, field on LeaseAllocRequest, daemon-side policy enforcement, golden vectors, and wire-encoding-v0 doc update are deferred to a separate implementation wave.
- GpuBuilder::exclusivity() SDK knob: deferred.
- Scheduler policy for density/predictability/exclusivity tradeoffs: deferred.
- Inventory advertisement: deferred.
- MIG partition-exclusive semantics: reserved for the future.
- Renaming GpuShareMode to something that distinguishes it from the per-lease class: a naming hygiene follow-on, not a design decision.
10. Cross-links
- docs/spec/resource-isolation-and-exclusivity.md §5.3, §6.2
- docs/spec/cpu-isolation-wire-format.md: symmetric CPU decision
- docs/fabricbios-gpu-abi-v1.md: v1 GpuSession persistent-session execution mode
- The GPU_MIG inventory flag identifies MIG partitions
- The --gpu-share-mode daemon-wide configuration sets the default exclusivity class