
GPU Exclusivity Wire Format

Status: design decision. Commits the wire shape for per-lease GPU exclusivity class.

Addendum to: docs/spec/resource-isolation-and-exclusivity.md §5.3 and §6.2. Parallel to: docs/spec/cpu-isolation-wire-format.md — GPU follows the same TLV precedent for the same reasons.


1. Problem

§5.3 names the initial GPU exclusivity vocabulary: shared, session-exclusive, device-exclusive, and a future partition-exclusive. The wire path has no field to carry it. The existing --gpu-share-mode exclusive|fractional switch is daemon-wide configuration, and MIG support exposes partitions as separate resource_ids; neither lets a client ask for a specific class per lease.

2. Decision

Add TLV_LEASE_GPU_EXCLUSIVITY (0x0903) to the existing LeaseAllocRequest params blob, carrying a single-byte class. The TLV mirrors TLV_LEASE_CPU_ISOLATION (0x0902) and TLV_LEASE_INTENT_KV_CACHE (0x0901) exactly.

2.1 TLV layout

+--------+--------+--------+--------+--------+
|  tag = 0x0903   |  len = 0x0001   | class  |
|    (u16 BE)     |    (u16 BE)     |  (u8)  |
+--------+--------+--------+--------+--------+
  byte 0   byte 1   byte 2   byte 3   byte 4
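To make the byte order concrete, here is a minimal Rust sketch that serializes the TLV into its five on-wire bytes. The function `encode_gpu_exclusivity_tlv` is hypothetical; only the tag value, the one-byte length, and the big-endian ordering come from this note.

```rust
/// Tag value from §2 (the Rust constant is an illustrative mirror).
pub const TLV_LEASE_GPU_EXCLUSIVITY: u16 = 0x0903;

/// Hypothetical helper: serialize tag (u16 BE), len (u16 BE), class (u8).
/// `class` is the raw class byte from the encoding table in §2.1.
pub fn encode_gpu_exclusivity_tlv(class: u8) -> [u8; 5] {
    // 0xFF is a reserved sentinel and must never appear on the wire.
    debug_assert_ne!(class, 0xFF, "0xFF is a reserved sentinel");
    let tag = TLV_LEASE_GPU_EXCLUSIVITY.to_be_bytes(); // [0x09, 0x03]
    let len = 1u16.to_be_bytes(); // value is always exactly one byte: [0x00, 0x01]
    [tag[0], tag[1], len[0], len[1], class]
}
```

Under this layout a DeviceExclusive request is the byte sequence 09 03 00 01 02.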

Class encoding:

| Value      | Class              | Notes                                      |
|------------|--------------------|--------------------------------------------|
| 0x00       | Shared             | Device may multiplex other tenants.        |
| 0x01       | SessionExclusive   | Exclusive residency for session lifetime.  |
| 0x02       | DeviceExclusive    | Whole device for lease lifetime.           |
| 0x03       | PartitionExclusive | Reserved for future MIG/partition mode.    |
| 0x04..0xFE | reserved           | Fail closed per §3.                        |
| 0xFF       | reserved sentinel  | Never valid on the wire.                   |
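A decoder for the table can be sketched as a `#[repr(u8)]` enum whose decoding returns nothing for every reserved value, so the caller is forced to fail closed. The type name `GpuExclusivity` and its methods are hypothetical; the note deliberately leaves the Rust-side field and SDK shape to a later implementation wave.

```rust
/// Hypothetical Rust mirror of the class encoding table in §2.1.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GpuExclusivity {
    Shared = 0x00,
    SessionExclusive = 0x01,
    DeviceExclusive = 0x02,
    PartitionExclusive = 0x03,
}

impl GpuExclusivity {
    /// Decode a class byte. Reserved values 0x04..=0xFE and the 0xFF
    /// sentinel yield None so the caller can reject per §3.
    pub fn from_wire(byte: u8) -> Option<Self> {
        match byte {
            0x00 => Some(Self::Shared),
            0x01 => Some(Self::SessionExclusive),
            0x02 => Some(Self::DeviceExclusive),
            0x03 => Some(Self::PartitionExclusive),
            _ => None,
        }
    }

    /// Encode back to the wire byte (the explicit discriminant).
    pub fn to_wire(self) -> u8 {
        self as u8
    }
}
```

Returning `Option` rather than mapping unknown bytes to a default keeps the fail-closed decision visible at the call site.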

2.2 Absent TLV

Absent TLV inherits the daemon-wide default set by --gpu-share-mode:

  • --gpu-share-mode exclusive (current default) → DeviceExclusive
  • --gpu-share-mode fractional → Shared

Existing clients see unchanged behavior. This is the zero-migration path, symmetric with the CPU-side treatment.
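The absent-TLV rule reduces to a fallback lookup. In this sketch `GpuShareMode`, `default_class`, and `requested_class` are hypothetical names; the mapping itself is the one listed above. An explicit class still goes through the §3 and §4 checks afterwards.

```rust
/// Hypothetical mirror of the daemon-wide --gpu-share-mode setting.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GpuShareMode {
    Exclusive,
    Fractional,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GpuExclusivity {
    Shared,
    SessionExclusive,
    DeviceExclusive,
    PartitionExclusive,
}

/// Class inherited when the TLV is absent (§2.2).
pub fn default_class(mode: GpuShareMode) -> GpuExclusivity {
    match mode {
        GpuShareMode::Exclusive => GpuExclusivity::DeviceExclusive,
        GpuShareMode::Fractional => GpuExclusivity::Shared,
    }
}

/// The TLV value if present, otherwise the daemon default.
/// Policy checks (§3, §4) are applied separately to the result.
pub fn requested_class(tlv: Option<GpuExclusivity>, mode: GpuShareMode) -> GpuExclusivity {
    tlv.unwrap_or_else(|| default_class(mode))
}
```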

3. Fail-closed rules

Per §6.2:

  1. Unknown class byte → reject with LeaseError::InvalidIntent.
  2. Class not supported by the target device, or not permitted by operator policy → reject with LeaseError::InvalidArgs and a rejection-reason TLV. Examples:
    • PartitionExclusive on a non-MIG device
    • Shared when --gpu-share-mode exclusive is set daemon-wide, because operator policy forbids sharing (see §4)
    • SessionExclusive on a runtime that has no v1 GpuSession support
  3. Malformed TLV length (any len other than 0x0001) → reject.
  4. Conflict with the targeted resource_id, e.g. a client targets a MIG sub-device resource_id and asks for DeviceExclusive → reject. The two are semantically incompatible: a client cannot claim the whole device while leasing only a partition.
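The four rules can be applied in order by a single validation function, sketched below under stated assumptions. `Target`, `validate`, and the `MalformedTlv` variant are hypothetical (rule 3 says "reject" without naming a variant), the `&'static str` payload stands in for the rejection-reason TLV, and using InvalidArgs for rule 4 is likewise an assumption.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum LeaseError {
    InvalidIntent,
    /// The string stands in for the rejection-reason TLV.
    InvalidArgs(&'static str),
    /// Hypothetical: rule 3 names no variant.
    MalformedTlv,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GpuExclusivity {
    Shared,
    SessionExclusive,
    DeviceExclusive,
    PartitionExclusive,
}

/// Hypothetical facts the daemon knows about the lease target.
#[derive(Clone, Copy)]
pub struct Target {
    pub mig_capable: bool,        // device supports MIG partitioning
    pub is_mig_partition: bool,   // resource_id names a MIG sub-device
    pub has_v1_gpu_session: bool, // runtime offers v1 GpuSession
    pub daemon_exclusive: bool,   // --gpu-share-mode exclusive is set
}

/// Apply the §3 fail-closed rules to the TLV value bytes (as delimited by len).
pub fn validate(value: &[u8], t: &Target) -> Result<GpuExclusivity, LeaseError> {
    // Rule 3: the value must be exactly one byte.
    let [byte] = value else {
        return Err(LeaseError::MalformedTlv);
    };
    // Rule 1: unknown or reserved class byte.
    let class = match *byte {
        0x00 => GpuExclusivity::Shared,
        0x01 => GpuExclusivity::SessionExclusive,
        0x02 => GpuExclusivity::DeviceExclusive,
        0x03 => GpuExclusivity::PartitionExclusive,
        _ => return Err(LeaseError::InvalidIntent),
    };
    match class {
        // Rule 2: class unsupported by the device or forbidden by policy.
        GpuExclusivity::PartitionExclusive if !t.mig_capable => {
            Err(LeaseError::InvalidArgs("partition-exclusive requires a MIG device"))
        }
        GpuExclusivity::Shared if t.daemon_exclusive => {
            Err(LeaseError::InvalidArgs("operator policy forbids sharing"))
        }
        GpuExclusivity::SessionExclusive if !t.has_v1_gpu_session => {
            Err(LeaseError::InvalidArgs("runtime lacks v1 GpuSession"))
        }
        // Rule 4: whole-device claim against a partition resource_id.
        GpuExclusivity::DeviceExclusive if t.is_mig_partition => {
            Err(LeaseError::InvalidArgs("device-exclusive conflicts with MIG sub-device"))
        }
        _ => Ok(class),
    }
}
```

Checking length before the class byte means a truncated or oversized value is never misread as a valid class.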

4. Interaction with daemon-wide mode

The daemon-wide --gpu-share-mode sets the default class when the TLV is absent, and it also limits how loose an explicit request may be. For example:

  • Under --gpu-share-mode exclusive: request Shared → rejected with LeaseError::InvalidArgs (§3 rule 2), because operator policy forbids sharing.
  • Under --gpu-share-mode fractional: request DeviceExclusive → honored if no other lease currently holds the device; rejected otherwise with LeaseError::ResourceBusy (existing variant).

The rule: operator-level daemon mode is a permission envelope, not a default-only hint. Clients cannot escape a tighter daemon mode by asking for a looser class, and they can request a tighter class than the daemon default if the device is currently free.
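The envelope rule can be sketched as an admission check over (mode, class) pairs. `admit` and the `device_free` flag (true when no other lease holds the device) are hypothetical, and `LeaseError` here carries only the two variants §4 uses. Combinations §4 does not mention are admitted in this sketch, on the assumption that ordinary capacity checks happen elsewhere in the allocator.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GpuShareMode {
    Exclusive,
    Fractional,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GpuExclusivity {
    Shared,
    SessionExclusive,
    DeviceExclusive,
    PartitionExclusive,
}

#[derive(Debug, PartialEq, Eq)]
pub enum LeaseError {
    InvalidArgs,
    ResourceBusy,
}

/// Treat the daemon mode as a permission envelope (§4).
pub fn admit(
    mode: GpuShareMode,
    class: GpuExclusivity,
    device_free: bool,
) -> Result<GpuExclusivity, LeaseError> {
    match (mode, class) {
        // Looser than operator policy allows: fail closed.
        (GpuShareMode::Exclusive, GpuExclusivity::Shared) => Err(LeaseError::InvalidArgs),
        // Tighter than the fractional default: honored only if the device is free.
        (GpuShareMode::Fractional, GpuExclusivity::DeviceExclusive) if !device_free => {
            Err(LeaseError::ResourceBusy)
        }
        // Other combinations are not constrained by §4 in this sketch.
        _ => Ok(class),
    }
}
```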

5. Interaction with v1 GpuSession

SessionExclusive maps onto the v1 GpuSession persistent-session execution mode. Semantics:

  • A SessionExclusive lease is honored for the lifetime of the tasklet’s v1 session — the session holds the GPU state for the duration of its session handle, and no other tenant may observe that state.
  • When no v1 session is open, a SessionExclusive lease behaves identically to DeviceExclusive — the device is reserved for the lease holder regardless.
  • The practical distinction between SessionExclusive and DeviceExclusive becomes observable only when multiple tenants share the device via fractional mode: SessionExclusive reserves state-isolation only while a session is active; DeviceExclusive reserves the whole device unconditionally.
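One literal reading of the three bullets above can be modeled as a function from (class, session state) to what the lease currently reserves. `Reservation` and `reservation` are hypothetical names, and PartitionExclusive is left out because its semantics are reserved (§6). This is a sketch of the stated semantics, not of daemon behavior.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GpuExclusivity {
    Shared,
    SessionExclusive,
    DeviceExclusive,
}

/// What a lease holds at a given moment (hypothetical model of §5).
#[derive(Debug, PartialEq, Eq)]
pub enum Reservation {
    Multiplexed,  // other tenants may share the device
    SessionState, // the v1 session's GPU state is hidden from other tenants
    WholeDevice,  // the device is reserved for the lease holder
}

pub fn reservation(class: GpuExclusivity, v1_session_open: bool) -> Reservation {
    match class {
        GpuExclusivity::Shared => Reservation::Multiplexed,
        // While a v1 session is active: state isolation for the session handle.
        GpuExclusivity::SessionExclusive if v1_session_open => Reservation::SessionState,
        // No open session: behaves identically to DeviceExclusive.
        GpuExclusivity::SessionExclusive => Reservation::WholeDevice,
        GpuExclusivity::DeviceExclusive => Reservation::WholeDevice,
    }
}
```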

6. Interaction with MIG

MIG sub-devices are already exposed as separate resource_ids. This note does not change that. The exclusivity class and the resource_id answer different questions:

  • resource_id → which physical/virtual device the lease targets
  • exclusivity class → how that device is shared during the lease

PartitionExclusive is reserved for the future case where the daemon exposes partitioning as a class attribute rather than as separate resource_ids. Under today’s MIG behavior, a client targeting a MIG sub-device resource_id with class DeviceExclusive effectively claims that partition exclusively — the GPU_MIG inventory flag already advertises that the target is a partition.

7. Inventory advertisement

Inventory (GET_INVENTORY) advertises the GPU_MIG flag (0x0004) today. A future revision should add a per-device field advertising which exclusivity classes the daemon will honor, mirroring the planned CPU isolation-class inventory. That advertisement is deferred and out of scope here.

8. Why a new TLV (not the alternatives)

Four options were considered:

  • Shared CPU+GPU isolation field (option 1a). Rejected: the value spaces are distinct (CPU: 3 classes, GPU: 4), and conflating them forces every consumer to resource-type-switch before decoding. Separate TLVs are cleaner.
  • Dedicated sub-op (option 3). Rejected: same op-code proliferation argument as the CPU-side decision.
  • resource_id targeting (option 4). Rejected: cannot express session-exclusive, which is a daemon-mode attribute, not a device attribute. It also couples exclusivity to device identity in a way that breaks on fractional-mode setups.
  • TLV (option 2). Chosen for precedent-consistency with the CPU isolation TLV and TLV_LEASE_INTENT_KV_CACHE.

9. What this note does NOT commit to

  • Implementation. The TLV parser, field on LeaseAllocRequest, daemon-side policy enforcement, golden vectors, and wire-encoding-v0 doc update are deferred to a separate implementation wave.
  • GpuBuilder::exclusivity() SDK knob — deferred.
  • Scheduler policy for density/predictability/exclusivity tradeoffs — deferred.
  • Inventory advertisement — deferred.
  • MIG partition-exclusive semantics — reserved for the future.
  • Renaming GpuShareMode → something that distinguishes it from per-lease class — a naming hygiene follow-on, not a design decision.

10. References

  • docs/spec/resource-isolation-and-exclusivity.md §5.3, §6.2
  • docs/spec/cpu-isolation-wire-format.md: symmetric CPU decision
  • docs/fabricbios-gpu-abi-v1.md: v1 GpuSession persistent-session execution mode
  • The GPU_MIG inventory flag identifies MIG partitions
  • The --gpu-share-mode daemon-wide configuration sets the default exclusivity class