fabricBIOS Architecture

This document provides a single-page overview of the fabricBIOS system architecture: components, protocols, resource lifecycle, and trust model.

What is fabricBIOS?

fabricBIOS is a minimal firmware specification for disaggregated computing fabrics. It enables nodes to advertise hardware resources, establish trust, exchange capability tokens, and create lease-based bindings to standard data planes (RDMA, NVMe-oF, GPU fabrics, CXL).

fabricBIOS is not an operating system. It exposes resources and enforces access control and lease expiry; policy, scheduling, and placement decisions live above (e.g., in grafOS).

See Premium Dataplane Methodology for the canonical reference on fabricBIOS’s premium dataplane model (RDMA, NVMe-oF, SR-IOV, GPU, CXL).

Component Map

┌──────────────────────────────────────────────────┐
│                Application Layer                 │
│  grafos-store, grafos-mq, grafos-registry,       │
│  grafos-dashboard, grafos-cli, grafos-store-cli  │
└────────────────────┬─────────────────────────────┘
┌────────────────────┴─────────────────────────────┐
│            grafOS Standard Libraries             │
│  High: collections, tensor, stream, kv, fs,      │
│        batch, dsp, jobs                          │
│  Mid:  rpc, sync, net, observe, cache,           │
│        securestore, pipeline, profile            │
│  Foundation: grafos-std, leasekit, fence,        │
│        locator, testkit                          │
└────────────────────┬─────────────────────────────┘
┌────────────────────┴─────────────────────────────┐
│               grafOS Runtime Layer               │
│  grafos-core     (graph model, rewrite plans)    │
│  grafos-runtime  (engine, adapters, scenarios)   │
│  grafos-sdk      (node authoring helpers)        │
│  grafos-posix    (WASI/ELF program execution)    │
└────────────────────┬─────────────────────────────┘
                     │ QUIC 5701 + FBMU/FBBU
┌────────────────────┴─────────────────────────────┐
│             fabricbios-core (no_std)             │
│  wire, codec, discovery, tokens, leases,         │
│  FBMU, FBBU, QUIC adapter, inventory,            │
│  bindings, cap_token                             │
└──┬───────────────┬───────────────┬───────────────┘
   │               │               │
┌──┴────────────┐ ┌┴────────────┐ ┌┴──────────────────┐
│ fabricbiosd   │ │ platform-   │ │ platform-rpi-     │
│ (Linux        │ │ linux       │ │ baremetal         │
│ daemon)       │ │             │ │ (Pi5 firmware)    │
└───────────────┘ └─────────────┘ └───────────────────┘

Crate Ecosystem (~45 crates)

fabricBIOS Protocol and Platform

| Crate | Role |
|---|---|
| fabricbios-core | Portable protocol logic (wire, codec, discovery, tokens, leases, FBMU/FBBU, QUIC adapter, inventory). Supports no_std+alloc for bare-metal targets. |
| fabricbios-platform-traits | no_std trait abstractions (Clock, Entropy, UdpSocket, TcpStream, KeyValueStore, Logger). |
| fabricbios-platform-linux | Linux/std implementation: persistence, time, QUIC/UDP data-plane helpers, resource auto-detection. |
| fabricbios-platform-rpi-baremetal | Pi5 bare-metal: RP1 GEM Ethernet, BCM2712 PCIe, NVMe, DTB parsing, QUIC server, DMA, cache, heap, serial. |
| fabricbios-platform-x86-baremetal | x86_64 bare-metal: serial, PCI, virtio, NIC (e1000), entropy, identity, storage. |
| fabricbiosd | Linux daemon: node, relay, solicit, control-server, control-client, simulate subcommands. QUIC server, UDP discovery, resource auto-detection. |
| fabricbios-pi5-bringup | Pi5 bare-metal entry point. Boots via TFTP/netboot, runs QUIC server + FBMU/FBBU data planes. |
| fabricbios-qemu-virt | QEMU aarch64 virt-machine bare-metal target. |
| fabricbios-quic-interop | QUIC interop test client: client (control ops), fbmu (memory data plane), fbbu (block data plane), gen-client-cert (mTLS). |
| fabricbios-quic-crypto | QUIC packet protection (header protection, packet number encryption) for no_std. |
| fabricbios-sim | Deterministic network simulator (SimNet, SimClock, SimEntropy). |
| fabricbios-harness | Integration test harness: simulated nodes, relays, controllers. Tests churn, replay, partitions, 100-node scale. |

grafOS Runtime

| Crate | Role |
|---|---|
| grafos-core | Graph data model: Node, Port, Edge, Capability, LeaseRef, Binding, RewritePlan. |
| grafos-runtime | Runtime engine: graph store, rewrite engine, event queue, adapters (sim/live/QUIC), Pi5 demo scenarios, mixed-fleet orchestration, dashboard demo. |
| grafos-sdk | Re-exports from grafos-core + grafos-runtime. Helper functions for graph-native nodes. |
| grafos-posix | POSIX-ish layer: ELF64 loader, WASI runtime, aarch64 initial stack builder. |
| grafos-posix-programs | WASI test programs (smoke tests, filesystem walker, HTTP echo). |

grafOS Standard Libraries — Foundation

| Crate | Role |
|---|---|
| grafos-std | Typed access to fabric memory, block, GPU, and CPU resources; resource RAII; builder APIs. |
| grafos-leasekit | Lease renewal and TTL budgeting (poll-driven, no_std compatible). |
| grafos-fence | Typed epoch/fencing helpers for stale-write rejection and leader fencing. |
| grafos-locator | Typed locators and rendezvous/handoff records for fabric resource discovery. |
| grafos-testkit | Testing utilities and harness helpers for grafOS libraries. |
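As a sketch of the poll-driven renewal model that grafos-leasekit describes, the decision rule below is illustrative only: the function name `should_renew` and the half-TTL policy are assumptions for this document, not the crate's actual API.

```rust
// Hypothetical sketch of leasekit-style renewal budgeting; the real
// grafos-leasekit API may differ.

/// Decide whether a lease should be renewed on this poll, given the
/// remaining TTL and a safety margin covering one renewal round-trip.
fn should_renew(remaining_ms: u64, ttl_ms: u64, margin_ms: u64) -> bool {
    // Renew once less than half the TTL remains, or once the remaining
    // time falls inside the round-trip safety margin.
    remaining_ms <= ttl_ms / 2 || remaining_ms <= margin_ms
}

fn main() {
    // A 30 s lease with a 2 s safety margin.
    assert!(!should_renew(25_000, 30_000, 2_000)); // plenty of time left
    assert!(should_renew(14_000, 30_000, 2_000));  // under half the TTL
    assert!(should_renew(1_500, 30_000, 2_000));   // inside the margin
    println!("renewal policy ok");
}
```

Renewing at half-TTL rather than at expiry is the usual way to tolerate one lost renewal round without the lease lapsing.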

grafOS Standard Libraries — Mid-Level

| Crate | Role |
|---|---|
| grafos-collections | Distributed data structures (FabricVec, FabricHashMap, FabricQueue) backed by leased memory. |
| grafos-sync | Distributed synchronization primitives (FabricMutex, FabricBarrier, FabricWatch) with lease-backed timeouts. |
| grafos-net | Network-aware programming (FabricSocket, FabricListener, bandwidth-aware routing). |
| grafos-rpc | RPC framework where the hot path is lease-backed shared memory instead of TCP. |
| grafos-observe | Fabric observability: metrics, events, distributed tracing, structured logging, OpenTelemetry (OTLP) export, Prometheus format. |
| grafos-observe-macros | Proc macros for grafos-observe (#[grafos::instrument] etc.). |
| grafos-cache | Lease-backed caching with tiered eviction. |
| grafos-securestore | Encrypted storage over fabric resources. |
| grafos-pipeline | Multi-stage data processing pipelines across nodes. |
| grafos-profile | Program-level resource profiler: flame graphs, lease timelines, data-flow diagrams, waste reports. |

grafOS Standard Libraries — High-Level

| Crate | Role |
|---|---|
| grafos-tensor | Tensor/ndarray operations on disaggregated memory and GPU. |
| grafos-stream | Stream processing / dataflow with cross-node pipeline stages. |
| grafos-kv | Key-value store abstraction over leased memory with block-storage spillover. |
| grafos-fs | Distributed filesystem abstraction over leased block storage. |
| grafos-batch | Batch job / task graph executor with automatic retry and cleanup. |
| grafos-dsp | Signal processing pipelines (FFT, FIR/IIR, mixer, resample) with deterministic latency from lease-based reservation. |
| grafos-jobs | Idempotent burst compute and retry scaffolding. |

Application Infrastructure

| Crate | Role |
|---|---|
| grafos-store | Universal object storage convention: MemObjectStore, BlockObjectStore, TieredObjectStore with CRC32 checksumming. |
| grafos-store-cli | CLI tool for grafOS fabric object stores (grafos-store-cli binary). |
| grafos-mq | Lease-based message queue: partitioned topics with ring-buffer storage, consumer groups, dead-letter routing. |
| grafos-registry | Fabric-wide service registry: register, discover, and watch services backed by leased fabric memory. |
| grafos-dashboard | Real-time monitoring dashboard: topology map, utilization heatmap, lease churn, contention, alerts. |
| grafos-cli | Unified operational CLI (grafos binary): inspect nodes, manage leases, object storage, profiling, health checks. |
| grafos-rpc-macros | Proc macros for grafos-rpc service definitions. |

Protocol Stack

┌─────────────────────────────────────────────────────┐
│                  Application Layer                  │
│  Control ops: PING, GET_IDENTITY, GET_INVENTORY,    │
│  LEASE_ALLOC, LEASE_FREE, LEASE_RENEW, LEASE_QUERY, │
│  CAP_REQUEST, CAP_REFRESH, CAP_REVOKE               │
├─────────────────────────────────────────────────────┤
│                   Transport Layer                   │
│  QUIC / TLS 1.3 (port 5701) -- control plane        │
│  UDP (port 5700) -- discovery (ANNOUNCE/SOLICIT)    │
│  UDP (port 5702) -- FBMU memory data plane          │
│  UDP (port 5703) -- FBBU block data plane           │
├─────────────────────────────────────────────────────┤
│                   Security Layer                    │
│  Ed25519 signatures (discovery, tokens)             │
│  TLS 1.3 mTLS (QUIC control)                        │
│  HMAC-SHA256 (FBMU/FBBU per-lease dp_key)           │
│  HMAC-SHA256 capability tokens (cap-tokens feature) │
├─────────────────────────────────────────────────────┤
│                     Wire Format                     │
│  Big-endian, TLV extensible, FRAG_V2                │
│  See docs/spec/fabricbios-wire-encoding-v0.md       │
└─────────────────────────────────────────────────────┘
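To illustrate the big-endian TLV convention named in the wire-format layer, here is a minimal encoder/decoder sketch. The 2-byte type and length widths are assumptions for illustration, not the actual encoding from fabricbios-wire-encoding-v0.

```rust
// Illustrative big-endian TLV record (field widths are assumptions,
// not taken from the fabricBIOS wire-encoding spec).

/// Encode one TLV: 2-byte type, 2-byte length, then the value,
/// with all integers big-endian.
fn encode_tlv(ty: u16, value: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + value.len());
    out.extend_from_slice(&ty.to_be_bytes());
    out.extend_from_slice(&(value.len() as u16).to_be_bytes());
    out.extend_from_slice(value);
    out
}

/// Decode one TLV, returning (type, value, rest). Returns None on
/// truncated input, in line with the fail-closed design principle.
fn decode_tlv(buf: &[u8]) -> Option<(u16, &[u8], &[u8])> {
    if buf.len() < 4 {
        return None; // truncated header: reject
    }
    let ty = u16::from_be_bytes([buf[0], buf[1]]);
    let len = u16::from_be_bytes([buf[2], buf[3]]) as usize;
    let rest = &buf[4..];
    if rest.len() < len {
        return None; // truncated value: reject
    }
    Some((ty, &rest[..len], &rest[len..]))
}

fn main() {
    let wire = encode_tlv(0x0001, b"lease");
    let (ty, val, rest) = decode_tlv(&wire).unwrap();
    assert_eq!(ty, 0x0001);
    assert_eq!(val, b"lease");
    assert!(rest.is_empty());
    assert!(decode_tlv(&wire[..3]).is_none()); // truncated input rejected
    println!("tlv roundtrip ok");
}
```

Unknown TLV types can be skipped by length, which is what makes the format extensible without version bumps.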

Port Assignments

| Port | Protocol | Purpose |
|---|---|---|
| 5700/UDP | fabricBIOS discovery | ANNOUNCE, SOLICIT, WITHDRAW |
| 5701/QUIC | fabricBIOS control | Lease management, capability tokens, identity |
| 5702/UDP | FBMU | Memory data plane (read/write with HMAC auth) |
| 5703/UDP | FBBU | Block data plane (read_block/write_block with HMAC auth) |

Data Plane Landscape

fabricBIOS supports multiple data planes, selected by resource type and deployment context:

| Data Plane | Protocol | Resource | Status |
|---|---|---|---|
| FBMU | UDP 5702 | Memory | Production (Pi5 bare-metal + fabricbiosd) |
| FBBU | UDP 5703 | Block storage | Production (Pi5 NVMe HAT + SD + fabricbiosd) |
| RDMA | soft-RoCE (RXE) | High-perf memory | Dev/CI (Linux soft-RoCE) |
| NVMe-oF | nvmet loop | Premium block binding | Dev/CI (Linux loop-backed nvmet target lifecycle; steady-state remote I/O unproven) |
| macvlan network | macvlan / SR-IOV VF | Network interfaces | Linux fabricbiosd |
| QUIC streams | QUIC bidi | Control + data | All targets (primary transport) |

grafOS applications use these data planes transparently through the library stack. grafos-std provides typed lease handles; higher-level libraries (collections, tensor, store) map operations to the appropriate data plane automatically.
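A sketch of what such a resource-to-plane mapping might look like; the enums and the `select_plane` function are invented for illustration, and the real libraries negotiate this through lease bindings rather than a static table.

```rust
// Illustrative mapping from resource type to data plane, mirroring
// the table above (all names here are invented for this sketch).

#[derive(Debug, PartialEq)]
enum Resource { Mem, Block, Net }

#[derive(Debug, PartialEq)]
enum DataPlane { Fbmu, Fbbu, Rdma, NvmeOf, Macvlan }

/// Choose a data plane for a resource, preferring premium planes
/// (RDMA, NVMe-oF) when the node advertises support for them.
fn select_plane(res: &Resource, rdma_ok: bool, nvmeof_ok: bool) -> DataPlane {
    match res {
        Resource::Mem if rdma_ok => DataPlane::Rdma,
        Resource::Mem => DataPlane::Fbmu,           // UDP 5702 default
        Resource::Block if nvmeof_ok => DataPlane::NvmeOf,
        Resource::Block => DataPlane::Fbbu,         // UDP 5703 default
        Resource::Net => DataPlane::Macvlan,
    }
}

fn main() {
    assert_eq!(select_plane(&Resource::Mem, false, false), DataPlane::Fbmu);
    assert_eq!(select_plane(&Resource::Mem, true, false), DataPlane::Rdma);
    assert_eq!(select_plane(&Resource::Block, false, true), DataPlane::NvmeOf);
    println!("selection ok");
}
```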

Mixed-Fleet Orchestration

grafOS supports heterogeneous fabrics where bare-metal Pi5 nodes and Linux fabricbiosd nodes participate in a single resource graph:

              grafOS runtime
          ┌─────────┴─────────┐
          │                   │
 Pi5 bare-metal (QUIC)    fabricbiosd (QUIC)
 - FBMU memory            - FBMU memory
 - FBBU block (NVMe/SD)   - FBBU block
 - 2-4 GB DRAM            - macvlan network
 - no OS, direct HW       - RDMA (soft-RoCE)
                          - NVMe-oF
                          - auto-detected resources

The mixed-fleet scenario exercises this: discovering nodes of both types, allocating resources across them, and performing data-plane I/O regardless of the underlying platform. Resource types (MEM, BLOCK, NET, CPU, GPU) are uniform across node types; only the available data planes differ.

Observability Stack

┌────────────────────────────────────────────┐
│              grafos-dashboard              │
│ topology map, heatmap, lease churn, alerts │
├────────────────────────────────────────────┤
│               grafos-profile               │
│    flame graphs, lease timelines, waste    │
├────────────────────────────────────────────┤
│               grafos-observe               │
│ metrics, events, tracing, OTLP, Prometheus │
├────────────────────────────────────────────┤
│           grafos-observe-macros            │
│      #[grafos::instrument] proc macro      │
└────────────────────────────────────────────┘

Every lease acquisition, data-plane operation, and rewrite plan execution is an observable event. The stack exports to standard formats (OpenTelemetry OTLP, Prometheus) and provides grafOS-specific views (resource flame graphs, lease timelines, data-flow diagrams). The dashboard provides real-time visualization with QUIC polling against live nodes.

Resource Lifecycle

A resource passes through the following stages:

 Advertise          Discover           Lease             Bind
┌───────────┐      ┌───────────┐      ┌───────────┐      ┌───────────┐
│ Node      │─────>│ Relay     │─────>│ Client    │─────>│ Data      │
│ ANNOUNCE  │      │ answers   │      │ ALLOC     │      │ Plane     │
│ (signed)  │      │ SOLICIT   │      │ (QUIC)    │      │ I/O       │
└───────────┘      └───────────┘      └───────────┘      └─────┬─────┘
                                                               │
                                                               v
                                      ┌───────────┐      ┌───────────┐
                                      │ Fence     │<─────│ Expire    │
                                      │ (if       │      │ or        │
                                      │ teardown  │      │ Revoke    │
                                      │ fails)    │      └───────────┘
                                      └───────────┘
  1. Advertise: Node sends signed ANNOUNCE to relay (periodically, 30s default). Includes resource inventory (MEM, BLOCK, CPU, GPU, NET), locality, and health flags.

  2. Discover: Client sends SOLICIT to relay. Relay responds with aggregated ANNOUNCE payloads, optionally filtered by resource type, node ID, or locality.

  3. Lease: Client connects to node via QUIC (port 5701). Issues LEASE_ALLOC (or other resource-specific op) to create a time-bounded lease. Response includes binding TLVs (lease_id, dp_key, endpoint, limits).

  4. Bind: Client uses binding credentials to perform data-plane I/O. FBMU for memory (UDP 5702), FBBU for block storage (UDP 5703), RDMA for high-performance memory, and NVMe-oF for premium block bindings where target/session support is available.

  5. Use: Client reads/writes via the data plane. Each request carries the lease_id, a nonce (for replay protection), and an HMAC auth_tag (keyed by dp_key).

  6. Expire/Revoke: Leases have mandatory expiry. On expiry or explicit FREE, the node tears down data-plane authorization. Subsequent data-plane ops return NO_LEASE.

  7. Fence: If teardown fails (hardware fault, driver error), the resource enters FENCED state. No new leases are granted. The resource is reported as FENCED in discovery until remediated.
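The seven stages can be condensed into a small state machine. The state and event names below paraphrase the steps above and are not taken from the specification.

```rust
// The lifecycle above as a minimal state machine (names paraphrased,
// not from the normative spec).

#[derive(Debug, Clone, Copy, PartialEq)]
enum LeaseState { Advertised, Leased, Expired, Fenced }

#[derive(Debug, Clone, Copy)]
enum Event { Alloc, ExpireOrFree, TeardownOk, TeardownFailed }

/// Apply one lifecycle event. Fenced is absorbing here; remediation
/// is out of scope for this sketch.
fn step(state: LeaseState, ev: Event) -> LeaseState {
    use LeaseState::*;
    match (state, ev) {
        (Advertised, Event::Alloc) => Leased,
        (Leased, Event::ExpireOrFree) => Expired,
        // Successful teardown returns the resource to the advertised
        // pool; failed teardown fences it (step 7).
        (Expired, Event::TeardownOk) => Advertised,
        (Expired, Event::TeardownFailed) => Fenced,
        (s, _) => s, // all other events are no-ops
    }
}

fn main() {
    let mut s = LeaseState::Advertised;
    for ev in [Event::Alloc, Event::ExpireOrFree, Event::TeardownFailed] {
        s = step(s, ev);
    }
    assert_eq!(s, LeaseState::Fenced);
    // A fenced resource grants no new leases until remediated.
    assert_eq!(step(s, Event::Alloc), LeaseState::Fenced);
    println!("lifecycle ok");
}
```

The absorbing Fenced state is the point of the design: a resource whose teardown cannot be verified is withdrawn from allocation rather than re-leased optimistically.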

Trust Model

fabricBIOS uses a layered trust model that progresses from initial contact to full mutual authentication:

Trust Bootstrap

┌───────────────┐
│ TOFU          │  First contact: pin server cert hash
│ (default)     │  Subsequent: verify pinned hash
└───────┬───────┘
        │ upgrade
        v
┌───────────────┐
│ mTLS          │  Client and server present certificates
│ (fbmu-auth)   │  Node verifies client cert (TOFU pin)
└───────┬───────┘
        │ add
        v
┌───────────────┐
│ Capability    │  HMAC-SHA256 tokens for resource access
│ Tokens        │  Audience-bound, short TTL, attenuable
│ (cap-tokens)  │
└───────────────┘
  1. TOFU (Trust On First Use): Default for QUIC connections. Client pins server certificate hash on first connection; subsequent connections verify the pin. Simple but effective for small fabrics.

  2. mTLS (Mutual TLS): Enabled via fbmu-auth feature flag. Both client and server present TLS certificates. Provides bidirectional authentication for the control plane.

  3. Capability Tokens: Enabled via cap-tokens feature flag. HMAC-SHA256 tokens minted by LEASE_ALLOC (via CAP_REQUEST). Tokens are audience-bound, have short TTL (default max 300s), and can be attenuated. Validated on every data-plane operation.

  4. FBMU/FBBU Auth Tags: Per-lease dp_key generated at allocation time. Every data-plane request carries an HMAC-SHA256 auth tag computed over the request header. Provides per-operation authentication without TLS overhead on the data plane.
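A sketch of the token checks described in steps 3 and 4. The `mock_tag` function is a stand-in for HMAC-SHA256, and enforcing the 300 s cap at validation time is an assumption of this sketch, not documented behavior.

```rust
// Illustrative capability-token validation: audience binding, expiry,
// TTL cap, and a keyed tag. `mock_tag` is NOT HMAC-SHA256 -- it is a
// placeholder so the sketch stays self-contained.

struct CapToken {
    audience: String,
    expires_at: u64, // unix seconds
    tag: u64,
}

/// Placeholder keyed tag (the real system uses HMAC-SHA256).
fn mock_tag(key: u64, audience: &str, expires_at: u64) -> u64 {
    audience
        .bytes()
        .fold(key ^ expires_at, |acc, b| acc.rotate_left(7) ^ b as u64)
}

/// Validate a token: audience-bound, unexpired, TTL capped at 300 s,
/// and tag consistent with the shared key.
fn validate(t: &CapToken, key: u64, audience: &str, now: u64) -> bool {
    t.audience == audience
        && now < t.expires_at
        && t.expires_at - now <= 300
        && t.tag == mock_tag(key, &t.audience, t.expires_at)
}

fn main() {
    let key = 0x5eed;
    let t = CapToken {
        audience: "node-a/mem".to_string(),
        expires_at: 1_000_120,
        tag: mock_tag(key, "node-a/mem", 1_000_120),
    };
    assert!(validate(&t, key, "node-a/mem", 1_000_000));  // fresh, bound
    assert!(!validate(&t, key, "node-b/mem", 1_000_000)); // wrong audience
    assert!(!validate(&t, key, "node-a/mem", 1_000_200)); // expired
    println!("token checks ok");
}
```

Audience binding is what makes a leaked token useless against any node other than the one it was minted for.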

Discovery Trust

  • ANNOUNCE and WITHDRAW messages are Ed25519-signed.
  • Relays can operate in pinned trust bundle mode (verify signatures against known node keys).
  • Replay protection via nonce (timestamp or random) + bounded replay cache.
  • Rate limiting for unsigned messages (default 10/sec/source).
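The bounded replay cache in the bullets above can be sketched as follows; the capacity and data structures are illustrative, not the implementation's.

```rust
// Sketch of discovery replay protection: a bounded nonce cache with
// FIFO eviction (sizes and structures are illustrative).

use std::collections::{HashSet, VecDeque};

struct ReplayCache {
    seen: HashSet<u64>,
    order: VecDeque<u64>, // insertion order, for FIFO eviction
    cap: usize,
}

impl ReplayCache {
    fn new(cap: usize) -> Self {
        Self { seen: HashSet::new(), order: VecDeque::new(), cap }
    }

    /// Returns true if the nonce is fresh, false if it is a replay.
    fn check_and_insert(&mut self, nonce: u64) -> bool {
        if !self.seen.insert(nonce) {
            return false; // already seen: reject as replay
        }
        self.order.push_back(nonce);
        if self.order.len() > self.cap {
            // Bounded cache: forget the oldest nonce.
            if let Some(old) = self.order.pop_front() {
                self.seen.remove(&old);
            }
        }
        true
    }
}

fn main() {
    let mut cache = ReplayCache::new(2);
    assert!(cache.check_and_insert(1));
    assert!(!cache.check_and_insert(1)); // replay rejected
    assert!(cache.check_and_insert(2));
    assert!(cache.check_and_insert(3)); // evicts nonce 1
    assert!(cache.check_and_insert(1)); // 1 has aged out of the window
    println!("replay cache ok");
}
```

A bounded cache trades a small replay window (nonces older than the window are forgotten) for constant memory; timestamp-based nonces let the window be enforced in time as well as in count.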

Deployment Targets

| Target | Transport | Status |
|---|---|---|
| Linux daemon (fabricbiosd) | QUIC 5701 (default), UDP 5700 | Production-ready for dev/test |
| Raspberry Pi 5 bare-metal | QUIC 5701, FBMU 5702, FBBU 5703 | 3-node fleet operational |
| x86_64 bare-metal (QEMU) | QUIC 5701 (default), UDP 5700 | Bringup/validation target (not primary deployment path) |
| grafOS runtime | QUIC client to above targets | Sim + live modes |

Key Design Decisions

  • QUIC-first: QUIC 5701 is the normative default control transport. All TCP transport code has been removed.
  • Fail-closed: Unknown protocol versions, unknown flag bits, missing signatures, and ambiguous identities are all rejected.
  • Lease-mandatory: All data-plane bindings require leases with explicit lifetimes. No permanent access grants.
  • Signature before decompress: Signature verification always precedes decompression or deep parsing, preventing DoS via malformed compressed payloads.
  • Minimal TCB: Core protocol logic is no_std compatible. Platform-specific code is isolated in platform crates.
  • Mixed-fleet native: The same grafOS runtime and library stack operates uniformly across bare-metal and Linux nodes in a single graph.
  • Layered libraries: Applications build on progressively higher-level grafOS libraries (std -> collections -> store) rather than raw wire protocol, keeping application code portable across data planes.

Related Documentation

  • docs/spec/fabricBIOS-design-document.md — Normative specification
  • docs/spec/fabricbios-wire-encoding-v0.md — Wire format reference
  • docs/grafos-design-document.md — grafOS resource graph design
  • docs/grafos/README.md — grafOS documentation index
  • docs/grafos-libraries-overview.md — Library stack layering and guides
  • docs/runbooks/getting-started.md — Quick start guide
  • docs/spec/resource-types.md — Resource type reference
  • docs/runbooks/pi5-bringup-summary.md — Pi5 bare-metal status