Recipe 35: Web Server With Leased Ingress and Replicas
The Idea
A web server on grafOS is not a host process that happens to use fabric resources. It is a native service: a set of tasklet replicas, each holding a leased listener port, leased CPU, and leased memory, placed and managed by the scheduler.
This recipe shows how to define, place, run, and fail over a replicated HTTP
echo service using native grafOS primitives — no ProgramRuntime, no POSIX
sockets, no ambient host networking.
What You Need
- A fabric with at least two nodes (sim mode works fine)
- grafos-scheduler with the scheduler feature
- grafos-std for the tasklet SDK
- A tasklet WASM module implementing the HTTP echo handler
Step 1: Define and Deploy the Service
A native service starts with a ServiceSpec — the full description of
what to run, how to replicate it, and what resources each instance needs.
    use grafos_core::ResourceKind;
    use grafos_scheduler::{
        NodeConstraint, Priority, ReplicationMode, ResourceRequirement,
        ServiceId, ServiceSpec, Strategy, TenantId,
    };

    let spec = ServiceSpec {
        service_id: ServiceId {
            name: "echo-http".into(),
            tenant_id: TenantId(7),
        },
        version: "1.0.0".into(),
        module_hash: echo_module_hash,
        wasm: echo_module_bytes,
        replication: ReplicationMode::ActiveActive { replica_count: 2 },
        priority: Priority::Standard,
        strategy: Strategy::Spread,
        node_constraint: NodeConstraint::Any,
        anti_affinity_services: vec![],
        resources_per_instance: vec![
            ResourceRequirement { resource_type: ResourceKind::Cpu, capacity: 1 },
            ResourceRequirement { resource_type: ResourceKind::Mem, capacity: 64 * 1024 * 1024 },
        ],
        listener_port: 8080,
        max_sessions: 64,
        drain_deadline_secs: 30,
        required_rights: 0,
    };

    // Deploy through the orchestrator (acquires leases, places instances).
    let (service_id, events) = orchestrator.deploy(spec, &capacity_ledger, now)?;

Core grafOS API Path
The scheduler-facing path is ServiceSpec into ServiceOrchestrator::deploy.
The orchestrator uses the service transport to acquire a CPU lease, listener
lease, memory leases, and then submit the tasklet with service capabilities.
Routing then reads the resulting ServiceTopology:
    use grafos_scheduler::{RoutingPolicy, ServiceResolver};

    let (service_id, _events) = orchestrator.deploy(spec, &capacity_ledger, now)?;
    let topology = orchestrator
        .get_topology(&service_id)
        .expect("service was just deployed");

    let mut resolver = ServiceResolver::new();
    let endpoint = resolver.resolve(topology, &RoutingPolicy::RoundRobin)?;

Each placed instance gets:
- A listener lease — exclusive authority over port 8080 on that node
- A CPU lease — execution capacity for the tasklet
- A memory lease — working memory for request buffers and state
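To make the placement step concrete, here is a minimal sketch of what a spread-style placement decision could look like. It does not use the grafOS scheduler: `NodeCapacity` and `place_spread` are hypothetical names invented for this illustration, and the tie-breaking rule (most free memory first) is an assumption, not documented scheduler behavior.

```rust
use std::collections::BTreeMap;

/// Hypothetical view of one node's free capacity (not a grafOS type).
#[derive(Debug, Clone, PartialEq)]
pub struct NodeCapacity {
    pub free_cpu: u64,
    pub free_mem: u64,
    /// Whether the listener port is still unleased on this node.
    pub port_free: bool,
}

/// Pick `replicas` distinct nodes that can each hold one instance's
/// CPU, memory, and listener leases. Returns None if too few nodes qualify.
pub fn place_spread(
    nodes: &BTreeMap<u32, NodeCapacity>,
    replicas: usize,
    cpu: u64,
    mem: u64,
) -> Option<Vec<u32>> {
    // Keep only nodes that can satisfy all three leases for one instance.
    let mut candidates: Vec<(u32, u64)> = nodes
        .iter()
        .filter(|(_, c)| c.port_free && c.free_cpu >= cpu && c.free_mem >= mem)
        .map(|(id, c)| (*id, c.free_mem))
        .collect();
    // Assumed policy: most free memory first, ties broken by node id.
    candidates.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    if candidates.len() < replicas {
        return None;
    }
    // One instance per node, so replicas never share a failure domain.
    Some(candidates.into_iter().take(replicas).map(|(id, _)| id).collect())
}
```

Because each instance needs exclusive authority over the port on its node, a node whose port is already leased is filtered out even if it has spare CPU and memory.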
Step 2: Submit the Tasklet
Each replica runs the same WASM module. The tasklet is submitted through
a CpuLease, which manages execution capacity.
    use grafos_std::cpu::CpuBuilder;

    // Acquire a CPU lease, then submit the tasklet module.
    let cpu_lease = CpuBuilder::new()
        .cores(1)
        .lease_secs(300)
        .acquire()?;

    let result = cpu_lease.cpu()
        .submit(&echo_module_bytes)
        .fuel(1_000_000)
        .input(b"")
        .launch()?;

Inside the tasklet, the handler uses service hostcalls from grafos_svc_v0
(see docs/grafos/service-abi-v0.md for the full ABI):
    // Inside the WASM tasklet
    // svc_listen requires a cap_handle with RIGHTS_SVC_LISTEN for the port.
    let listener = svc_listen(cap_handle, 8080, 64)?;

    loop {
        let session = svc_accept(listener)?;
        if session < 0 { continue; } // no pending connection

        // svc_read / svc_write operate on session handles, not file descriptors.
        let mut buf = [0u8; 1024];
        let n = svc_read(session, &mut buf)?;

        svc_write(session, b"HTTP/1.1 200 OK\r\n\r\necho")?;
        svc_close(session)?;
    }

No POSIX sockets. No bind() / accept() / read() / write(). The
service hostcalls operate on leased listener handles, not file descriptors.
Step 3: Clients Discover and Route
A client finds the service by resolving its topology, not by node address.
    use grafos_scheduler::service_resolver::{ServiceResolver, RoutingPolicy};

    let mut resolver = ServiceResolver::new();

    // The topology comes from the orchestrator's service state.
    let topology = orchestrator.get_topology(&service_id).expect("service topology");

    let endpoint = resolver.resolve(
        &topology,
        &RoutingPolicy::RoundRobin,
    )?;
    // endpoint.node_id, endpoint.listener_port, endpoint.instance_id

The ResolvedEndpointCache (5s TTL, generation-aware) can wrap the
resolver to avoid re-resolving on every call. See
native-service-routing-model.md section 3a for cache semantics.
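As an illustration of what "5s TTL, generation-aware" can mean in practice, the sketch below models a single-entry cache that serves a hit only when the entry is younger than the TTL and was stored under the current topology generation. This is not the real ResolvedEndpointCache: `EndpointCache`, `Endpoint`, and their methods are hypothetical stand-ins, and the authoritative semantics are in the routing model document.

```rust
use std::time::{Duration, Instant};

/// Hypothetical resolved endpoint (field names follow the recipe's comment).
#[derive(Debug, Clone, PartialEq)]
pub struct Endpoint {
    pub node_id: u32,
    pub listener_port: u16,
    pub instance_id: u64,
}

/// Hypothetical single-entry cache: endpoint, generation it was resolved
/// under, and the time it was stored.
pub struct EndpointCache {
    ttl: Duration,
    entry: Option<(Endpoint, u64, Instant)>,
}

impl EndpointCache {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entry: None }
    }

    /// Hit only if the generation matches and the entry is younger than the TTL.
    pub fn get(&self, current_generation: u64, now: Instant) -> Option<&Endpoint> {
        match &self.entry {
            Some((endpoint, generation, stored_at))
                if *generation == current_generation
                    && now.duration_since(*stored_at) < self.ttl =>
            {
                Some(endpoint)
            }
            // Stale generation or expired TTL: caller must re-resolve.
            _ => None,
        }
    }

    /// Store a freshly resolved endpoint with the generation it came from.
    pub fn put(&mut self, endpoint: Endpoint, generation: u64, now: Instant) {
        self.entry = Some((endpoint, generation, now));
    }
}
```

The generation check is what lets a failover invalidate the entry immediately, while the TTL bounds staleness for changes that do not bump the generation.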
Step 4: Survive a Node Failure
When a node dies, the scheduler detects the lease expiry and triggers failover:
- The dead replica’s listener lease expires → instance transitions to Fenced.
- The ServiceOrchestrator starts a replacement on another node.
- The replacement acquires a new listener lease and starts the tasklet.
- The resolver’s cached entry is invalidated by the generation bump.
- New client requests route to the surviving replica immediately; the replacement starts receiving traffic once it is Active.
No DNS update. No load balancer reconfiguration. No process restart. The lease model handles it.
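The failover sequence can be pictured as a small state machine over the service topology. The sketch below is a simplified model, not the orchestrator's implementation: `Topology`, `Instance`, `InstanceState`, and the three methods are hypothetical, time is a plain integer, and bumping the generation on every transition is an assumption made for the illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum InstanceState {
    Provisioning,
    Active,
    Fenced,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Instance {
    pub id: u64,
    pub node_id: u32,
    pub state: InstanceState,
    pub lease_expires_at: u64,
}

#[derive(Debug)]
pub struct Topology {
    pub generation: u64,
    pub next_id: u64,
    pub instances: Vec<Instance>,
}

impl Topology {
    /// Fence every Active instance whose lease has expired; returns their ids.
    pub fn fence_expired(&mut self, now: u64) -> Vec<u64> {
        let mut fenced = Vec::new();
        for inst in self.instances.iter_mut() {
            if inst.state == InstanceState::Active && inst.lease_expires_at <= now {
                inst.state = InstanceState::Fenced;
                fenced.push(inst.id);
            }
        }
        if !fenced.is_empty() {
            self.generation += 1; // invalidates generation-aware caches
        }
        fenced
    }

    /// Start a replacement on the first candidate node that hosts no
    /// instance of this service (fenced or otherwise).
    pub fn provision_replacement(&mut self, candidates: &[u32], lease_expires_at: u64) -> Option<u64> {
        let node_id = *candidates
            .iter()
            .find(|n| !self.instances.iter().any(|i| i.node_id == **n))?;
        let id = self.next_id;
        self.next_id += 1;
        self.instances.push(Instance {
            id,
            node_id,
            state: InstanceState::Provisioning,
            lease_expires_at,
        });
        self.generation += 1;
        Some(id)
    }

    /// Provisioning → Active; only then should the instance receive traffic.
    pub fn activate(&mut self, id: u64) -> bool {
        let found = self
            .instances
            .iter_mut()
            .find(|i| i.id == id && i.state == InstanceState::Provisioning);
        match found {
            Some(inst) => {
                inst.state = InstanceState::Active;
                self.generation += 1;
                true
            }
            None => false,
        }
    }
}
```

In this model the surviving replica stays Active throughout, which is why clients keep being served while the replacement is still Provisioning.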
    // Trigger failover for a failed instance.
    let events = orchestrator.failover(
        &service_id,
        failed_instance_id,
        &capacity_ledger,
        now,
    )?;
    // events: [FailoverStarted, InstanceProvisioned, ...]
    // orchestrator.tick() advances the state machine to completion.

Step 5: Planned Cutover
To move the service to a new node (rolling deploy, hardware maintenance):
- Call cutover() on the instance to be replaced.
- The orchestrator provisions a replacement on a different node.
- The old instance drains: stops accepting new sessions, waits for in-flight sessions to complete (bounded by drain_deadline_secs).
- Old listener lease is revoked after drain completes.
- Generation bumps at each step keep the resolver’s cache fresh.
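The drain step is the part of the cutover with the most subtle timing, so here is a minimal sketch of a bounded drain. It is a toy model under stated assumptions: `Listener` and `DrainState` are hypothetical types, time is an integer number of seconds, and revoking at the deadline even with sessions still open is one reading of "bounded by drain_deadline_secs".

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DrainState {
    Serving,
    Draining { deadline: u64 },
    Revoked,
}

/// Hypothetical model of one instance's listener during a cutover.
pub struct Listener {
    pub state: DrainState,
    pub in_flight: u32,
}

impl Listener {
    /// Enter the drain phase; the deadline is fixed when draining starts.
    pub fn begin_drain(&mut self, now: u64, drain_deadline_secs: u64) {
        if self.state == DrainState::Serving {
            self.state = DrainState::Draining { deadline: now + drain_deadline_secs };
        }
    }

    /// New sessions are admitted only while Serving.
    pub fn try_accept(&mut self) -> bool {
        if self.state == DrainState::Serving {
            self.in_flight += 1;
            true
        } else {
            false
        }
    }

    /// Record that one in-flight session finished.
    pub fn session_closed(&mut self) {
        self.in_flight = self.in_flight.saturating_sub(1);
    }

    /// Revoke once all sessions finished or the deadline passed.
    /// Returns true when the listener is revoked.
    pub fn tick(&mut self, now: u64) -> bool {
        if let DrainState::Draining { deadline } = self.state {
            if self.in_flight == 0 || now >= deadline {
                self.state = DrainState::Revoked;
            }
        }
        self.state == DrainState::Revoked
    }
}
```

With the recipe's drain_deadline_secs of 30, a drain that starts at t=100 revokes no later than t=130, and earlier if the last session closes first.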
    // Initiate a planned cutover for a specific instance.
    let events = orchestrator.cutover(
        &service_id,
        old_instance_id,
        &capacity_ledger,
        now,
    )?;
    // events: [CutoverStarted, InstanceProvisioned, ...]
    // orchestrator.tick() drives: Drain → Revoke → Replace → CutoverCompleted

Why This Matters
In a traditional system, “deploy a web server with failover” means: write a server binary, package it, configure a process manager, set up a load balancer, write health checks, configure DNS failover, and hope the pieces agree about what “healthy” means.
In grafOS, the service is the lease graph. The listener lease is the authority to accept connections. The CPU lease is the authority to execute. The memory lease is the authority to allocate buffers. When any lease expires, the resource is fenced and the scheduler replaces it. There is no gap between “the service is running” and “the resources are leased.”
Failure Modes
| Failure | Behavior |
|---|---|
| Node dies | Listener lease expires → fenced → failover to replacement |
| Tasklet crashes | CPU lease intact; scheduler can resubmit on same node |
| Listener lease revoked | Instance transitions to Fenced; new sessions rejected |
| All replicas down | Resolver returns error; client gets explicit “no healthy replica” |
| Network partition | Lease renewal fails → expiry → fenced; partition heals → re-place |
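The resolver-facing rows of the table can be sketched as a round-robin pick over Active replicas that fails explicitly when none remain. This is an illustrative model, not the ServiceResolver API: `Replica`, `State`, `RoundRobin`, and `ResolveError::NoHealthyReplica` are hypothetical names chosen to mirror the table.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum State {
    Active,
    Fenced,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Replica {
    pub instance_id: u64,
    pub node_id: u32,
    pub state: State,
}

#[derive(Debug, PartialEq)]
pub enum ResolveError {
    NoHealthyReplica,
}

/// Hypothetical round-robin picker that holds only a rotating counter.
#[derive(Default)]
pub struct RoundRobin {
    next: usize,
}

impl RoundRobin {
    /// Rotate across Active replicas; Fenced ones are never returned.
    pub fn resolve<'a>(&mut self, replicas: &'a [Replica]) -> Result<&'a Replica, ResolveError> {
        let healthy: Vec<&Replica> = replicas
            .iter()
            .filter(|r| r.state == State::Active)
            .collect();
        if healthy.is_empty() {
            // Explicit error instead of a connection attempt to a dead node.
            return Err(ResolveError::NoHealthyReplica);
        }
        let pick = healthy[self.next % healthy.len()];
        self.next = self.next.wrapping_add(1);
        Ok(pick)
    }
}
```

Filtering on state before rotating means a fenced replica drops out of rotation on the very next call, without the client having to time out against it.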
Testing This Recipe
In sim mode:
    use grafos_testkit::SimFabric;

    let mut fabric = SimFabric::new(4); // 4 simulated nodes
    // Place service, submit tasklets, verify routing...
    // Partition node_a, verify failover...
    // Heal partition, verify re-placement...

See Also
- docs/grafos/native-service-model.md: service primitive definition
- docs/grafos/native-service-topology-model.md: topology/failover model
- docs/grafos/native-service-routing-model.md: discovery/routing/cache
- docs/grafos/service-abi-v0.md: service hostcall ABI reference
- docs/runbooks/service-routing-runbook.md: operational guidance
- Recipe 7 (zero-copy microservices): memory-transport RPC, not service placement
- Recipe 36 (stateful KV with fabric storage): adds durable state to this pattern