Skip to content

Recipe 39 - Cross-Cloud Order Pipeline

Problem: A checkout pipeline needs accepted work to survive an availability-zone, region, or provider failure without rebuilding the usual stack of cloud-specific queues, dedupe tables, DNS failover, and stale-writer guards.

Solution: Model the pipeline as replicated resources:

  • a replicated queue for order delivery;
  • map-backed consumer cursors and acknowledgments;
  • an idempotency store for webhook acceptance and order effects;
  • a checkpoint pointer backed by replicated object storage;
  • a lease for singleton maintenance work;
  • failure-domain observations that prove whether cross-provider processing was explicitly allowed.

The compiled recipe lives in cookbook/recipe-39-cross-cloud-order-pipeline. It uses public grafos-replicated resource handles. There are no mocks, provider fallbacks, or prose-only APIs.

use cookbook_recipe_39_cross_cloud_order_pipeline::{
gcp_region, CrossCloudOrderPipeline, Order,
};
let mut pipeline = CrossCloudOrderPipeline::new()?;
pipeline.receive_webhook(Order {
id: "order-1".into(),
cents: 4200,
})?;
let processed = pipeline
.process_one_from_domain(gcp_region())?
.expect("order is available to the explicitly allowed GCP worker group");
assert_eq!(processed.order.id, "order-1");
# Ok::<(), grafos_replicated::ReplicatedError>(())

Core grafOS API Path

CrossCloudOrderPipeline::new() is only a small packaging layer. The recipe constructs the replicated resources directly from public grafos-replicated handles:

use fabricbios_core::lease::FenceEpoch;
use grafos_replicated::{
ConsumerGroupName, DeliverySemantics, LogicalResourceName, PolicyHash,
ReplicaHealth, ReplicaId, ReplicaLocator, ReplicaRole, ReplicaSetLocator,
ReplicatedCheckpoint, ReplicatedFabricQueue, ReplicatedIdempotencyStore,
ReplicatedLease, ResourceGeneration, SchemaId,
};
use cookbook_recipe_39_cross_cloud_order_pipeline::{
aws_zone, cross_cloud_replica_policy, cross_cloud_worker_placement,
gcp_region, Order,
};
let generation = ResourceGeneration(1);
let replicas = cross_cloud_replica_policy();
let locator = ReplicaSetLocator::new(
generation,
vec![
ReplicaLocator {
replica_id: ReplicaId::new("orders-aws-a"),
domain: aws_zone(),
role: ReplicaRole::Voter,
health: ReplicaHealth::Healthy,
epoch: FenceEpoch(1),
content_generation: generation.0,
},
ReplicaLocator {
replica_id: ReplicaId::new("orders-gcp-central"),
domain: gcp_region(),
role: ReplicaRole::Voter,
health: ReplicaHealth::Healthy,
epoch: FenceEpoch(1),
content_generation: generation.0,
},
],
);
let mut orders = ReplicatedFabricQueue::<Order>::new(
LogicalResourceName::new("orders"),
SchemaId::new("order.v1"),
FenceEpoch(1),
replicas.clone(),
locator.clone(),
)?;
orders.create_consumer_group(
ConsumerGroupName::new("checkout-workers"),
cross_cloud_worker_placement(),
DeliverySemantics::EffectivelyOnceWithIdempotency,
FenceEpoch(1),
)?;
let effects = ReplicatedIdempotencyStore::new(
LogicalResourceName::new("order-effects"),
SchemaId::new("order-effect.v1"),
FenceEpoch(1),
replicas.clone(),
locator.clone(),
)?;
let checkpoints = ReplicatedCheckpoint::new(
LogicalResourceName::new("order-checkpoints"),
SchemaId::new("order-checkpoint.v1"),
FenceEpoch(1),
replicas.clone(),
locator.clone(),
PolicyHash([9; 32]),
)?;
let leaders = ReplicatedLease::new(
LogicalResourceName::new("order-leaders"),
SchemaId::new("order-leader.v1"),
FenceEpoch(1),
replicas,
locator,
)?;
# let _ = (orders, effects, checkpoints, leaders);
# Ok::<(), grafos_replicated::ReplicatedError>(())

The rest of the recipe calls reserve, queue append/poll/ack, checkpoint save/restore, and lease acquire/renew/release on those handles. The wrapper keeps the example readable; it does not hide a second API or provider fallback.

What The Recipe Proves

The crate exercises the user-facing resilience contract with real replicated resource primitives:

  • duplicate webhooks replay the accepted offset instead of appending a second queue record;
  • workers can process from AWS or GCP only because the consumer placement policy allowed both providers;
  • an AWS-only worker placement fails closed when asked to poll from GCP;
  • checkpoint restore reads the latest CAS pointer and object-backed bytes;
  • leader election uses fence epochs and blocks a second holder until expiry;
  • an AWS provider-loss drill updates resource observations while the explicitly allowed GCP worker path remains usable.

Runtime Shape

  1. The webhook handler reserves an idempotency key.
  2. The order is appended to the replicated queue.
  3. The accepted queue offset is recorded in the idempotency store before the handler returns success.
  4. Consumers poll through a placement policy. No provider is tried unless the policy authorized that failure domain.
  5. Processing completion records an effect and acknowledges the delivery.
  6. Checkpoints publish immutable object bytes first, then CAS-update the latest checkpoint pointer.
  7. Singleton maintenance uses a replicated lease, so stale holders fail by fence epoch.
  8. Failure drills commit signed health observations and assert resource events.

Replicated queues use quorum write acknowledgement by default. This recipe relies on that durable default; local-fast-path acknowledgement is an explicit policy choice for workloads that can tolerate weaker replay semantics.

Tests

Run it with:

Terminal window
cargo test -p cookbook-recipe-39-cross-cloud-order-pipeline

The tests cover:

  • cross-cloud order processing from an explicitly allowed GCP domain;
  • duplicate webhook replay;
  • no hidden cross-provider fallback;
  • checkpoint restore;
  • lease-based leader election and fencing;
  • provider-failure observation with continued allowed consumer placement;
  • real SHA-256 content hashes.

See also:

  • crates/grafos-replicated