Recipe 41 - Durable Async Report API

What You Build

Build a report API that accepts long-running work without losing requests during an availability-zone, region, or provider failure.

The service has three ordinary pieces:

  • POST /reports accepts a request and returns a receipt only after the command is quorum-committed.
  • GET /reports/{request_id} reads status from replicated state, with a freshness requirement tied to the receipt offset.
  • Worker processes in explicitly allowed domains complete accepted requests and record the external effect once.

This replaces the usual pile of cloud queue, status database, dedupe table, worker checkpoint, and failover script with one replicated-resource design: ordered commands, map-backed status, idempotent acceptance, idempotent effects, and explicit placement.

The compiled recipe lives in cookbook/recipe-41-durable-async-api and uses public grafos-replicated handles. There are no mocks, hidden sync loops, or provider fallbacks.

Program

use cookbook_recipe_41_durable_async_api::{
    aws_zone, gcp_region, AsyncApiError, ReportApiService, ReportSubmission,
    RequestState,
};

fn main() -> Result<(), AsyncApiError> {
    let mut service = ReportApiService::cross_provider()?;

    let receipt = service.submit_report(
        aws_zone(),
        ReportSubmission {
            request_id: "req-2026-04-acct-1".into(),
            account_id: "acct-1".into(),
            month: "2026-04".into(),
        },
    )?;
    assert_eq!(receipt.status_path, "/reports/req-2026-04-acct-1");

    service.worker_tick(gcp_region(), "worker-gcp")?;

    let status = service
        .read_status(gcp_region(), &receipt)?
        .expect("request status");
    assert!(matches!(status.state, RequestState::Completed { .. }));

    Ok(())
}

The important part is the receipt. It carries the committed log offset. A caller that retries against another allowed domain can ask for status at that offset instead of accepting a stale local projection.
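
The offset check can be illustrated with a small in-memory stand-in. This is only a sketch of the idea, not the grafos-replicated implementation: `Projection`, `ReadError::Stale`, `read_at_least`, and `apply` are hypothetical names, and the status is reduced to a plain string.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one domain's local projection of the status map.
#[derive(Debug, Default)]
pub struct Projection {
    /// Highest committed log offset this projection has applied so far.
    pub applied_offset: u64,
    pub status: HashMap<String, String>,
}

#[derive(Debug, PartialEq)]
pub enum ReadError {
    /// The projection has not caught up to the offset carried by the receipt.
    Stale { applied: u64, required: u64 },
}

impl Projection {
    /// Apply a committed status write that landed at `offset`.
    pub fn apply(&mut self, offset: u64, request_id: &str, state: &str) {
        self.status.insert(request_id.to_string(), state.to_string());
        self.applied_offset = self.applied_offset.max(offset);
    }

    /// Serve the read only if the projection has applied `min_offset`;
    /// otherwise refuse instead of returning a possibly stale answer.
    pub fn read_at_least(
        &self,
        request_id: &str,
        min_offset: u64,
    ) -> Result<Option<&String>, ReadError> {
        if self.applied_offset < min_offset {
            return Err(ReadError::Stale {
                applied: self.applied_offset,
                required: min_offset,
            });
        }
        Ok(self.status.get(request_id))
    }
}
```

A caller that fails over to a lagging domain gets `Stale` rather than a misleading "not found", and can retry or try another allowed domain with the same receipt.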

Core grafOS API Path

The service facade is built from a replicated command log, a replicated status map, and an idempotency store:

use fabricbios_core::lease::FenceEpoch;
use grafos_replicated::{
    LogicalResourceName, ReplicatedFabricLog, ReplicatedIdempotencyStore,
    ReplicatedMap, SchemaId,
};
use cookbook_recipe_41_durable_async_api::{
    cross_provider_profile, ApiCommand, RequestRecord,
};

fn main() -> Result<(), grafos_replicated::ReplicatedError> {
    // One profile supplies both the replica policy and the locator.
    let profile = cross_provider_profile();
    let writer_epoch = FenceEpoch(1);
    let replicas = profile.replica_policy;
    let locator = profile.locator;

    // Ordered command log of accepted requests.
    let commands = ReplicatedFabricLog::<ApiCommand>::new(
        LogicalResourceName::new("report-requests"),
        SchemaId::new("report-command.v1"),
        writer_epoch,
        replicas.clone(),
        locator.clone(),
    )?;

    // Status map keyed by request id.
    let requests = ReplicatedMap::<String, RequestRecord>::new(
        LogicalResourceName::new("report-request-status"),
        SchemaId::new("report-status.v1"),
        writer_epoch,
        replicas.clone(),
        locator.clone(),
    )?;

    // Idempotency store for external effects.
    let effects = ReplicatedIdempotencyStore::new(
        LogicalResourceName::new("report-effects"),
        SchemaId::new("report-effect.v1"),
        writer_epoch,
        replicas,
        locator,
    )?;

    let _ = (commands, requests, effects);
    Ok(())
}

Submission reserves an idempotency key, appends an ApiCommand, writes Accepted into the status map, and completes the idempotency record with the accepted offset. Status reads use ReadConsistency::AtLeastOffset from the receipt, which is the grafOS detail the HTTP-shaped facade is meant to teach.

Service Flow

  1. Ingress checks that its failure domain is allowed by the service placement policy.
  2. The request id and report payload produce a canonical fingerprint.
  3. The idempotency store reserves the request id.
  4. The command appends to ReplicatedFabricLog<ApiCommand>.
  5. ReplicatedMap<String, RequestRecord> records Accepted at the committed offset.
  6. The idempotency record is completed with the accepted offset.
  7. The API returns a receipt with request_id, accepted_offset, duplicate, and status_path.
  8. Workers scan accepted commands, reserve an effect key, perform the work, and CAS the request status to Completed.
  9. Status reads use ReadConsistency::AtLeastOffset(receipt.accepted_offset).
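
Steps 1 through 7 can be sketched with plain in-memory collections. This is an illustration under stated assumptions, not the recipe crate's code: `InMemoryApi`, `SubmitError`, and the helper `receipt` are hypothetical, the fingerprint is a delimiter-joined string where a real service would hash a canonical encoding, offsets are assumed to start at 0, and a replayed receipt is assumed to set `duplicate` to true.

```rust
use std::collections::HashMap;

/// Receipt fields as named in step 7.
#[derive(Debug, Clone, PartialEq)]
pub struct Receipt {
    pub request_id: String,
    pub accepted_offset: u64,
    pub duplicate: bool,
    pub status_path: String,
}

#[derive(Debug, PartialEq)]
pub enum SubmitError {
    DomainUnavailable,
    IdempotencyConflict,
}

/// Hypothetical in-memory stand-in for the log, status map, and idempotency store.
#[derive(Default)]
pub struct InMemoryApi {
    allowed_domains: Vec<String>,
    log: Vec<String>,
    status: HashMap<String, (u64, &'static str)>,
    idempotency: HashMap<String, (String, u64)>,
}

fn receipt(request_id: &str, accepted_offset: u64, duplicate: bool) -> Receipt {
    Receipt {
        request_id: request_id.to_string(),
        accepted_offset,
        duplicate,
        status_path: format!("/reports/{request_id}"),
    }
}

impl InMemoryApi {
    pub fn new(allowed: &[&str]) -> Self {
        Self {
            allowed_domains: allowed.iter().map(|d| d.to_string()).collect(),
            ..Default::default()
        }
    }

    pub fn submit(
        &mut self,
        domain: &str,
        request_id: &str,
        account_id: &str,
        month: &str,
    ) -> Result<Receipt, SubmitError> {
        // Step 1: the ingress domain must be allowed by the placement policy.
        if !self.allowed_domains.iter().any(|d| d == domain) {
            return Err(SubmitError::DomainUnavailable);
        }
        // Step 2: fingerprint of request id plus payload (simplified).
        let fingerprint = format!("{request_id}|{account_id}|{month}");
        // Step 3: a known request id either replays the receipt or conflicts.
        if let Some((seen, offset)) = self.idempotency.get(request_id) {
            if *seen != fingerprint {
                return Err(SubmitError::IdempotencyConflict);
            }
            return Ok(receipt(request_id, *offset, true));
        }
        // Step 4: append the command; its index plays the role of the committed offset.
        self.log.push(fingerprint.clone());
        let offset = (self.log.len() - 1) as u64;
        // Step 5: record Accepted at that offset.
        self.status.insert(request_id.to_string(), (offset, "Accepted"));
        // Step 6: complete the idempotency record with the accepted offset.
        self.idempotency.insert(request_id.to_string(), (fingerprint, offset));
        // Step 7: return the receipt.
        Ok(receipt(request_id, offset, false))
    }

    pub fn log_len(&self) -> usize {
        self.log.len()
    }

    pub fn status_of(&self, request_id: &str) -> Option<(u64, &'static str)> {
        self.status.get(request_id).copied()
    }
}
```

In the replicated design each of these steps is a quorum-committed write; the sketch only shows the ordering and the two retry outcomes.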

Placement Variants

The same service code can run under different resilience envelopes. The recipe crate exposes these profiles so the policy choice is visible in code:

use cookbook_recipe_41_durable_async_api::{
    cross_provider_profile, cross_region_profile, multi_az_profile,
    single_az_profile, AsyncApiError, DurableAsyncApi,
};

fn main() -> Result<(), AsyncApiError> {
    let single_az = DurableAsyncApi::from_profile(single_az_profile())?;
    let multi_az = DurableAsyncApi::from_profile(multi_az_profile())?;
    let cross_region = DurableAsyncApi::from_profile(cross_region_profile())?;
    let cross_provider = DurableAsyncApi::from_profile(cross_provider_profile())?;
    let _ = (single_az, multi_az, cross_region, cross_provider);
    Ok(())
}

  • single_az_profile() is useful for local development or a deliberately narrow deployment. A worker in another AZ is refused.
  • multi_az_profile() allows movement inside one AWS region across distinct availability zones.
  • cross_region_profile() allows movement between AWS regions.
  • cross_provider_profile() allows AWS ingress and GCP workers because the program explicitly authorized both providers.

Placement is authorization, not a suggestion. A request for a domain outside the profile fails closed with DomainUnavailable; the recipe does not try another cloud behind the caller’s back.
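
A minimal sketch of fail-closed placement, assuming a profile is simply an explicit list of allowed failure domains. `FailureDomain`, `PlacementProfile`, and `authorize` are hypothetical names for illustration, and the provider, region, and zone strings in any usage are placeholders, not values taken from the recipe crate.

```rust
/// Hypothetical description of where a component is running.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FailureDomain {
    pub provider: &'static str,
    pub region: &'static str,
    pub zone: &'static str,
}

/// Error carrying the domain that was refused.
#[derive(Debug, PartialEq, Eq)]
pub struct DomainUnavailable(pub FailureDomain);

/// A profile is the full set of domains the program has authorized.
pub struct PlacementProfile {
    allowed: Vec<FailureDomain>,
}

impl PlacementProfile {
    pub fn new(allowed: Vec<FailureDomain>) -> Self {
        Self { allowed }
    }

    /// Either the domain is on the list or the call fails.
    /// There is deliberately no fallback to a "nearby" domain.
    pub fn authorize(&self, domain: &FailureDomain) -> Result<(), DomainUnavailable> {
        if self.allowed.contains(domain) {
            Ok(())
        } else {
            Err(DomainUnavailable(domain.clone()))
        }
    }
}
```

Widening the resilience envelope then means constructing a profile with more domains, while the ingress and worker code that calls `authorize` stays the same.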

Failure Behavior

  • Client retry after timeout: submitting the same request id and payload returns the original receipt and does not append another command.
  • Changed retry payload: submitting the same request id with a different payload fails closed with IdempotencyConflict.
  • Ingress domain unavailable: POST /reports fails with DomainUnavailable.
  • Worker in an unauthorized domain: the worker tick fails with DomainUnavailable; the scheduler is not asked to find a different provider.
  • Worker repeats completed work: no second result is produced because the status map is already Completed and the effect key is idempotent.
  • Failover status read is stale: AtLeastOffset fails until the projection has reached the receipt’s accepted offset.
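
The "worker repeats completed work" case can be sketched as follows. This is an in-memory illustration only: `WorkerState`, `State`, `effect_key`, and the `"report-store"` system name are hypothetical, and the external effect is reduced to a counter. A fuller version would also resume a request whose effect key was reserved but whose status never reached Completed.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, PartialEq)]
pub enum State {
    Accepted,
    Completed { worker: String },
}

/// Hypothetical stand-in for the command log, status map, and effect store.
#[derive(Default)]
pub struct WorkerState {
    pub accepted: Vec<String>,
    pub status: HashMap<String, State>,
    pub effects: HashSet<String>,
    pub reports_written: u32,
}

/// Effect key built from the external system and the request id.
pub fn effect_key(system: &str, request_id: &str) -> String {
    format!("{system}:{request_id}")
}

impl WorkerState {
    pub fn accept(&mut self, request_id: &str) {
        self.accepted.push(request_id.to_string());
        self.status.insert(request_id.to_string(), State::Accepted);
    }

    /// Returns how many requests this tick moved to Completed.
    pub fn worker_tick(&mut self, worker: &str) -> usize {
        let mut completed = 0;
        for request_id in self.accepted.clone() {
            // Skip anything that is no longer Accepted.
            if self.status.get(&request_id) != Some(&State::Accepted) {
                continue;
            }
            // Reserve the effect key; `insert` returns false if it already exists.
            if !self.effects.insert(effect_key("report-store", &request_id)) {
                continue;
            }
            // The external effect happens once per reserved key.
            self.reports_written += 1;
            // Compare-and-set: only Accepted may become Completed.
            if let Some(state) = self.status.get_mut(&request_id) {
                if *state == State::Accepted {
                    *state = State::Completed { worker: worker.to_string() };
                    completed += 1;
                }
            }
        }
        completed
    }
}
```

A second tick, from the same worker or a different one, finds the status already Completed and the effect key already reserved, so no second report is written.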

Run And Verify

Run the compiled recipe:

cargo test -p cookbook-recipe-41-durable-async-api

The tests cover:

  • POST /reports-style submission, worker completion, and status reads through the ReportApiService facade;
  • completion of accepted work from another explicitly allowed cloud;
  • placement profiles that change the allowed failure domains without changing service logic;
  • receipt replay for a duplicate request;
  • fail-closed behavior when a duplicate request carries a changed payload;
  • fail-closed behavior for disallowed ingress;
  • no second processing of completed work;
  • freshness reads that are not served until the projection reaches the requested committed offset.

Adapt It

Change these knobs first:

  • Placement profile: choose single-AZ, multi-AZ, cross-region, or cross-provider based on where the service is allowed to run.
  • Quorum: adjust ReplicaPolicy only when the durability/latency tradeoff is intentional.
  • Request id: use a stable client idempotency key, not a random server id, if callers retry after network timeouts.
  • Effect key: include the external system and request id so a worker retry cannot write the same report twice.
  • Freshness: use AtLeastOffset(receipt.accepted_offset) for user-visible reads after failover.

See also:

  • Recipe 39: Cross-Cloud Order Pipeline
  • Recipe 40: Replicated Session Continuation
  • crates/grafos-replicated