
Recipe 51: Multi-Cloud Deploy

What You Build

The same WASM tasklet runs on AWS, GCP, and Azure cells unchanged. The fabric is the abstraction; provider choice is a deploy-time flag, not a code change. This recipe walks through deploying one tasklet to all three clouds and reading back the cell context each one ran on, so you can compare cost / latency / region for the same workload.

Source

cookbook/recipe-51-multi-cloud-deploy/ in the source tree. 6 unit tests, all green.

The tasklet takes a JSON input describing where it expects to run, plus an optional echo blob, and returns the same context plus the echo length so the caller can verify the response came from the cell they targeted.

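pub fn compute(input_bytes: &[u8]) -> Result<IdentifyOutput, &'static str> {
    // Decode the caller's JSON; any parse failure maps to "invalid_input".
    let input: IdentifyInput = serde_json::from_slice(input_bytes).map_err(|_| "invalid_input")?;
    // A provider is required so the echoed context always names a cloud.
    if input.provider.is_empty() {
        return Err("missing_provider");
    }
    // Echo the context back, plus the echo length and this module's version.
    Ok(IdentifyOutput {
        ran_on: WhereItRan { provider: input.provider, cell_id: input.cell_id, region: input.region },
        echo_len: input.echo.len(),
        module_version: MODULE_VERSION,
    })
}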
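The sketch below shows the tasklet's contract without the serde plumbing. It is a minimal std-only version that starts from an already-parsed input, so it runs on its own. The struct field types are assumptions for illustration: every field is a `String`, and a missing echo is modeled as an empty string. The `MODULE_VERSION` value and the `compute_parsed` name are also hypothetical. The real definitions live in the recipe source.

```rust
/// Hypothetical mirror of IdentifyInput after JSON parsing (field types assumed).
#[derive(Debug, Clone, Default)]
pub struct IdentifyInput {
    pub provider: String,
    pub cell_id: String,
    pub region: String,
    pub echo: String, // optional blob; absent is modeled as ""
}

#[derive(Debug, Clone, PartialEq)]
pub struct WhereItRan {
    pub provider: String,
    pub cell_id: String,
    pub region: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct IdentifyOutput {
    pub ran_on: WhereItRan,
    pub echo_len: usize,
    pub module_version: &'static str,
}

/// Hypothetical version string; the recipe defines its own MODULE_VERSION.
pub const MODULE_VERSION: &str = "0.1.0";

/// Same validation and echo logic as `compute`, minus the JSON decode step.
pub fn compute_parsed(input: IdentifyInput) -> Result<IdentifyOutput, &'static str> {
    if input.provider.is_empty() {
        return Err("missing_provider");
    }
    Ok(IdentifyOutput {
        // Moving these three fields out still leaves `input.echo` usable below.
        ran_on: WhereItRan {
            provider: input.provider,
            cell_id: input.cell_id,
            region: input.region,
        },
        echo_len: input.echo.len(),
        module_version: MODULE_VERSION,
    })
}
```

In this sketch, `echo_len` counts bytes (`String::len`), not characters, which matters if the echo carries non-ASCII text.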

Deploy to all three clouds

The tasklet WASM is byte-identical across providers. The wire path differs between standalone (AWS) and registered (GCP / Azure) cells; the program code and module hash do not.

grafos tasklet build
grafos validate

AWS: standalone cell (its own scheduler URL and its own trust directory). The scheduler URL, tenant name, and trust paths come from .grafos/cloud/aws-cells.json, which is written when you run grafos cloud provision aws:

CELL=$(jq -r '.[0]' .grafos/cloud/aws-cells.json)
SCHED=$(echo "$CELL" | jq -r .scheduler_url)
TENANT=$(echo "$CELL" | jq -r .tenant_name)
CA=$(echo "$CELL" | jq -r .ca_cert)
CERT=$(echo "$CELL" | jq -r .tenant_cert)
KEY=$(echo "$CELL" | jq -r .tenant_key)
grafos deploy run --tasklet identify --mem 32768 \
--scheduler "$SCHED" --direct \
--cert "$CERT" --key "$KEY" --ca "$CA" \
--tenant "$TENANT" --json
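The jq extraction above fails open in one respect. If a key is missing, `jq -r` prints the string `null` and exits successfully, so `null` gets passed to grafos deploy run. The sketch below does the same step in Rust but fails closed instead. It assumes the cell record has already been read into a struct whose field names mirror the keys used above. The `AwsCell` type and the `direct_deploy_args` helper are hypothetical; the flags are the ones shown in the command.

```rust
/// Hypothetical mirror of one entry in .grafos/cloud/aws-cells.json.
#[derive(Debug, Clone, Default)]
pub struct AwsCell {
    pub scheduler_url: String,
    pub tenant_name: String,
    pub ca_cert: String,
    pub tenant_cert: String,
    pub tenant_key: String,
}

/// Builds the argv for a direct (standalone-cell) deploy. Refuses to proceed if any
/// field is empty or is the literal "null" that `jq -r` prints for a missing key.
pub fn direct_deploy_args(cell: &AwsCell, tasklet: &str, mem: u32) -> Result<Vec<String>, String> {
    let required = [
        ("scheduler_url", &cell.scheduler_url),
        ("tenant_name", &cell.tenant_name),
        ("ca_cert", &cell.ca_cert),
        ("tenant_cert", &cell.tenant_cert),
        ("tenant_key", &cell.tenant_key),
    ];
    for (name, value) in required {
        if value.is_empty() || value.as_str() == "null" {
            return Err(format!("aws-cells.json entry is missing {name}"));
        }
    }
    let mem = mem.to_string();
    // Same flag order as the shell command above.
    let args = [
        "deploy", "run",
        "--tasklet", tasklet,
        "--mem", mem.as_str(),
        "--scheduler", cell.scheduler_url.as_str(), "--direct",
        "--cert", cell.tenant_cert.as_str(),
        "--key", cell.tenant_key.as_str(),
        "--ca", cell.ca_cert.as_str(),
        "--tenant", cell.tenant_name.as_str(),
        "--json",
    ];
    Ok(args.iter().map(|s| s.to_string()).collect())
}
```

You can pass the resulting vector to `std::process::Command::new("grafos").args(...)` to run the deploy.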

GCP / Azure: registered cells (the hosted scheduler routes each deploy). Once grafos provider init gcp (or azure) has joined cells to the hosted fleet, the CLI uses your saved Tenura credentials, so you only need to specify --provider:

grafos deploy run --provider gcp --tasklet identify --mem 32768 --json
grafos deploy run --provider azure --tasklet identify --mem 32768 --json

Each command returns a run_id and a cell_id for the cell that admitted the deploy. Pull the artifacts to verify which cell actually ran the work:

grafos artifacts <run_id_aws>
grafos artifacts <run_id_gcp>
grafos artifacts <run_id_azure>

Each run's response.json includes the cell-context echo, so you can check that the AWS deploy ran on an AWS cell, the GCP deploy on a GCP cell, and the Azure deploy on an Azure cell.
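That check can be made mechanical by comparing each response against the deploy that produced it. The sketch below assumes three things: the `ran_on` block of response.json has already been decoded into the recipe's `WhereItRan` shape (re-declared here with `String` fields assumed), you know which provider you requested, and you kept the `cell_id` that grafos deploy run returned. The `Mismatch` type and the `check_ran_on` helper are hypothetical.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct WhereItRan {
    pub provider: String,
    pub cell_id: String,
    pub region: String,
}

/// Why a response does not match the deploy that produced it.
#[derive(Debug, PartialEq)]
pub enum Mismatch {
    Provider { expected: String, got: String },
    Cell { expected: String, got: String },
}

/// `requested_provider`: the provider you deployed to.
/// `admitted_cell_id`: the cell_id reported by `grafos deploy run --json`.
pub fn check_ran_on(requested_provider: &str, admitted_cell_id: &str, ran_on: &WhereItRan) -> Result<(), Mismatch> {
    if ran_on.provider != requested_provider {
        return Err(Mismatch::Provider {
            expected: requested_provider.to_string(),
            got: ran_on.provider.clone(),
        });
    }
    if ran_on.cell_id != admitted_cell_id {
        return Err(Mismatch::Cell {
            expected: admitted_cell_id.to_string(),
            got: ran_on.cell_id.clone(),
        });
    }
    Ok(())
}
```

`compute` returns the context it was given, so the echo alone is only as trustworthy as whatever supplied that context. Tying it to the `cell_id` from the deploy output closes that gap. Region is not compared because registered-cell deploys do not choose it up front.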

Two cell patterns: standalone vs registered

Worth knowing before you ship to production:

  • AWS uses the standalone-cell pattern. Each AWS cell is a self-contained mini-fabric with its own fabricbiosd + grafos-scheduler and its own scheduler URL. You bring up an AWS cell with grafos cloud provision aws; deploys go directly to that cell's scheduler URL via mTLS. The cell does NOT register with a public scheduler.
  • GCP uses the registered-cell pattern. A GCP cell runs grafos cloud cell-agent which registers outbound with scheduler.grafos.tenura.systems. The hosted scheduler is the orchestrator; deploys go through it and the orchestrator routes to whichever GCP cell is healthy.
  • Azure currently uses the registered-cell pattern as well (same shape as GCP).

For most multi-cloud workloads this means: --provider gcp and --provider azure flow through the hosted scheduler and use your saved Tenura credentials. AWS standalone cells are addressed directly by --scheduler <url> --direct plus the trust paths recorded in aws-cells.json. The wire path differs; the program does not.
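The split between the two patterns fits in a small routing table. The sketch below assumes the three provider names used in this recipe and the two patterns described above. The `CellPattern` enum, `pattern_for`, and `registered_deploy_args` are hypothetical. Registered providers get the --provider form of grafos deploy run shown earlier.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CellPattern {
    /// Self-contained cell: --scheduler <url> --direct plus mTLS trust paths.
    Standalone,
    /// Cell registered with the hosted scheduler: addressed by --provider.
    Registered,
}

/// Maps a provider name to its cell pattern, per the list above.
pub fn pattern_for(provider: &str) -> Result<CellPattern, String> {
    match provider {
        "aws" => Ok(CellPattern::Standalone),
        "gcp" | "azure" => Ok(CellPattern::Registered),
        other => Err(format!("unknown provider: {other}")),
    }
}

/// Argv for a registered-cell deploy. Standalone providers are rejected because
/// they need the scheduler URL and trust paths from aws-cells.json instead.
pub fn registered_deploy_args(provider: &str, tasklet: &str, mem: u32) -> Result<Vec<String>, String> {
    match pattern_for(provider)? {
        CellPattern::Registered => Ok(vec![
            "deploy".to_string(),
            "run".to_string(),
            "--provider".to_string(),
            provider.to_string(),
            "--tasklet".to_string(),
            tasklet.to_string(),
            "--mem".to_string(),
            mem.to_string(),
            "--json".to_string(),
        ]),
        CellPattern::Standalone => Err(format!(
            "{provider} uses standalone cells; deploy with --scheduler <url> --direct"
        )),
    }
}
```

Keeping the mapping in one place is what lets the program stay unchanged while the wire path differs per provider.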

What’s interesting

  1. Same WASM, byte-identical. The module_sha256 printed by grafos tasklet build is the same regardless of where you deploy it. The provider abstraction lives in the scheduler layer, not in the bytecode.
  2. Capability tokens are scope-bound to the issuing cell. A token from your AWS cell can’t talk to GCP and vice versa. If your program needs to span clouds, each cloud’s tasklet holds its own lease + token.
  3. Lease costs differ per cloud. grafos runs show <run_id> --json returns a cost field. AWS t4g.medium, GCP e2-medium, and Azure B-series have different per-hour rates; the same workload comes back with different cents_per_run numbers. The hosted admin stats page shows provider-level infrastructure and spend.
  4. Tenant budget is global. Each tenant has a single budget cap, and spend on every provider counts against it. Run grafos runs ls --json | jq '[.runs[].cost.cents_per_run] | add' to see the total across providers, and use grafos admin set-tenant-budget --tenant <name> --cents <N> to top it up.
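The cost and budget points combine into one calculation. First, group per-run cost by provider to compare clouds. Then sum everything against the single tenant cap. The sketch below assumes you have already pulled (provider, cents_per_run) pairs out of grafos runs ls --json and that costs are whole cents. The `RunCost` type and both helpers are hypothetical, and any numbers you feed it are your own, not published rates.

```rust
use std::collections::BTreeMap;

/// One row distilled from `grafos runs ls --json` (shape assumed; whole cents).
#[derive(Debug, Clone)]
pub struct RunCost {
    pub provider: String,
    pub cents_per_run: u64,
}

/// Total spend per provider, for comparing the same workload across clouds.
pub fn spend_by_provider(runs: &[RunCost]) -> BTreeMap<String, u64> {
    let mut totals = BTreeMap::new();
    for run in runs {
        *totals.entry(run.provider.clone()).or_insert(0) += run.cents_per_run;
    }
    totals
}

/// Remaining global budget: every provider draws on the same tenant cap.
/// Returns None once total spend has reached or exceeded the cap.
pub fn remaining_budget(runs: &[RunCost], cap_cents: u64) -> Option<u64> {
    let spent: u64 = runs.iter().map(|r| r.cents_per_run).sum();
    cap_cents.checked_sub(spent).filter(|left| *left > 0)
}
```

`BTreeMap` keeps providers in sorted order, so side-by-side output stays stable from run to run.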

Failure Behavior

  • Invalid tasklet input returns invalid_input.
  • An empty provider field returns missing_provider.
  • A provider with no healthy cell is rejected by scheduler admission before the tasklet runs.
  • AWS direct deploys fail closed if the scheduler URL, CA, tenant cert, tenant key, or tenant name from aws-cells.json is missing or stale.
  • Budget exhaustion is global across providers; admission refuses new work until the tenant budget is updated.
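A caller can turn these cases into a typed error. That way, input bugs are not retried, while admission refusals can wait for a healthy cell or a budget top-up. This is a minimal sketch. The tasklet strings `invalid_input` and `missing_provider` come from `compute`. The `DeployFailure` enum and the `is_caller_error` policy are hypothetical. Admission failures are modeled without assuming the scheduler's error format.

```rust
use std::str::FromStr;

/// Failure classes from the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DeployFailure {
    InvalidInput,    // tasklet: "invalid_input"
    MissingProvider, // tasklet: "missing_provider"
    NoHealthyCell,   // scheduler admission, before the tasklet runs
    BudgetExhausted, // scheduler admission, global across providers
}

/// Parses only the two error strings the tasklet itself returns.
impl FromStr for DeployFailure {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "invalid_input" => Ok(Self::InvalidInput),
            "missing_provider" => Ok(Self::MissingProvider),
            other => Err(format!("unrecognized tasklet error: {other}")),
        }
    }
}

impl DeployFailure {
    /// Hypothetical policy: caller errors need an input fix and should not be
    /// retried as-is; admission errors may clear once conditions change.
    pub fn is_caller_error(self) -> bool {
        matches!(self, Self::InvalidInput | Self::MissingProvider)
    }
}
```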

Run And Verify

cargo test -p cookbook-recipe-51-multi-cloud-deploy
grafos tasklet build
grafos validate

Expected: the tests cover AWS, GCP, and Azure context echoes, missing-provider rejection, invalid-JSON rejection, and a stable module version. A live deploy returns one run_id per provider, and each downloaded response.json names the provider, region, and cell that ran the work.

Production drop-in

In a real workload you’d:

  1. Decide per environment which provider to use. Dev → local-dev. Staging → AWS. Prod → GCP. Make that choice via an environment variable or the [runnable_targets] block in grafos.toml; the program code stays the same.
  2. Set up at least one cell on each provider you intend to deploy to. AWS via grafos cloud provision aws; GCP via grafos cloud provision gcp (registered, joins the hosted fleet); Azure via grafos cloud provision azure.
  3. Use the tenant budget gate. Tenura-hosted accounts come with a 2000-cent ($20.00) starting budget that’s enforced cluster-wide. Top it up before a multi-cloud burst.
  4. Tear down standalone cells when done to stop accruing spend: grafos cloud teardown aws --cell-id <id>.
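Step 1 can stay out of application code with a small lookup. The sketch below assumes a hypothetical GRAFOS_ENV environment variable holding dev, staging, or prod, and it uses the mapping from step 1. The [runnable_targets] block in grafos.toml may be the better home for this choice. The variable name and both functions are illustrative, not part of the grafos CLI.

```rust
use std::env;

/// Mapping from step 1: environment name -> deploy target.
pub fn target_for(environment: &str) -> Option<&'static str> {
    match environment {
        "dev" => Some("local-dev"),
        "staging" => Some("aws"),
        "prod" => Some("gcp"),
        _ => None,
    }
}

/// Reads the hypothetical GRAFOS_ENV variable, falling back to "dev" when it is
/// unset or not valid Unicode, and rejects unknown environment names.
pub fn target_from_env() -> Result<&'static str, String> {
    let environment = env::var("GRAFOS_ENV").unwrap_or_else(|_| "dev".to_string());
    target_for(&environment)
        .ok_or_else(|| format!("no deploy target for environment {environment:?}"))
}
```

Rejecting unknown names, rather than defaulting them, keeps a typo like `prdo` from silently deploying to the dev target.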

Adapt It

Use this pattern when you need to compare providers for the same workload or keep a manual escape path available. For placement-aware applications, pair it with recipes that declare failure-domain placement directly in program policy rather than scattering provider switches through application code.

Where to next