Recipe 51: Multi-Cloud Deploy
What You Build
The same WASM tasklet runs on AWS, GCP, and Azure cells unchanged. The fabric is the abstraction; provider choice is a deploy-time flag, not a code change. This recipe walks through deploying one tasklet to all three clouds and reading back the cell context each one ran on, so you can compare cost / latency / region for the same workload.
Source
cookbook/recipe-51-multi-cloud-deploy/ in the source tree. 6 unit tests, all green.
The tasklet takes a JSON input describing where it expects to run, plus an optional echo blob, and returns the same context plus the echo length so the caller can verify the response came from the cell they targeted.
    pub fn compute(input_bytes: &[u8]) -> Result<IdentifyOutput, &'static str> {
        let input: IdentifyInput =
            serde_json::from_slice(input_bytes).map_err(|_| "invalid_input")?;
        if input.provider.is_empty() {
            return Err("missing_provider");
        }
        Ok(IdentifyOutput {
            ran_on: WhereItRan {
                provider: input.provider,
                cell_id: input.cell_id,
                region: input.region,
            },
            echo_len: input.echo.len(),
            module_version: MODULE_VERSION,
        })
    }

Deploy to all three clouds
The tasklet WASM is byte-identical across providers. The wire path differs between standalone (AWS) and registered (GCP / Azure) cells; the program code and module hash do not.

    grafos tasklet build
    grafos validate

AWS: standalone cell (own scheduler URL and own trust dir). The scheduler URL and trust paths come from .grafos/cloud/aws-cells.json, written when you ran grafos cloud provision aws:

    CELL=$(jq -r '.[0]' .grafos/cloud/aws-cells.json)
    SCHED=$(echo "$CELL" | jq -r .scheduler_url)
    TENANT=$(echo "$CELL" | jq -r .tenant_name)
    CA=$(echo "$CELL" | jq -r .ca_cert)
    CERT=$(echo "$CELL" | jq -r .tenant_cert)
    KEY=$(echo "$CELL" | jq -r .tenant_key)

    grafos deploy run --tasklet identify --mem 32768 \
      --scheduler "$SCHED" --direct \
      --cert "$CERT" --key "$KEY" --ca "$CA" \
      --tenant "$TENANT" --json

GCP / Azure: registered cells (hosted scheduler routes).
Once grafos provider init gcp (or azure) has joined cells to the hosted fleet, the CLI uses your saved Tenura credentials and you only specify --provider:

    grafos deploy run --provider gcp --tasklet identify --mem 32768 --json
    grafos deploy run --provider azure --tasklet identify --mem 32768 --json

Each command returns a run_id and a cell_id for the cell that admitted the deploy. Pull the artifacts to verify which cell actually ran the work:

    grafos artifacts <run_id_aws>
    grafos artifacts <run_id_gcp>
    grafos artifacts <run_id_azure>

response.json includes the cell-context echo so you can prove the AWS deploy ran on an AWS cell, the GCP one on a GCP cell, and so on.
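The check described above (did the response come from the cell you targeted?) can be sketched on the caller side. This is a minimal sketch, assuming response.json has already been parsed into a struct shaped like the tasklet's WhereItRan. The String field types and the check_ran_on helper are assumptions for illustration, not part of the recipe's source.

```rust
/// Cell context as echoed back by the tasklet. Field names follow the
/// recipe's `WhereItRan`; the `String` types are an assumption.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WhereItRan {
    pub provider: String,
    pub cell_id: String,
    pub region: String,
}

/// Hypothetical helper: compare the context you targeted with the one the
/// response reports, and name the first field that disagrees.
pub fn check_ran_on(expected: &WhereItRan, actual: &WhereItRan) -> Result<(), String> {
    let pairs = [
        ("provider", &expected.provider, &actual.provider),
        ("cell_id", &expected.cell_id, &actual.cell_id),
        ("region", &expected.region, &actual.region),
    ];
    for (field, want, got) in pairs {
        if want != got {
            return Err(format!("{} mismatch: expected {}, got {}", field, want, got));
        }
    }
    Ok(())
}
```

Running the same check once per provider gives a simple pass/fail signal before you start comparing cost or latency numbers.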
Two cell patterns: standalone vs registered
Worth knowing before you ship to production:
- AWS uses the standalone-cell pattern. Each AWS cell is a self-contained mini-fabric with its own fabricbiosd + grafos-scheduler and its own scheduler URL. You bring up an AWS cell with grafos cloud provision aws; deploys go directly to that cell’s scheduler URL via mTLS. The cell does NOT register with a public scheduler.
- GCP uses the registered-cell pattern. A GCP cell runs grafos cloud cell-agent, which registers outbound with scheduler.grafos.tenura.systems. The hosted scheduler is the orchestrator; deploys go through it and the orchestrator routes to whichever GCP cell is healthy.
- Azure is currently registered-pattern as well (same shape as GCP).
For most multi-cloud workloads this means: --provider gcp and --provider azure flow through the hosted scheduler and use your saved Tenura credentials. AWS standalone cells are addressed directly by --scheduler <url> --direct plus the trust paths recorded in aws-cells.json. The wire path differs; the program does not.
What’s interesting
- Same WASM, byte-identical. The module_sha256 printed by grafos tasklet build is the same regardless of where you’ll deploy it. The provider abstraction is at the scheduler layer, not the bytecode.
- Capability tokens are scope-bound to the issuing cell. A token from your AWS cell can’t talk to GCP and vice versa. If your program needs to span clouds, each cloud’s tasklet holds its own lease + token.
- Lease costs differ per cloud. grafos runs show <run_id> --json returns a cost field. AWS t4g.medium, GCP e2-medium, and Azure B-series have different per-hour rates; the same workload comes back with different cents_per_run numbers. The hosted admin stats page shows provider-level infrastructure and spend.
- Tenant budget is global. A single tenant’s budget cap applies to the sum of spend across providers. Run grafos runs ls --json | jq '[.runs[].cost.cents_per_run] | add' to see the total across providers; use grafos admin set-tenant-budget --tenant <name> --cents <N> to top up.
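The cost comparison and the global budget rule above reduce to a little arithmetic. The sketch below groups cents_per_run by provider and checks the combined total against one tenant-wide cap. The RunCost struct, the integer cent type, and the presence of a provider label on each run are assumptions for illustration; the actual shape of the runs JSON may differ.

```rust
use std::collections::BTreeMap;

/// One run's provider and cost. `cents_per_run` follows the recipe's field
/// name; the struct itself and the `u64` type are assumptions.
pub struct RunCost {
    pub provider: String,
    pub cents_per_run: u64,
}

/// Sum spend per provider so the same workload can be compared across clouds.
pub fn spend_by_provider(runs: &[RunCost]) -> BTreeMap<String, u64> {
    let mut totals: BTreeMap<String, u64> = BTreeMap::new();
    for run in runs {
        *totals.entry(run.provider.clone()).or_insert(0) += run.cents_per_run;
    }
    totals
}

/// Remaining budget under a single tenant-wide cap, or `None` once the
/// combined spend across all providers exceeds it.
pub fn remaining_budget(runs: &[RunCost], budget_cents: u64) -> Option<u64> {
    let total: u64 = runs.iter().map(|r| r.cents_per_run).sum();
    budget_cents.checked_sub(total)
}
```

Because the cap is checked against the combined total, a burst on one provider reduces the headroom for the others.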
Failure Behavior
- Invalid tasklet input returns invalid_input.
- An empty provider field returns missing_provider.
- A provider with no healthy cell is rejected by scheduler admission before the tasklet runs.
- AWS direct deploys fail closed if the scheduler URL, CA, tenant cert, tenant key, or tenant name from aws-cells.json is missing or stale.
- Budget exhaustion is global across providers; admission refuses new work until the tenant budget is updated.
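For the AWS direct path, a caller can catch a missing field before the CLI does. The following is a minimal pre-flight sketch, assuming one entry of aws-cells.json has been read into a struct whose field names match the jq paths used earlier. The AwsCell struct and first_missing_field helper are hypothetical, and the check cannot detect stale values, only absent or blank ones.

```rust
/// Fields read from one entry of aws-cells.json. Names follow the jq paths
/// in the recipe; `Option<String>` models a missing key and is an assumption.
#[derive(Default)]
pub struct AwsCell {
    pub scheduler_url: Option<String>,
    pub ca_cert: Option<String>,
    pub tenant_cert: Option<String>,
    pub tenant_key: Option<String>,
    pub tenant_name: Option<String>,
}

/// Hypothetical pre-flight check: return the name of the first field that is
/// absent or blank, or `None` when all five are present.
pub fn first_missing_field(cell: &AwsCell) -> Option<&'static str> {
    let fields = [
        ("scheduler_url", &cell.scheduler_url),
        ("ca_cert", &cell.ca_cert),
        ("tenant_cert", &cell.tenant_cert),
        ("tenant_key", &cell.tenant_key),
        ("tenant_name", &cell.tenant_name),
    ];
    for (name, value) in fields {
        let present = value.as_deref().map_or(false, |s| !s.trim().is_empty());
        if !present {
            return Some(name);
        }
    }
    None
}
```

Refusing to deploy when this returns a field name keeps the caller's behavior aligned with the fail-closed rule.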
Run And Verify

    cargo test -p cookbook-recipe-51-multi-cloud-deploy
    grafos tasklet build
    grafos validate

Expected: the tests cover AWS, GCP, and Azure context echoes, missing provider rejection, invalid JSON rejection, and stable module version. A live deploy returns one run_id per provider, and each downloaded response.json names the provider, region, and cell that ran the work.
Production drop-in
In a real workload you’d:
- Decide per-environment which provider to use. Dev → local-dev. Staging → AWS. Prod → GCP. Make that choice via env var or grafos.toml’s [runnable_targets] block; the program code stays the same.
- Set up at least one cell on each provider you intend to deploy to. AWS via grafos cloud provision aws; GCP via grafos cloud provision gcp (registered, joins the hosted fleet); Azure via grafos cloud provision azure.
- Use the tenant budget gate. Tenura-hosted accounts come with a 2000-cent ($20.00) starting budget that’s enforced cluster-wide. Top it up before a multi-cloud burst.
- Tear down standalone cells when done to stop accruing spend: grafos cloud teardown aws --cell-id <id>.
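The per-environment choice in the first item can live in one small lookup outside the tasklet. This sketch maps an environment label to a provider name using the example mapping above. The provider_for_env function is hypothetical, and whether local-dev is accepted as a --provider value is an assumption.

```rust
/// Map a deployment environment to a provider name, following the example
/// mapping in this recipe (dev → local-dev, staging → aws, prod → gcp).
/// Unknown environments return `None` so the caller can refuse to deploy
/// instead of guessing.
pub fn provider_for_env(env: &str) -> Option<&'static str> {
    match env.trim().to_ascii_lowercase().as_str() {
        "dev" => Some("local-dev"),
        "staging" => Some("aws"),
        "prod" => Some("gcp"),
        _ => None,
    }
}
```

A deploy script could feed this from an environment variable of its own choosing via std::env::var, so the tasklet code never sees the switch.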
Adapt It
Use this pattern when you need to compare providers for the same workload or keep a manual escape path available. For placement-aware applications, pair it with recipes that declare failure-domain placement directly in program policy rather than scattering provider switches through application code.
Where to next
- cookbook/recipe-50-local-dev: when you don’t want to touch a cloud at all.
- Concepts → providers and cells: the cell + identity protocol the multi-cloud routing rests on.
- docs/runbooks/tenant-budget-lifecycle: how the cap interacts with multi-cloud spend.