stage0: netboot the cloud from a signed URL

Shipping a new build to a cloud fleet usually means baking a full machine image per release and re-provisioning instances to pick it up. The image is the unit of deployment, so every change — kernel, initramfs, command line — rebuilds and redistributes the whole thing.

stage0 inverts that. It's a tiny, db-signed UEFI loader you bake into the image once. On boot it reads a URL from the instance's metadata, downloads the real payload over HTTP, verifies it against a pinned key or hash, measures it into the TPM, and chain-loads it. To ship a new build you publish a signed binary to that URL. The next boot picks it up. Nothing else changes — not the image, not the instance configuration.

What it is

stage0 is a pure-UEFI application — no Linux kernel underneath it, no C runtime. It's #![no_std] Rust with alloc, built for x86_64-unknown-uefi and aarch64-unknown-uefi, and it compiles to about 153 KB on aarch64 and 178 KB on x86_64. The dependency tree is all no-std Rust: the uefi crate for boot services, sha2 and ed25519-compact for verification, serde_json for the config, and vaportpm-attest for talking to the TPM. No OpenSSL, no libc.

The whole job is: get on the network, find out what to boot, fetch it, prove it's allowed, record what it was, and hand off.
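That sequence can be sketched as a fixed pipeline. This is only an illustration: the `BootPlatform` trait, its method names, and the `Recorder` test double are hypothetical, not stage0's API. What it shows is the ordering property the design relies on. Each step returns early on failure, so a payload that fails admission is never measured and never reaches LoadImage.

```rust
/// Hypothetical interface over the platform services stage0 drives in order.
/// Method names are illustrative, not stage0's real API.
pub trait BootPlatform {
    fn network_up(&mut self) -> Result<(), String>;
    /// Returns the payload URL for the running architecture.
    fn read_config(&mut self) -> Result<String, String>;
    fn download(&mut self, url: &str) -> Result<Vec<u8>, String>;
    /// Pinned-hash or signature check; Err means "do not boot this".
    fn admit(&mut self, payload: &[u8]) -> Result<(), String>;
    /// Extend PCR 14 with SHA-256(payload).
    fn measure(&mut self, payload: &[u8]) -> Result<(), String>;
    /// LoadImage + StartImage.
    fn chain_load(&mut self, payload: &[u8]) -> Result<(), String>;
}

/// Runs the steps in order and stops at the first failure, so nothing is
/// measured or executed unless admission succeeded first.
pub fn boot<P: BootPlatform>(p: &mut P) -> Result<(), String> {
    p.network_up()?;
    let url = p.read_config()?;
    let payload = p.download(&url)?;
    p.admit(&payload)?;
    p.measure(&payload)?;
    p.chain_load(&payload)
}

/// Test double that records which steps ran; `reject` fails admission.
pub struct Recorder {
    pub reject: bool,
    pub log: Vec<&'static str>,
}

impl BootPlatform for Recorder {
    fn network_up(&mut self) -> Result<(), String> {
        self.log.push("network_up");
        Ok(())
    }
    fn read_config(&mut self) -> Result<String, String> {
        self.log.push("read_config");
        Ok("http://release.example/payload-x86_64.efi".to_string())
    }
    fn download(&mut self, _url: &str) -> Result<Vec<u8>, String> {
        self.log.push("download");
        Ok(b"payload".to_vec())
    }
    fn admit(&mut self, _payload: &[u8]) -> Result<(), String> {
        self.log.push("admit");
        if self.reject {
            Err("payload not admitted".to_string())
        } else {
            Ok(())
        }
    }
    fn measure(&mut self, _payload: &[u8]) -> Result<(), String> {
        self.log.push("measure");
        Ok(())
    }
    fn chain_load(&mut self, _payload: &[u8]) -> Result<(), String> {
        self.log.push("chain_load");
        Ok(())
    }
}
```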

The boot flow

From power-on to handoff:

1. Network up: bring up the NIC over EFI_IP4_CONFIG2 with DHCP.
2. Read config: fetch the instance user-data and its _stage0 document from the cloud metadata service (EC2 IMDSv2, GCP, Azure, or Alibaba Cloud).
3. Download payload: pull the per-architecture binary from the URL in the config, over plain HTTP spoken directly on raw TCP.
4. Verify: admit the payload only if it matches the pinned SHA-256 hash or carries a valid detached ed25519 signature.
5. Measure: extend TPM PCR 14 with SHA-256(payload).
6. Chain-load: LoadImage + StartImage, and stage0 is done.
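Downloading over raw TCP suggests stage0 frames HTTP itself instead of relying on a firmware HTTP protocol. Below is a minimal sketch of that framing, assuming plain `http://` URLs, a `Connection: close` request, and a response body sized by `Content-Length` or running to end of stream. `split_url`, `build_request`, and `parse_response` are hypothetical helpers, and the socket work (presumably UEFI's TCP4 protocol in the real loader) is left out.

```rust
/// Splits `http://host[:port]/path`; anything other than plain http is rejected.
pub fn split_url(url: &str) -> Option<(&str, u16, &str)> {
    let rest = url.strip_prefix("http://")?;
    let (authority, path) = match rest.find('/') {
        Some(i) => (&rest[..i], &rest[i..]),
        None => (rest, "/"),
    };
    let (host, port) = match authority.rsplit_once(':') {
        Some((h, p)) => (h, p.parse::<u16>().ok()?),
        None => (authority, 80),
    };
    Some((host, port, path))
}

/// Builds an HTTP/1.1 request; the server closes the connection when done.
pub fn build_request(method: &str, host: &str, path: &str, headers: &[(&str, &str)]) -> Vec<u8> {
    let mut req = format!("{method} {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n");
    for (name, value) in headers {
        req.push_str(&format!("{name}: {value}\r\n"));
    }
    req.push_str("\r\n");
    req.into_bytes()
}

/// Parses a complete response read until the peer closed the connection.
/// Returns (status code, body). Chunked transfer encoding is not handled.
pub fn parse_response(raw: &[u8]) -> Result<(u16, Vec<u8>), &'static str> {
    let end = raw.windows(4).position(|w| w == b"\r\n\r\n").ok_or("no header terminator")?;
    let head = std::str::from_utf8(&raw[..end]).map_err(|_| "non-UTF-8 headers")?;
    let body = &raw[end + 4..];
    let mut lines = head.split("\r\n");
    let status = lines
        .next()
        .and_then(|l| l.split(' ').nth(1))
        .and_then(|s| s.parse::<u16>().ok())
        .ok_or("bad status line")?;
    let mut content_length = None;
    for line in lines {
        if let Some((name, value)) = line.split_once(':') {
            if name.trim().eq_ignore_ascii_case("content-length") {
                content_length = Some(value.trim().parse::<usize>().map_err(|_| "bad Content-Length")?);
            }
        }
    }
    match content_length {
        Some(n) if body.len() >= n => Ok((status, body[..n].to_vec())),
        Some(_) => Err("truncated body"),
        // No length given: with Connection: close the body runs to end of stream.
        None => Ok((status, body.to_vec())),
    }
}
```

The same builder would cover the metadata fetch. On EC2, IMDSv2 is a `PUT /latest/api/token` to 169.254.169.254 with an `X-aws-ec2-metadata-token-ttl-seconds` header, followed by `GET /latest/user-data` with the returned token in `X-aws-ec2-metadata-token`.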

Configuration lives in instance metadata

There's no config file baked into stage0 and no command line. Everything comes from a _stage0 key in the instance's user-data, which stage0 fetches from whichever cloud metadata service answers:

{
  "_stage0": {
    "args": ["optional", "load-options"],
    "x86_64":  { "url": "http://release.example/payload-x86_64.efi",  "ed25519": "<base64 pubkey>" },
    "aarch64": { "url": "http://release.example/payload-aarch64.efi", "ed25519": "<base64 pubkey>" }
  }
}

Each architecture entry carries a URL and exactly one of two admission policies. That choice decides how new builds reach the fleet.
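A sketch of the "exactly one policy" rule. It assumes the pinned-hash field is spelled `sha256` (the example above only shows `ed25519`) and that JSON decoding, which stage0 does with serde_json, has already filled in the fields. `ArchEntry`, `Policy`, and `policy_of` are hypothetical names.

```rust
/// One per-architecture entry of `_stage0` after JSON decoding.
pub struct ArchEntry {
    pub url: String,
    pub sha256: Option<String>,
    pub ed25519: Option<String>,
}

/// The two admission policies an entry can carry.
#[derive(Debug, PartialEq)]
pub enum Policy {
    /// Boot only the payload whose SHA-256 is this 64-hex-character digest.
    PinnedHash(String),
    /// Boot any payload with a valid signature under this base64 public key.
    ReleaseKey(String),
}

/// Enforces "exactly one of two admission policies" for an entry.
pub fn policy_of(entry: &ArchEntry) -> Result<Policy, &'static str> {
    match (&entry.sha256, &entry.ed25519) {
        (Some(h), None) if h.len() == 64 && h.bytes().all(|b| b.is_ascii_hexdigit()) => {
            Ok(Policy::PinnedHash(h.clone()))
        }
        (Some(_), None) => Err("sha256 must be exactly 64 hex characters"),
        (None, Some(k)) => Ok(Policy::ReleaseKey(k.clone())),
        (Some(_), Some(_)) => Err("entry names both sha256 and ed25519"),
        (None, None) => Err("entry names no admission policy"),
    }
}
```

Rejecting an entry that names both policies, rather than picking one, keeps the admission rule unambiguous to anyone reading the metadata.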

Two ways to say "yes" to a payload

Pinned hash (sha256). The config names an exact 64-hex-character digest. stage0 boots that payload and nothing else. The binary is immutable; changing what boots means editing the instance metadata. This is the right mode when you want a fleet pinned to one known artifact.
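Admission under a pinned hash reduces to decoding the 64 hex characters into 32 bytes and comparing them with the payload's digest. In this sketch the digest is passed in already computed (stage0 uses the sha2 crate for that step), and `decode_hex32` and `admit_pinned` are hypothetical names. A malformed pin admits nothing.

```rust
/// Decodes a 64-hex-character digest (either case) into 32 bytes.
pub fn decode_hex32(hex: &str) -> Option<[u8; 32]> {
    let bytes = hex.as_bytes();
    if bytes.len() != 64 {
        return None;
    }
    let nibble = |c: u8| (c as char).to_digit(16).map(|d| d as u8);
    let mut out = [0u8; 32];
    for (i, pair) in bytes.chunks(2).enumerate() {
        out[i] = (nibble(pair[0])? << 4) | nibble(pair[1])?;
    }
    Some(out)
}

/// Pinned-hash admission: boot only if the payload digest equals the pin.
pub fn admit_pinned(pinned_hex: &str, payload_digest: &[u8; 32]) -> bool {
    decode_hex32(pinned_hex).map_or(false, |expected| expected == *payload_digest)
}
```

A plain comparison is enough here: the pinned digest is public, so there is no secret to leak through timing.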

Signed release channel (ed25519). The config names a long-term release public key. stage0 downloads the payload, fetches a detached signature from <url>.sig (64 raw bytes), and verifies it with ed25519-compact. Any payload signed by the matching private key is admitted.
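Around the ed25519 check itself, the signed mode has three mechanical steps: derive `<url>.sig`, decode the base64 public key to 32 bytes, and insist on exactly 64 signature bytes. The sketch below performs those steps and takes the verification as a `verify` closure standing in for ed25519-compact, so it runs without external crates. It assumes the key is standard padded base64 (RFC 4648); all names are hypothetical.

```rust
use std::convert::TryFrom;

/// Value of one character in the standard base64 alphabet.
fn b64_value(c: u8) -> Option<u32> {
    match c {
        b'A'..=b'Z' => Some(u32::from(c - b'A')),
        b'a'..=b'z' => Some(u32::from(c - b'a') + 26),
        b'0'..=b'9' => Some(u32::from(c - b'0') + 52),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None,
    }
}

/// Decodes padded standard base64.
pub fn b64_decode(s: &str) -> Option<Vec<u8>> {
    let s = s.as_bytes();
    if s.len() % 4 != 0 {
        return None;
    }
    let quads = s.len() / 4;
    let mut out = Vec::with_capacity(quads * 3);
    for (i, quad) in s.chunks(4).enumerate() {
        let pad = quad.iter().rev().take_while(|&&c| c == b'=').count();
        // Padding is legal only in the final quad, and at most two characters.
        if pad > 2 || (pad > 0 && i + 1 != quads) {
            return None;
        }
        let mut acc = 0u32;
        for &c in &quad[..4 - pad] {
            acc = (acc << 6) | b64_value(c)?;
        }
        acc <<= 6 * pad as u32;
        let bytes = [(acc >> 16) as u8, (acc >> 8) as u8, acc as u8];
        out.extend_from_slice(&bytes[..3 - pad]);
    }
    Some(out)
}

/// The detached signature is published next to the payload.
pub fn signature_url(payload_url: &str) -> String {
    format!("{}.sig", payload_url)
}

/// Release-key admission. `verify` stands in for ed25519-compact's check.
pub fn admit_signed<F>(pubkey_b64: &str, payload: &[u8], sig_file: &[u8], verify: F) -> bool
where
    F: Fn(&[u8; 32], &[u8], &[u8; 64]) -> bool,
{
    // An ed25519 public key is exactly 32 bytes once decoded.
    let key = match b64_decode(pubkey_b64).and_then(|k| <[u8; 32]>::try_from(k).ok()) {
        Some(k) => k,
        None => return false,
    };
    // The .sig file must hold exactly 64 raw bytes, with no encoding or framing.
    let sig = match <[u8; 64]>::try_from(sig_file) {
        Ok(s) => s,
        Err(_) => return false,
    };
    verify(&key, payload, &sig)
}
```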

The signed mode is what makes "configure once" real. You set the URL and the release public key in the instance metadata a single time. After that, publishing a new build is: sign it offline with the release key, upload the binary and its .sig to the channel URL. Every machine pointed at that channel rolls forward to the new build on its next boot, with no metadata edits and no re-imaging. The private key never touches a deployed machine — it stays offline with whoever cuts releases.

What gets measured (and what doesn't)

stage0 extends exactly one PCR: PCR 14, with the hash of the payload binary. The config, the URL, the pinned hash, the release key, the signature — none of it is measured.

That's deliberate. The hash or signature is admission control: it decides whether stage0 is willing to load a payload, not what a verifier later attests. By keeping the measured surface down to "this exact payload booted," a remote verifier can check PCR 14 against a list of approved release hashes without having to model your metadata, your channel layout, or your signing scheme. The thing you prove stays small even though the thing you configure can be flexible.
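On the verifier side, checking PCR 14 against approved release hashes means replaying the extend. A TPM 2.0 extend on the SHA-256 bank sets the PCR to SHA-256(old value || measurement), and PCR 14 starts at 32 zero bytes, so if stage0's extend is the only one before the quote, the expected value for a release is SHA-256(32 zero bytes || SHA-256(payload)). The sketch assumes exactly that (nothing else extends PCR 14 earlier in the boot). It implements SHA-256 from scratch in place of the sha2 crate so it has no dependencies; `extend`, `expected_pcr14`, and `pcr14_is_approved` are hypothetical names.

```rust
const K: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];

/// Plain SHA-256 (FIPS 180-4).
pub fn sha256(data: &[u8]) -> [u8; 32] {
    let mut h: [u32; 8] = [
        0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
    ];
    // Padding: 0x80, zeros up to 56 mod 64, then the bit length as a big-endian u64.
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&((data.len() as u64) * 8).to_be_bytes());
    for block in msg.chunks(64) {
        let mut w = [0u32; 64];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
        }
        for i in 16..64 {
            let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
            let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
            w[i] = w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1);
        }
        let [mut a, mut b, mut c, mut d, mut e, mut f, mut g, mut hh] = h;
        for i in 0..64 {
            let s1 = e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25);
            let ch = (e & f) ^ (!e & g);
            let t1 = hh.wrapping_add(s1).wrapping_add(ch).wrapping_add(K[i]).wrapping_add(w[i]);
            let s0 = a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22);
            let maj = (a & b) ^ (a & c) ^ (b & c);
            let t2 = s0.wrapping_add(maj);
            hh = g;
            g = f;
            f = e;
            e = d.wrapping_add(t1);
            d = c;
            c = b;
            b = a;
            a = t1.wrapping_add(t2);
        }
        for (x, v) in h.iter_mut().zip([a, b, c, d, e, f, g, hh]) {
            *x = x.wrapping_add(v);
        }
    }
    let mut out = [0u8; 32];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// TPM 2.0 extend on the SHA-256 bank: new = SHA-256(old || measurement).
pub fn extend(pcr: [u8; 32], measurement: &[u8; 32]) -> [u8; 32] {
    let mut buf = [0u8; 64];
    buf[..32].copy_from_slice(&pcr);
    buf[32..].copy_from_slice(measurement);
    sha256(&buf)
}

/// Expected PCR 14 if stage0's extend is the only one since reset.
pub fn expected_pcr14(payload_digest: &[u8; 32]) -> [u8; 32] {
    extend([0u8; 32], payload_digest)
}

/// Does a quoted PCR 14 correspond to any approved release digest?
pub fn pcr14_is_approved(quoted: &[u8; 32], approved_digests: &[[u8; 32]]) -> bool {
    approved_digests.iter().any(|d| &expected_pcr14(d) == quoted)
}
```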

stage0 itself is db-signed and measured by the firmware before it ever runs, so the chain stays attestable end to end: firmware verifies stage0, stage0 verifies and measures the payload, the payload's hash lands in PCR 14. (For the broader trust model these measurements feed into, see the earlier note on pragmatic cloud trust with vTPMs.)

Chain-loading past the firmware's own checks

The payload is loaded from an in-memory buffer with LoadImage/StartImage. There's a subtlety: the firmware's Secure Boot logic would normally re-verify that buffer against db, but the payload is gated by stage0's own policy, not the firmware database. So stage0 temporarily installs an allow-all authentication verdict around the single LoadImage call and restores the original hooks immediately after — the same security-architecture override technique shim uses.
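Below is a minimal model of the override, assuming a single authentication hook; shim actually patches the hooks of both EFI_SECURITY_ARCH_PROTOCOL and EFI_SECURITY2_ARCH_PROTOCOL. All names are hypothetical and nothing here touches real firmware tables. The model uses a Rust drop guard to express "restore immediately after": the original hook goes back when the guard leaves scope, including on an early return. The source does not say whether stage0 is structured this way.

```rust
/// Outcome of the firmware's image-authentication hook (illustrative).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Verdict {
    Allow,
    Deny,
}

/// Stand-in for the function pointer the firmware calls during LoadImage.
pub type AuthHook = fn(&[u8]) -> Verdict;

/// Hypothetical model of the security-architecture protocol's hook.
pub struct SecurityArch {
    pub file_authentication: AuthHook,
}

fn allow_all(_image: &[u8]) -> Verdict {
    Verdict::Allow
}

/// Installs allow-all on creation and restores the saved hook on drop,
/// so the override cannot outlive the LoadImage call it wraps.
pub struct OverrideGuard<'a> {
    arch: &'a mut SecurityArch,
    saved: AuthHook,
}

impl<'a> OverrideGuard<'a> {
    pub fn install(arch: &'a mut SecurityArch) -> Self {
        let saved = std::mem::replace(&mut arch.file_authentication, allow_all as AuthHook);
        OverrideGuard { arch, saved }
    }

    /// What the firmware would consult while the guard is alive.
    pub fn authenticate(&self, image: &[u8]) -> Verdict {
        (self.arch.file_authentication)(image)
    }
}

impl<'a> Drop for OverrideGuard<'a> {
    fn drop(&mut self) {
        self.arch.file_authentication = self.saved;
    }
}
```

In use, the guard would be created immediately before LoadImage and dropped immediately after it, so the allow-all window covers exactly one call.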

This is what lets a deployment sign stage0 with a per-release ephemeral db key that's destroyed right after the build, and still chain-load late-bound payloads that the now-nonexistent key never signed. The firmware trusts stage0; stage0 makes the call on everything after it.

The payload

stage0 will boot any UEFI application binary the config points at; it doesn't care what's inside. In a lockboot deployment that binary is the UKI: a single PE image bundling the stub, kernel, initramfs, and command line. So the practical shape is "pull, verify, measure and boot a signed UKI over HTTP". The loader itself is general, though: the test payload in the repo is just a trivial UEFI app that reads back PCR 14 and prints it to confirm the measurement happened.

Why it's worth the trouble

The operational payoff is that the slow, heavy unit of deployment — the machine image — stops being where your releases live. You provision a fleet once with a small signed loader and a channel URL. Iterating on the actual workload becomes "publish a signed binary," and updating the fleet becomes "reboot." The boot path stays measured and attestable the whole time, so faster deploys don't cost you the ability to prove what's running.

The code is on GitHub under crates/stage0.