Using GCP Cloud KMS for Vault Auto-Unseal

If you run HashiCorp Vault in production, you've already met the unseal problem. Every time Vault starts — restart, crash, node replacement — it comes up sealed. Sealed means nothing works. Secrets are inaccessible, applications can't authenticate, and everything downstream that depends on Vault is broken. By default, unsealing requires N of M Shamir key holders to each run vault operator unseal with their share of the master key.

That works fine in a controlled environment during business hours. It does not work when your Vault pod gets evicted at 3am and your on-call engineer is asleep. We hit this at a healthcare tech company I was working with in 2018, mid-migration from Rackspace to GCP. We needed Vault to come back up on its own.

The solution is auto-unseal with GCP Cloud KMS.

How Auto-Unseal Works

Vault's master key decrypts the encryption key that protects all stored secrets. Normally the master key is itself protected by Shamir splitting: the key shares are held by humans and never stored alongside Vault. Auto-unseal replaces Shamir with a cloud KMS key. Vault keeps its master key in the storage backend, encrypted ("wrapped") by the KMS key. On startup, Vault sends the wrapped master key to KMS, KMS decrypts it and returns the plaintext, and Vault unseals automatically. The KMS key itself never leaves Google's infrastructure; the plaintext master key lives only in Vault's memory.

This shifts the security question from "who holds the key shares" to "who can call this KMS key." The answer is controlled by GCP IAM, which is auditable, revocable, and doesn't require waking anyone up.

Terraform for the KMS Resources

Start with the key ring and crypto key:

resource "google_kms_key_ring" "vault" {
  name     = "${var.name}-vault-key-ring"
  location = var.region
}

resource "google_kms_crypto_key" "vault" {
  name     = "${var.name}-vault-key"
  key_ring = google_kms_key_ring.vault.id

  lifecycle {
    prevent_destroy = true
  }
}

That prevent_destroy = true lifecycle rule is not optional. Read this carefully: if you delete the KMS key that Vault used to wrap its master key, Vault is sealed permanently. There is no recovery. The encrypted data in Vault's storage backend is unrecoverable without that key. GCP won't help you. HashiCorp won't help you. It's gone.

Terraform will refuse to destroy this resource if prevent_destroy = true is set. That's exactly what you want. The destroy protection adds a small friction cost the one time you're legitimately cleaning up; it prevents catastrophic accidental destruction every other time.

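Strictly speaking, GCP never deletes key rings or keys. What gets destroyed is the key material in the key versions, and a Terraform destroy of google_kms_crypto_key destroys every version and drops the key from state. Versions sit in a scheduled-for-destruction state before the material is gone, and can be restored during that window, but don't count on noticing in time. Keep prevent_destroy on the key itself.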
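For illustration, here is a variant of the key resource above with two optional arguments that bear on the destruction risk. This is a sketch, not a recommendation: the durations are assumed values, and as far as I know destroy_scheduled_duration can only be set when the key is first created. Vault's gcpckms seal documentation says scheduled rotation is supported, on the condition that old key versions are never disabled or destroyed, because they are still needed to decrypt data wrapped earlier.

```hcl
resource "google_kms_crypto_key" "vault" {
  name     = "${var.name}-vault-key"
  key_ring = google_kms_key_ring.vault.id
  purpose  = "ENCRYPT_DECRYPT" # the default, stated explicitly

  # Assumed value: create a new primary key version every 90 days.
  # Older versions stay enabled so existing ciphertext can still be decrypted.
  rotation_period = "7776000s"

  # Assumed value: how long a version waits in DESTROY_SCHEDULED
  # (and can still be restored) before the key material is destroyed.
  destroy_scheduled_duration = "2592000s" # 30 days

  lifecycle {
    prevent_destroy = true
  }
}
```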

IAM for the Vault Service Account

Create a dedicated service account for Vault:

resource "google_service_account" "vault" {
  account_id   = "${var.name}-vault"
  display_name = "Vault Server"
  description  = "Service account for HashiCorp Vault auto-unseal via Cloud KMS"
}

resource "google_kms_crypto_key_iam_member" "vault_unseal" {
  crypto_key_id = google_kms_crypto_key.vault.id
  role          = "roles/cloudkms.cryptoKeyEncrypterDecrypter"
  member        = "serviceAccount:${google_service_account.vault.email}"
}

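Scope the IAM binding to the specific crypto key, not the key ring, not the project. This SA needs exactly one role on exactly one resource. roles/cloudkms.cryptoKeyEncrypterDecrypter on the Vault crypto key covers the encrypt and decrypt calls auto-unseal makes. Nothing broader. One caveat: Vault's gcpckms seal documentation also lists the cloudkms.cryptoKeys.get permission, which that role does not include, so check the requirements for your Vault version.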
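If you would rather grant exactly the permissions Vault's gcpckms seal documentation lists (encrypt, decrypt, and cloudkms.cryptoKeys.get) instead of the predefined role, a custom role bound to the same key is one option. This is a sketch: the role_id, title, and resource names are made up for illustration, and it would replace the predefined-role binding above rather than sit alongside it.

```hcl
# Hypothetical custom role holding only what the seal needs.
resource "google_project_iam_custom_role" "vault_unseal" {
  role_id     = "vaultKmsUnseal"
  title       = "Vault KMS Unseal"
  description = "Encrypt, decrypt, and read metadata on the Vault unseal key"

  permissions = [
    "cloudkms.cryptoKeyVersions.useToEncrypt",
    "cloudkms.cryptoKeyVersions.useToDecrypt",
    "cloudkms.cryptoKeys.get",
  ]
}

# Bind the custom role on the crypto key only, not the key ring or project.
resource "google_kms_crypto_key_iam_member" "vault_unseal_custom" {
  crypto_key_id = google_kms_crypto_key.vault.id
  role          = google_project_iam_custom_role.vault_unseal.id
  member        = "serviceAccount:${google_service_account.vault.email}"
}
```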

Vault Configuration

In your Vault config file, add a seal block (or replace the existing one) with values that match the key ring and key created above:

seal "gcpckms" {
  project     = "my-project"
  region      = "us-central1"
  key_ring    = "my-vault-key-ring"
  crypto_key  = "my-vault-key"
}

If Vault is running on GKE with Workload Identity configured, you don't need a key file. The binding between Vault's Kubernetes service account and the GCP service account handles authentication to GCP APIs automatically. This is the right setup: no JSON key file to rotate, store, or accidentally commit. (For how the Vault Agent sidecar integrates with Kubernetes workloads once Vault is running, see Vault Agent Sidecar on Kubernetes.)

If you're not on GKE, set GOOGLE_APPLICATION_CREDENTIALS to the path of a downloaded service account key, or use the metadata server if running on GCE.
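On GKE, the Workload Identity binding can live in the same Terraform as the service account. A minimal sketch, assuming three hypothetical variables (project_id, vault_namespace, vault_ksa_name) that are not part of the configuration above:

```hcl
# Hypothetical inputs; names and defaults are illustrative only.
variable "project_id" {
  type = string
}

variable "vault_namespace" {
  type    = string
  default = "vault"
}

variable "vault_ksa_name" {
  type    = string
  default = "vault"
}

# Let the Kubernetes service account impersonate the Vault GCP service account.
resource "google_service_account_iam_member" "vault_workload_identity" {
  service_account_id = google_service_account.vault.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:${var.project_id}.svc.id.goog[${var.vault_namespace}/${var.vault_ksa_name}]"
}
```

The Kubernetes service account also needs the iam.gke.io/gcp-service-account annotation set to the GCP service account's email, otherwise the pod keeps using the node's identity.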

Migrating from Shamir to Auto-Unseal

If you're adding auto-unseal to an existing Vault installation — not a fresh one — you cannot just update the config and restart. You need to migrate the master key from Shamir to KMS wrapping.

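Vault handles this with a seal migration, driven by vault operator unseal -migrate. (vault operator migrate is a different command: it copies data between storage backends and does not touch the seal.) The process:

Update the Vault config to add the seal "gcpckms" block, leaving the existing storage config alone.
Restart Vault. It comes up sealed and detects that a seal migration is pending.
Run vault operator unseal -migrate with a threshold of the existing Shamir key shares. Vault re-wraps the master key with the KMS key without touching the stored data, and the old Shamir shares become the recovery keys.
Restart Vault again. It should come up unsealed.

Do this with your Shamir key holders available and a tested rollback plan. Do not discard the Shamir shares afterwards: they are now your recovery keys. Restart several times to confirm auto-unseal works before you call the migration done. Seal migration is well-documented and reversible, but that's not a reason to do it casually. On an HA cluster, follow the node-by-node order in HashiCorp's seal migration guide.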
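For the rollback plan, the documented way back to Shamir is to keep the seal block but mark it disabled, restart, and run vault operator unseal -migrate with the recovery keys. A sketch of that config, reusing the placeholder values from the seal block above:

```hcl
# Rollback sketch: migrate from gcpckms back to Shamir.
# The block stays in place so Vault can still read the old KMS-wrapped key;
# disabled = "true" tells Vault to migrate away from this seal.
seal "gcpckms" {
  disabled   = "true"
  project    = "my-project"
  region     = "us-central1"
  key_ring   = "my-vault-key-ring"
  crypto_key = "my-vault-key"
}
```

The rollback still needs the KMS key to be reachable and intact, since Vault has to unwrap the master key one last time. It is a way out of a misconfiguration, not a way out of a destroyed key.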

What Auto-Unseal Does Not Replace

Auto-unseal eliminates the startup dependency on human key holders. It does not eliminate recovery keys, and it does not remove the dependency on KMS.

When Vault is initialized with auto-unseal, it generates recovery keys; after a migration, the old Shamir shares take that role. Recovery keys authorize sensitive operations that would otherwise need a quorum of unseal keys, such as generating a new root token, rekeying, or migrating the seal again. They cannot unseal Vault: they cannot decrypt the master key, so they are no help if the KMS key is unavailable or destroyed. Store them with the same rigor you'd apply to Shamir keys: split across key holders, not in the same system Vault is protecting.

GCP Cloud KMS has very high availability, but "very high" is not "infinite." KMS outages are rare and short, but they do happen. If KMS is unreachable when Vault needs to restart, Vault stays sealed until KMS comes back, and recovery keys won't change that. Plan for that by treating KMS as a hard dependency in your availability model.

The Operational Win

Before auto-unseal, a Vault restart at any hour required human intervention. After, it's a non-event. Vault pods can be evicted, nodes can be replaced, deployments can roll — Vault comes back up unsealed, services reconnect, nothing pages. The operational overhead of running Vault in production drops considerably.

The security tradeoff is real but manageable: you've moved from "humans hold key shares" to "a GCP service account holds the unsealing capability." Model that threat accordingly, audit access to the KMS key, and rotate the service account key (or use Workload Identity so there's nothing to rotate). It's a reasonable exchange for production reliability.
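Auditing access to the KMS key means turning on Data Access audit logs for Cloud KMS, which are off by default. A minimal sketch follows; enabling both log types is an assumption made so that encrypt and decrypt calls are captured however GCP classifies them, and note that this resource is authoritative for the service's audit config in the project.

```hcl
# Look up the project the provider is configured for.
data "google_project" "current" {}

# Enable Data Access audit logs for Cloud KMS in this project.
resource "google_project_iam_audit_config" "kms" {
  project = data.google_project.current.project_id
  service = "cloudkms.googleapis.com"

  audit_log_config {
    log_type = "DATA_READ"
  }

  audit_log_config {
    log_type = "DATA_WRITE"
  }
}
```

With this in place, every use of the unseal key shows up in Cloud Logging with the calling identity, so an unexpected caller is visible rather than silent.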