Policy-as-Code with OPA and Rego

Policy is one of those things that accumulates like technical debt. You start with a note in the wiki: "all Deployments need resource limits." Then someone's container OOMs a node and takes down unrelated workloads, and now the note is a strongly worded Slack message. Then it's a checklist in your PR template. Then it's a comment in a Terraform module. Eventually it's scattered across five different places and enforced by none of them.

Open Policy Agent (OPA) is the tool I actually use to solve this problem.

What OPA Is

OPA is a general-purpose policy engine. You write policies in Rego (a declarative query language), and OPA evaluates them against arbitrary JSON data. That's it. The power comes from the fact that so much infrastructure is JSON or converts to it: Kubernetes manifests (YAML maps directly onto JSON), Terraform plans exported with terraform show -json, HTTP request/response bodies. OPA doesn't care; it just evaluates your policies against whatever data you feed it.

There are two main integration points I use: Conftest for CI-time checks against files, and Gatekeeper for in-cluster enforcement.

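Updated March 2026: OPA v1.0 shipped in December 2024 with significant changes to Rego itself, now called "Rego v1." The biggest visible changes: the if keyword is required before every rule body, and contains is required for partial set rules such as deny. The future.keywords imports are no longer needed, because in, every, if, and contains are part of the language. On late v0.x releases, import rego.v1 opts a file into the same behavior; on OPA 1.x it is accepted but redundant. Some syntax that was soft-deprecated in v0.x is now removed. The policy examples in this post use the v0.x style current as of early 2021; on OPA 1.x they need the --v0-compatible flag to run unchanged. I've noted where Rego v1 differs.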

Rego Basics

Rego is not like any language you've used before. It's closer to Datalog than to Go or Python. Rules evaluate to true (or another value) when their conditions hold and are undefined otherwise, and you build up complex policies from composable rules. The mental model shift: you're not writing imperative logic, you're writing logical statements that must all be true.
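To make the "true or a value" distinction concrete, here is a minimal sketch in the same v0.x style as the rest of this post. The package name, the rule names, and the threshold of 5 are all made up for illustration.

```rego
package example.basics

# Boolean rule: true when the body holds, undefined otherwise.
# The default turns "undefined" into an explicit false.
default is_deployment = false

is_deployment {
    input.kind == "Deployment"
}

# Value rule: evaluates to a number rather than to true.
container_count := count(input.spec.template.spec.containers)

# Composed rule: holds only if every line in the body holds.
# The threshold is arbitrary and only here for illustration.
too_many_containers {
    is_deployment
    container_count > 5
}
```

Note that too_many_containers never becomes false. If either condition fails, it is simply undefined, which is why default exists.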

A simple rule:

package kubernetes.deployment

# A rule that produces a denial message
deny[msg] {
    input.kind == "Deployment"
    container := input.spec.template.spec.containers[_]
    not container.resources.limits.cpu
    msg := sprintf("container '%s' has no CPU limit", [container.name])
}

Breaking this down: deny[msg] defines a set called deny. The rule body is a conjunction: all conditions must be true for the rule to fire. input is the document being evaluated (the Deployment manifest). containers[_] iterates over all containers. For each container that is missing a CPU limit, a message is added to the deny set. Tools like Conftest treat a non-empty deny set as a policy failure.

Sets and Comprehensions

# Collect all container names missing memory limits
missing_memory_limits := {name |
    container := input.spec.template.spec.containers[_]
    not container.resources.limits.memory
    name := container.name
}

This is a set comprehension — it builds the set of container names where the condition holds. I use this pattern constantly to collect violations before summarizing them.
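As a sketch of the "collect, then summarize" pattern, the rule below turns that set into a single denial message for the whole manifest instead of one message per container. The message wording is my own; the comprehension is repeated so the block stands alone. It is written in the v0.x style used throughout this post.

```rego
package kubernetes.deployment

# Same comprehension as above, repeated so this block is self-contained.
missing_memory_limits := {name |
    container := input.spec.template.spec.containers[_]
    not container.resources.limits.memory
    name := container.name
}

# One summary message listing every offending container.
deny[msg] {
    input.kind == "Deployment"
    count(missing_memory_limits) > 0
    msg := sprintf("containers missing memory limits: %v", [missing_memory_limits])
}
```

The count check matters: without it, a fully compliant Deployment would still produce a message with an empty set in it.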

Conftest: Policy in CI

Conftest is a CLI tool that runs OPA policies against config files. I use it to check Kubernetes manifests and Terraform plans before they ever reach the cluster.

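# Test a manifest
# Conftest only evaluates rules in package main by default;
# --all-namespaces picks up packages like kubernetes.deployment.
conftest test deployment.yaml --policy ./policies/ --all-namespaces

# Test a Terraform plan
terraform plan -out=tfplan.binary
terraform show -json tfplan.binary > tfplan.json
conftest test tfplan.json --policy ./policies/terraform/ --all-namespaces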

The policy structure I use:

policies/
  kubernetes/
    deployments.rego
    deployments_test.rego
  terraform/
    aws_rds.rego
    aws_rds_test.rego

A Real Policy: Resource Limits Required

package kubernetes.deployment

import future.keywords.in

deny[msg] {
    input.kind == "Deployment"
    container := input.spec.template.spec.containers[_]
    missing := missing_limits(container)
    count(missing) > 0
    msg := sprintf(
        "container '%s' is missing resource limits: %v",
        [container.name, missing],
    )
}

missing_limits(container) := limits {
    required := {"cpu", "memory"}
    have := {k | _ := container.resources.limits[k]}
    limits := required - have
}

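Updated March 2026: In Rego v1, drop import future.keywords.in (on late v0.x releases, replace it with import rego.v1 instead). The rule heads change as well: deny[msg] { becomes deny contains msg if {, and missing_limits(container) := limits { needs if before the body. The rule bodies themselves are unchanged.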
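For comparison, here is a sketch of the same policy in Rego v1 form. Only the rule heads differ from the v0.x version; the bodies are copied as-is, apart from putting the sprintf call on one line. The import rego.v1 line is there so the file also works on late v0.x releases; on OPA 1.x it can be removed.

```rego
package kubernetes.deployment

import rego.v1

# "contains" marks deny as a set; "if" introduces the body.
deny contains msg if {
    input.kind == "Deployment"
    container := input.spec.template.spec.containers[_]
    missing := missing_limits(container)
    count(missing) > 0
    msg := sprintf("container '%s' is missing resource limits: %v", [container.name, missing])
}

# Functions also need "if" before the body.
missing_limits(container) := limits if {
    required := {"cpu", "memory"}
    have := {k | _ := container.resources.limits[k]}
    limits := required - have
}
```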

Writing Rego Tests

This is non-negotiable. Rego is dense enough that untested policies will surprise you.

package kubernetes.deployment_test

import future.keywords.in

# The tests live in their own package, so the policy must be imported
# and deny referenced through it.
import data.kubernetes.deployment

test_deny_missing_cpu_limit {
    result := deployment.deny with input as {
        "kind": "Deployment",
        "spec": {
            "template": {
                "spec": {
                    "containers": [{
                        "name": "app",
                        "resources": {
                            "limits": {
                                "memory": "128Mi"
                            }
                        }
                    }]
                }
            }
        }
    }
    count(result) == 1
    some msg in result
    contains(msg, "cpu")
}

test_allow_all_limits_set {
    result := deployment.deny with input as {
        "kind": "Deployment",
        "spec": {
            "template": {
                "spec": {
                    "containers": [{
                        "name": "app",
                        "resources": {
                            "limits": {
                                "cpu": "500m",
                                "memory": "128Mi"
                            }
                        }
                    }]
                }
            }
        }
    }
    count(result) == 0
}

Run with: opa test policies/ -v
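Two edge cases worth covering are a container with no resources block at all and a non-Deployment kind. The sketch below assumes that only the "Resource Limits Required" policy above is loaded under package kubernetes.deployment; the test names are my own. It uses the v0.x style, so on OPA 1.x it needs opa test --v0-compatible.

```rego
package kubernetes.deployment_test

import data.kubernetes.deployment

# A container with no resources block should be reported once,
# with both limits listed as missing.
test_deny_container_without_resources {
    result := deployment.deny with input as {
        "kind": "Deployment",
        "spec": {"template": {"spec": {"containers": [{"name": "app"}]}}}
    }
    count(result) == 1
    msg := result[_]
    contains(msg, "cpu")
    contains(msg, "memory")
}

# Kinds other than Deployment are ignored by this policy.
test_ignore_other_kinds {
    result := deployment.deny with input as {
        "kind": "Service",
        "spec": {"ports": [{"port": 80}]}
    }
    count(result) == 0
}
```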

Gatekeeper: In-Cluster Enforcement

Conftest catches problems at CI time. Gatekeeper is the in-cluster backstop. It runs as an admission webhook and evaluates OPA policies against every resource creation and update.

Gatekeeper uses its own CRD-based interface: ConstraintTemplate (the policy definition) and Constraint (an instance of that policy with configuration).

apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: requireresourcelimits
spec:
  crd:
    spec:
      names:
        kind: RequireResourceLimits
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package requireresourcelimits

        violation[{"msg": msg}] {
            container := input.review.object.spec.template.spec.containers[_]
            not container.resources.limits.cpu
            msg := sprintf("container '%s' has no CPU limit", [container.name])
        }

        violation[{"msg": msg}] {
            container := input.review.object.spec.template.spec.containers[_]
            not container.resources.limits.memory
            msg := sprintf("container '%s' has no memory limit", [container.name])
        }

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: RequireResourceLimits
metadata:
  name: require-resource-limits
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment"]

Why Not Just Use Admission Webhooks?

I get this question. The answer is: Gatekeeper is an OPA-based admission webhook, with a declarative configuration layer on top and a shared library of community constraints (github.com/open-policy-agent/gatekeeper-library). Rolling your own webhook for every policy rule doesn't scale. Gatekeeper centralizes that.

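The other reason: Conftest and Gatekeeper share the same Rego logic. The entry points differ (Conftest reads the manifest as input and collects deny messages; Gatekeeper reads input.review.object and collects violation objects), but the checks underneath are identical. Write the logic once, test it with Conftest in CI, enforce it with Gatekeeper in the cluster. Consistency across the pipeline is the whole point.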
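One way to share that logic, sketched under some assumptions: keep the checks in helper functions that take the object as an argument, and add one thin entry point per tool. The messages helper is hypothetical, and putting both entry points in a single package is my simplification. With Conftest you would point --namespace at this package; with Gatekeeper the same Rego would go into the ConstraintTemplate's rego field.

```rego
package requireresourcelimits

# Shared core: which of the required limits a container lacks.
missing_limits(container) := limits {
    required := {"cpu", "memory"}
    have := {k | _ := container.resources.limits[k]}
    limits := required - have
}

# Shared core: one message per offending container in a
# Deployment-shaped object. (Hypothetical helper name.)
messages(obj) := msgs {
    msgs := {msg |
        container := obj.spec.template.spec.containers[_]
        missing := missing_limits(container)
        count(missing) > 0
        msg := sprintf("container '%s' is missing resource limits: %v", [container.name, missing])
    }
}

# Conftest entry point: the manifest itself is input.
deny[msg] {
    input.kind == "Deployment"
    msgs := messages(input)
    msg := msgs[_]
}

# Gatekeeper entry point: the object arrives under input.review.object.
violation[{"msg": msg}] {
    msgs := messages(input.review.object)
    msg := msgs[_]
}
```

Under Conftest, input.review is undefined, so violation stays empty; under Gatekeeper only violation is read. Either way the unit tests only need to cover messages and missing_limits once.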

Policy enforcement only works if it's actually enforced — automatically, consistently, at the point of change. OPA and its tooling make that real rather than aspirational.