Kubernetes is good at running stateless applications. Rolling update a Deployment, scale it up, scale it down — the control plane handles all of that. But stateful applications have operational knowledge that can't be encoded in a Deployment. How do you safely add a node to a Cassandra ring? How do you run a backup without corrupting a live database? You need something that understands your application's domain, not just "keep N replicas running."
That's what operators are. An operator is a controller that manages a custom resource — one you define — and encodes application-specific operational logic in the reconcile loop. The concept was introduced by CoreOS in 2016, and right now in late 2017 it's still early enough that most teams haven't written one. But the pattern is solid, the tooling is improving, and if you have a stateful workload running in Kubernetes it's worth understanding.
The Pattern: CRD + Controller
Two pieces:
Custom Resource Definition (CRD). A CRD extends the Kubernetes API with a new resource type. Once you apply a CRD, you can kubectl apply resources of that type the same as any built-in resource. The API server validates and stores them in etcd.
Controller. A controller watches the Kubernetes API for resources it cares about and reconciles current state toward desired state. Every built-in Kubernetes controller (Deployment controller, StatefulSet controller, etc.) follows this model. Your operator is a controller that watches your custom resource.
The reconcile loop is straightforward conceptually:
- Watch for create/update/delete events on your custom resource.
- Read the current state of the world (what's actually running in the cluster).
- Compare it to the desired state (what the resource spec says).
- Take actions to close the gap.
- Update the resource's status subresource to reflect current state.
- Repeat.
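Stripped of the Kubernetes machinery, the loop above is a state comparison, not an event handler. Here's a minimal sketch in plain Go; the `Spec` and `State` types are simplified stand-ins for illustration, not real client-go APIs:

```go
package main

import "fmt"

// Simplified stand-ins for the real API types.
type Spec struct {
    Schedule string
}

type State struct {
    Schedule string
    Exists   bool
}

// reconcile compares desired state (the spec) against observed state and
// returns the action needed to close the gap. It only inspects state; it
// never assumes which event triggered it.
func reconcile(desired Spec, actual State) string {
    switch {
    case !actual.Exists:
        return "create"
    case actual.Schedule != desired.Schedule:
        return "update"
    default:
        return "noop" // already converged, so a second run is a no-op
    }
}

func main() {
    fmt.Println(reconcile(Spec{"0 2 * * *"}, State{Exists: false}))
    fmt.Println(reconcile(Spec{"0 2 * * *"}, State{Schedule: "0 3 * * *", Exists: true}))
    fmt.Println(reconcile(Spec{"0 2 * * *"}, State{Schedule: "0 2 * * *", Exists: true}))
}
```

Note the third case: running the function again with nothing changed returns `noop`. That property is what makes the loop safe to re-run on every watch event.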
The Tooling: kubebuilder and operator-sdk
Two main options at the end of 2017:
operator-sdk (from CoreOS) — opinionated scaffolding, builds on controller-runtime, reasonably good documentation. I'd start here for most cases.
kubebuilder (from the Kubernetes sig-api-machinery group) — lower-level, more control over the generated code. Useful if you need fine-grained control over API machinery.
Both are Go-based. I'm going to walk through the operator-sdk approach because the scaffolding does the most work for you upfront.
$ brew install operator-sdk
$ operator-sdk version
operator-sdk version: 0.1.0
A Simple Example: A Backup Operator
Let's write an operator that manages scheduled backups. The custom resource looks like this:
apiVersion: ops.cloudista.io/v1alpha1
kind: Backup
metadata:
  name: mydb-nightly
  namespace: production
spec:
  schedule: "0 2 * * *"
  target:
    kind: StatefulSet
    name: mydb
  destination:
    s3Bucket: mydb-backups
    s3Prefix: nightly/
  retentionDays: 30
When this resource exists in the cluster, the operator ensures a CronJob is running that takes a backup on the specified schedule and uploads it to S3. When the Backup resource is deleted, the CronJob gets cleaned up. When the schedule changes, the CronJob gets updated. The operator owns the lifecycle.
Scaffolding the Project
$ operator-sdk new backup-operator --api-version=ops.cloudista.io/v1alpha1 --kind=Backup
$ cd backup-operator
$ ls
cmd/ deploy/ pkg/ vendor/ Gopkg.toml Makefile
The scaffold generates:
- pkg/apis/ops/v1alpha1/backup_types.go — define your CRD spec and status structs here
- pkg/controller/backup/backup_controller.go — the reconcile loop goes here
- deploy/crds/ — generated CRD YAML
Defining the CRD
Edit backup_types.go to define your spec and status types:
type BackupSpec struct {
    Schedule      string       `json:"schedule"`
    Target        BackupTarget `json:"target"`
    Destination   BackupDest   `json:"destination"`
    RetentionDays int32        `json:"retentionDays,omitempty"`
}

type BackupTarget struct {
    Kind string `json:"kind"`
    Name string `json:"name"`
}

type BackupDest struct {
    S3Bucket string `json:"s3Bucket"`
    S3Prefix string `json:"s3Prefix,omitempty"`
}

type BackupStatus struct {
    LastBackupTime   *metav1.Time `json:"lastBackupTime,omitempty"`
    LastBackupStatus string       `json:"lastBackupStatus,omitempty"`
    CronJobName      string       `json:"cronJobName,omitempty"`
}
Run the code generator to produce the CRD YAML:
$ operator-sdk generate k8s
This gives you deploy/crds/ops_v1alpha1_backup_crd.yaml:
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: backups.ops.cloudista.io
spec:
  group: ops.cloudista.io
  names:
    kind: Backup
    listKind: BackupList
    plural: backups
    singular: backup
  scope: Namespaced
  version: v1alpha1
Apply this to your cluster before you try to create any Backup resources:
$ kubectl apply -f deploy/crds/ops_v1alpha1_backup_crd.yaml
customresourcedefinition.apiextensions.k8s.io "backups.ops.cloudista.io" created
The Reconcile Loop
The scaffold generates a Reconcile function in backup_controller.go. Here's the shape of what you'd fill in:
func (r *ReconcileBackup) Reconcile(request reconcile.Request) (reconcile.Result, error) {
    // Fetch the Backup resource
    backup := &opsv1alpha1.Backup{}
    err := r.client.Get(context.TODO(), request.NamespacedName, backup)
    if err != nil {
        if errors.IsNotFound(err) {
            // Resource was deleted — nothing to do, CronJob has an owner reference
            return reconcile.Result{}, nil
        }
        return reconcile.Result{}, err
    }

    // Check if a CronJob already exists for this Backup
    cronJob := &batchv1beta1.CronJob{}
    err = r.client.Get(context.TODO(), types.NamespacedName{
        Name:      backup.Name + "-backup",
        Namespace: backup.Namespace,
    }, cronJob)
    if errors.IsNotFound(err) {
        // CronJob doesn't exist — create it
        newCronJob := r.cronJobForBackup(backup)
        if err := r.client.Create(context.TODO(), newCronJob); err != nil {
            return reconcile.Result{}, err
        }
        backup.Status.CronJobName = newCronJob.Name
        if err := r.client.Status().Update(context.TODO(), backup); err != nil {
            return reconcile.Result{}, err
        }
        return reconcile.Result{}, nil
    } else if err != nil {
        // Some other error reading the CronJob — return it so the request requeues
        return reconcile.Result{}, err
    }

    // CronJob exists — check for drift and reconcile
    if cronJob.Spec.Schedule != backup.Spec.Schedule {
        cronJob.Spec.Schedule = backup.Spec.Schedule
        if err := r.client.Update(context.TODO(), cronJob); err != nil {
            return reconcile.Result{}, err
        }
    }
    return reconcile.Result{}, nil
}
The critical design principle: reconcile loops must be idempotent. You're not handling "backup was created" events — you're observing current state, comparing to desired state, and closing the gap. If this function runs twice with nothing changed in between, it should be a no-op. Design for that from the start.
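One way to get that idempotence cheaply is to make the desired-object construction a pure function, which is the role `cronJobForBackup` plays in the snippet above: same Backup in, same CronJob out, every time. Here's a stand-in sketch using trimmed local structs rather than the real `batchv1beta1` types, since the post doesn't show the helper itself:

```go
package main

import "fmt"

// Stand-ins for the real API types, trimmed to the fields used here.
type Backup struct {
    Name      string
    Namespace string
    Schedule  string
}

type CronJob struct {
    Name      string
    Namespace string
    Schedule  string
}

// cronJobForBackup is pure: the desired CronJob is derived entirely from
// the Backup, so calling it twice yields the same answer and the reconcile
// loop can compare state rather than remember events.
func cronJobForBackup(b Backup) CronJob {
    return CronJob{
        Name:      b.Name + "-backup", // matches the lookup in Reconcile
        Namespace: b.Namespace,
        Schedule:  b.Schedule,
    }
}

func main() {
    b := Backup{Name: "mydb-nightly", Namespace: "production", Schedule: "0 2 * * *"}
    fmt.Println(cronJobForBackup(b) == cronJobForBackup(b)) // deterministic
}
```

In the real helper you'd also fill in the Pod template that runs the backup and uploads to S3, but the principle is the same: no hidden state, no dependence on which event fired.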
Running It Locally
You don't need to build and push a container image to test the operator. make run runs the controller process locally against your live cluster:
$ make run
INFO[0000] Running the operator locally.
INFO[0000] Using namespace production.
{"level":"info","ts":1512567890.123,"logger":"controller-backup","msg":"Reconciling Backup","namespace":"production","name":"mydb-nightly"}
Apply a Backup resource in a separate terminal:
$ kubectl apply -f deploy/crds/ops_v1alpha1_backup_cr.yaml
backup.ops.cloudista.io "mydb-nightly" created
You'll see a reconcile triggered in the controller log, and a CronJob should appear:
$ kubectl get cronjobs -n production
NAME                  SCHEDULE    SUSPEND   ACTIVE    LAST SCHEDULE   AGE
mydb-nightly-backup   0 2 * * *   False     0         <none>          5s
This local development loop — edit code, make run, apply a resource, watch the log — is fast. Use it.
RBAC for the Controller
When you deploy the operator to the cluster, it runs as a Pod with a ServiceAccount. That account needs permissions to manage the resources your operator touches:
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: backup-operator
rules:
- apiGroups: ["ops.cloudista.io"]
  resources: ["backups", "backups/status"]
  verbs: ["get", "list", "watch", "update", "patch"]
- apiGroups: ["batch"]
  resources: ["cronjobs"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
Scope it to exactly what the operator needs. Don't give it cluster-admin because it's convenient.
What This Doesn't Cover
This is an introduction. A production operator needs more:
- Finalizers. Without a finalizer, deleting a Backup resource before the CronJob is removed could leave orphaned resources. Finalizers let your controller run cleanup logic before the resource is actually deleted.
- Owner references. Set the Backup as the owner of the CronJob it creates so that Kubernetes garbage collection handles cleanup automatically when the owner is deleted.
- Error handling and requeueing. Return reconcile.Result{RequeueAfter: 30 * time.Second} to retry on transient errors rather than crashing the loop.
- Status conditions. Rather than a simple string field, use the standard Conditions pattern so kubectl describe gives useful output.
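Of those, finalizer bookkeeping is the most mechanical: it's string-slice manipulation on the resource's metadata, driven by whether a deletion timestamp is set. A minimal sketch of that logic in plain Go, with local stand-in types and a hypothetical finalizer name (`backup.ops.cloudista.io/cleanup`):

```go
package main

import "fmt"

// Hypothetical finalizer name for this operator.
const finalizerName = "backup.ops.cloudista.io/cleanup"

// Meta stands in for ObjectMeta: just the two fields finalizers care about.
type Meta struct {
    Finalizers        []string
    DeletionTimestamp *string // non-nil once deletion has been requested
}

func containsString(slice []string, s string) bool {
    for _, v := range slice {
        if v == s {
            return true
        }
    }
    return false
}

func removeString(slice []string, s string) []string {
    var out []string
    for _, v := range slice {
        if v != s {
            out = append(out, v)
        }
    }
    return out
}

// reconcileFinalizer returns what the controller should do this pass:
// register the finalizer, run cleanup, or proceed with normal reconciliation.
func reconcileFinalizer(m *Meta) string {
    if m.DeletionTimestamp == nil {
        if !containsString(m.Finalizers, finalizerName) {
            m.Finalizers = append(m.Finalizers, finalizerName)
            return "added-finalizer" // persist via Update, then requeue
        }
        return "reconcile"
    }
    if containsString(m.Finalizers, finalizerName) {
        // Run cleanup here (e.g. delete old S3 objects), then remove the
        // finalizer so the API server can complete the deletion.
        m.Finalizers = removeString(m.Finalizers, finalizerName)
        return "cleaned-up"
    }
    return "nothing-to-do"
}

func main() {
    m := &Meta{}
    fmt.Println(reconcileFinalizer(m)) // first pass registers the finalizer
    ts := "2017-12-06T12:00:00Z"
    m.DeletionTimestamp = &ts
    fmt.Println(reconcileFinalizer(m)) // deletion requested: cleanup runs
}
```

The key ordering detail: cleanup must succeed before the finalizer is removed, because removing the finalizer is what lets the API server finish deleting the resource.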
Wrapping Up
The operator pattern is the right model for stateful workloads with domain-specific operational requirements. The tooling in late 2017 is young — expect rough edges in the SDK, some API instability, and a relatively small community compared to where this will be in a year or two. But the core pattern (CRD + controller reconcile loop) is solid and isn't going to change. If you have a workload that needs operational logic beyond "keep it running," this is where to invest.