Kubernetes is good at running stateless applications. Rolling update a Deployment, scale it up, scale it down — the control plane handles all of that. But stateful applications have operational knowledge that can't be encoded in a Deployment. How do you safely add a node to a Cassandra ring? How do you run a backup without corrupting a live database? You need something that understands your application's domain, not just "keep N replicas running."
That's what operators are. An operator is a controller that manages a custom resource — one you define — and encodes application-specific operational logic in the reconcile loop. The concept was introduced by CoreOS in 2016, and right now in late 2017 it's still early enough that most teams haven't written one. But the pattern is solid, the tooling is improving, and if you have a stateful workload running in Kubernetes it's worth understanding.
The Pattern: CRD + Controller
Two pieces:
Custom Resource Definition (CRD). A CRD extends the Kubernetes API with a new resource type. Once you apply a CRD, you can kubectl apply resources of that type the same as any built-in resource. The API server validates and stores them in etcd.
Controller. A controller watches the Kubernetes API for resources it cares about and reconciles current state toward desired state. Every built-in Kubernetes controller (Deployment controller, StatefulSet controller, etc.) follows this model. Your operator is a controller that watches your custom resource.
The reconcile loop is straightforward conceptually:
- Watch for create/update/delete events on your custom resource.
- Read the current state of the world (what's actually running in the cluster).
- Compare it to the desired state (what the resource spec says).
- Take actions to close the gap.
- Update the resource's status subresource to reflect current state.
- Repeat.
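Stripped of the Kubernetes machinery, the loop above is a state comparison, not an event handler. Here's a minimal sketch in plain Go; the `Spec` and `State` types are simplified stand-ins for illustration, not real client-go APIs:

```go
package main

import "fmt"

// Simplified stand-ins for the real API types.
type Spec struct {
    Schedule string
}

type State struct {
    Schedule string
    Exists   bool
}

// reconcile compares desired state (the spec) against observed state and
// returns the action needed to close the gap. It only inspects state; it
// never assumes which event triggered it.
func reconcile(desired Spec, actual State) string {
    switch {
    case !actual.Exists:
        return "create"
    case actual.Schedule != desired.Schedule:
        return "update"
    default:
        return "noop" // already converged, so a second run is a no-op
    }
}

func main() {
    fmt.Println(reconcile(Spec{"0 2 * * *"}, State{Exists: false}))
    fmt.Println(reconcile(Spec{"0 2 * * *"}, State{Schedule: "0 3 * * *", Exists: true}))
    fmt.Println(reconcile(Spec{"0 2 * * *"}, State{Schedule: "0 2 * * *", Exists: true}))
}
```

Note the third case: running the function again with nothing changed returns `noop`. That property is what makes the loop safe to re-run on every watch event.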
The Tooling: kubebuilder and operator-sdk
Two main options at the end of 2017:
operator-sdk (from CoreOS) — opinionated scaffolding, builds on controller-runtime, reasonably good documentation. I'd start here for most cases.
kubebuilder (from the Kubernetes sig-api-machinery group) — lower-level, more control over the generated code. Useful if you need fine-grained control over API machinery.
Both are Go-based. I'm going to walk through the operator-sdk approach because the scaffolding does the most work for you upfront.
$ brew install operator-sdk
$ operator-sdk version
operator-sdk version: 0.1.0
A Simple Example: A Backup Operator
Let's write an operator that manages scheduled backups. The custom resource looks like this:
apiVersion: ops.cloudista.io/v1alpha1
kind: Backup
metadata:
  name: mydb-nightly
  namespace: production
spec:
  schedule: "0 2 * * *"
  target:
    kind: StatefulSet
    name: mydb
  destination:
    s3Bucket: mydb-backups
    s3Prefix: nightly/
  retentionDays: 30
When this resource exists in the cluster, the operator ensures a CronJob is running that takes a backup on the specified schedule and uploads it to S3. When the Backup resource is deleted, the CronJob gets cleaned up. When the schedule changes, the CronJob gets updated. The operator owns the lifecycle.
Scaffolding the Project
$ operator-sdk new backup-operator --api-version=ops.cloudista.io/v1alpha1 --kind=Backup
$ cd backup-operator
$ ls
cmd/ deploy/ pkg/ vendor/ Gopkg.toml Makefile
The scaffold generates:
- pkg/apis/ops/v1alpha1/backup_types.go — define your CRD spec and status structs here
- pkg/controller/backup/backup_controller.go — the reconcile loop goes here
- deploy/crds/ — generated CRD YAML
Defining the CRD
Edit backup_types.go to define your spec and status types:
type BackupSpec struct {
    Schedule      string       `json:"schedule"`
    Target        BackupTarget `json:"target"`
    Destination   BackupDest   `json:"destination"`
    RetentionDays int32        `json:"retentionDays,omitempty"`
}

type BackupTarget struct {
    Kind string `json:"kind"`
    Name string `json:"name"`
}

type BackupDest struct {
    S3Bucket string `json:"s3Bucket"`
    S3Prefix string `json:"s3Prefix,omitempty"`
}

type BackupStatus struct {
    LastBackupTime   *metav1.Time `json:"lastBackupTime,omitempty"`
    LastBackupStatus string       `json:"lastBackupStatus,omitempty"`
    CronJobName      string       `json:"cronJobName,omitempty"`
}
Run the code generator to produce the CRD YAML:
$ operator-sdk generate k8s
This gives you deploy/crds/ops_v1alpha1_backup_crd.yaml:
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: backups.ops.cloudista.io
spec:
  group: ops.cloudista.io
  names:
    kind: Backup
    listKind: BackupList
    plural: backups
    singular: backup
  scope: Namespaced
  version: v1alpha1
Apply this to your cluster before you try to create any Backup resources:
$ kubectl apply -f deploy/crds/ops_v1alpha1_backup_crd.yaml
customresourcedefinition.apiextensions.k8s.io "backups.ops.cloudista.io" created
The Reconcile Loop
The scaffold generates a Reconcile function in backup_controller.go. Here's the shape of what you'd fill in:
func (r *ReconcileBackup) Reconcile(request reconcile.Request) (reconcile.Result, error) {
    // Fetch the Backup resource
    backup := &opsv1alpha1.Backup{}
    err := r.client.Get(context.TODO(), request.NamespacedName, backup)
    if err != nil {
        if errors.IsNotFound(err) {
            // Resource was deleted — nothing to do, CronJob has an owner reference
            return reconcile.Result{}, nil
        }
        return reconcile.Result{}, err
    }

    // Check if a CronJob already exists for this Backup
    cronJob := &batchv1beta1.CronJob{}
    err = r.client.Get(context.TODO(), types.NamespacedName{
        Name:      backup.Name + "-backup",
        Namespace: backup.Namespace,
    }, cronJob)
    if errors.IsNotFound(err) {
        // CronJob doesn't exist — create it
        newCronJob := r.cronJobForBackup(backup)
        if err := r.client.Create(context.TODO(), newCronJob); err != nil {
            return reconcile.Result{}, err
        }
        backup.Status.CronJobName = newCronJob.Name
        if err := r.client.Status().Update(context.TODO(), backup); err != nil {
            return reconcile.Result{}, err
        }
        return reconcile.Result{}, nil
    } else if err != nil {
        // Some other error reading the CronJob — return it so the request requeues
        return reconcile.Result{}, err
    }

    // CronJob exists — check for drift and reconcile
    if cronJob.Spec.Schedule != backup.Spec.Schedule {
        cronJob.Spec.Schedule = backup.Spec.Schedule
        if err := r.client.Update(context.TODO(), cronJob); err != nil {
            return reconcile.Result{}, err
        }
    }
    return reconcile.Result{}, nil
}
The critical design principle: reconcile loops must be idempotent. You're not handling "backup was created" events — you're observing current state, comparing to desired state, and closing the gap. If this function runs twice with nothing changed in between, it should be a no-op. Design for that from the start.
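One way to get that idempotence cheaply is to make the desired-object construction a pure function, which is the role `cronJobForBackup` plays in the snippet above: same Backup in, same CronJob out, every time. Here's a stand-in sketch using trimmed local structs rather than the real `batchv1beta1` types, since the post doesn't show the helper itself:

```go
package main

import "fmt"

// Stand-ins for the real API types, trimmed to the fields used here.
type Backup struct {
    Name      string
    Namespace string
    Schedule  string
}

type CronJob struct {
    Name      string
    Namespace string
    Schedule  string
}

// cronJobForBackup is pure: the desired CronJob is derived entirely from
// the Backup, so calling it twice yields the same answer and the reconcile
// loop can compare state rather than remember events.
func cronJobForBackup(b Backup) CronJob {
    return CronJob{
        Name:      b.Name + "-backup", // matches the lookup in Reconcile
        Namespace: b.Namespace,
        Schedule:  b.Schedule,
    }
}

func main() {
    b := Backup{Name: "mydb-nightly", Namespace: "production", Schedule: "0 2 * * *"}
    fmt.Println(cronJobForBackup(b) == cronJobForBackup(b)) // deterministic
}
```

In the real helper you'd also fill in the Pod template that runs the backup and uploads to S3, but the principle is the same: no hidden state, no dependence on which event fired.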
Running It Locally
You don't need to build and push a container image to test the operator. make run runs the controller process locally against your live cluster:
$ make run
INFO[0000] Running the operator locally.
INFO[0000] Using namespace production.
{"level":"info","ts":1512567890.123,"logger":"controller-backup","msg":"Reconciling Backup","namespace":"production","name":"mydb-nightly"}
Apply a Backup resource in a separate terminal:
$ kubectl apply -f deploy/crds/ops_v1alpha1_backup_cr.yaml
backup.ops.cloudista.io "mydb-nightly" created
You'll see a reconcile triggered in the controller log, and a CronJob should appear:
$ kubectl get cronjobs -n production
NAME                  SCHEDULE    SUSPEND   ACTIVE    LAST SCHEDULE   AGE
mydb-nightly-backup   0 2 * * *   False     0         <none>          5s
This local development loop — edit code, make run, apply a resource, watch the log — is fast. Use it.
RBAC for the Controller
When you deploy the operator to the cluster, it runs as a Pod with a ServiceAccount. That account needs permissions to manage the resources your operator touches:
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: backup-operator
rules:
- apiGroups: ["ops.cloudista.io"]
  resources: ["backups", "backups/status"]
  verbs: ["get", "list", "watch", "update", "patch"]
- apiGroups: ["batch"]
  resources: ["cronjobs"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
Scope it to exactly what the operator needs. Don't give it cluster-admin because it's convenient.
What This Doesn't Cover
This is an introduction. A production operator needs more:
- Finalizers. Without a finalizer, deleting a Backup resource before the CronJob is removed could leave orphaned resources. Finalizers let your controller run cleanup logic before the resource is actually deleted.
- Owner references. Set the Backup as the owner of the CronJob it creates so that Kubernetes garbage collection handles cleanup automatically when the owner is deleted.
- Error handling and requeueing. Return reconcile.Result{RequeueAfter: 30 * time.Second} to retry on transient errors rather than crashing the loop.
- Status conditions. Rather than a simple string field, use the standard Conditions pattern so kubectl describe gives useful output.
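Of those, finalizer bookkeeping is the most mechanical: it's string-slice manipulation on the resource's metadata, driven by whether a deletion timestamp is set. A minimal sketch of that logic in plain Go, with local stand-in types and a hypothetical finalizer name (`backup.ops.cloudista.io/cleanup`):

```go
package main

import "fmt"

// Hypothetical finalizer name for this operator.
const finalizerName = "backup.ops.cloudista.io/cleanup"

// Meta stands in for ObjectMeta: just the two fields finalizers care about.
type Meta struct {
    Finalizers        []string
    DeletionTimestamp *string // non-nil once deletion has been requested
}

func containsString(slice []string, s string) bool {
    for _, v := range slice {
        if v == s {
            return true
        }
    }
    return false
}

func removeString(slice []string, s string) []string {
    var out []string
    for _, v := range slice {
        if v != s {
            out = append(out, v)
        }
    }
    return out
}

// reconcileFinalizer returns what the controller should do this pass:
// register the finalizer, run cleanup, or proceed with normal reconciliation.
func reconcileFinalizer(m *Meta) string {
    if m.DeletionTimestamp == nil {
        if !containsString(m.Finalizers, finalizerName) {
            m.Finalizers = append(m.Finalizers, finalizerName)
            return "added-finalizer" // persist via Update, then requeue
        }
        return "reconcile"
    }
    if containsString(m.Finalizers, finalizerName) {
        // Run cleanup here (e.g. delete old S3 objects), then remove the
        // finalizer so the API server can complete the deletion.
        m.Finalizers = removeString(m.Finalizers, finalizerName)
        return "cleaned-up"
    }
    return "nothing-to-do"
}

func main() {
    m := &Meta{}
    fmt.Println(reconcileFinalizer(m)) // first pass registers the finalizer
    ts := "2017-12-06T12:00:00Z"
    m.DeletionTimestamp = &ts
    fmt.Println(reconcileFinalizer(m)) // deletion requested: cleanup runs
}
```

The key ordering detail: cleanup must succeed before the finalizer is removed, because removing the finalizer is what lets the API server finish deleting the resource.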
Wrapping Up
The operator pattern is the right model for stateful workloads with domain-specific operational requirements. The tooling in late 2017 is young — expect rough edges in the SDK, some API instability, and a relatively small community compared to where this will be in a year or two. But the core pattern (CRD + controller reconcile loop) is solid and isn't going to change. If you have a workload that needs operational logic beyond "keep it running," this is where to invest.