July 10, 2019 Marie H.

Managing Multiple Kubernetes Clusters at Scale


Photo by Mrg Simon on Unsplash


When I joined the IBM storage division as a consulting engineer through Innova Solutions, one of the first things I was handed was a diagram of their Kubernetes footprint. Six clusters across three regions. Different teams owning different pieces, no consistent access policy, and a CI/CD pipeline that roughly amounted to "SSH into the control plane and apply the manifest manually." This post is about what we built instead.

Why Multiple Clusters in the First Place

The instinct when you're starting out is to run everything in one cluster and separate concerns with namespaces. That works up to a point, but at IBM we had hard reasons to go multi-cluster. First, regional isolation: storage services were hitting latency requirements that demanded compute close to customers in the US, EU, and Asia-Pacific. A single cluster can't span regions without serious networking complexity. Second, blast radius. A misconfigured admission webhook or a node pool exhaustion event in one cluster shouldn't cascade into production in another region. Third, the classic dev/staging/prod split — and at IBM's scale, "staging" needed to be a realistic mirror of production, not a stripped-down namespace sharing the same etcd.

Kubeconfig Management

Before you can do anything across multiple clusters, you need to manage your kubeconfig sanely. The default is everything crammed into ~/.kube/config, which gets ugly fast. The better approach is separate config files per cluster, composed at runtime via the KUBECONFIG environment variable:

export KUBECONFIG=~/.kube/ibm-us-east.yaml:~/.kube/ibm-eu-west.yaml:~/.kube/ibm-ap-south.yaml

Now kubectl config get-contexts shows all three. Switching is kubectl config use-context ibm-us-east-prod, but in practice I use kubectx — it's a small tool that turns context switching into kubectx ibm-us-east-prod. Paired with kubens for namespace switching, it saves a lot of typing. Not essential, but once you've used it you won't go back.
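If you add clusters regularly, you can go one step further and build the KUBECONFIG variable dynamically instead of maintaining a hardcoded colon-separated list. Here's a small sketch; the `~/.kube/clusters` directory layout is my own convention for this example, not something from the setup described above:

```shell
# Compose KUBECONFIG from every per-cluster file in a directory,
# so adding a cluster is just dropping a new YAML file in place.
# The ~/.kube/clusters layout is an assumption for this sketch.
compose_kubeconfig() {
  dir="$1"
  # List *.yaml files in the directory, sort for a stable order,
  # and join the paths with ':' separators.
  find "$dir" -maxdepth 1 -name '*.yaml' | sort | paste -sd: -
}

if [ -d "$HOME/.kube/clusters" ]; then
  export KUBECONFIG="$(compose_kubeconfig "$HOME/.kube/clusters")"
fi
```

Drop this in your shell profile and kubectl merges all the files automatically, same as the hardcoded export above.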

For CI scripts, you don't want implicit state from the current context. Always pass --context explicitly:

kubectl apply -f manifests/deployment.yaml --context ibm-us-east-prod
kubectl rollout status deployment/key-manager --context ibm-us-east-prod --timeout=120s

This makes scripts portable and their behavior obvious from reading them.
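To show what that looks like when you're rolling a change out to several clusters at once, here's a sketch of a CI deploy loop. The context names are assumed from the kubeconfig filenames above (I'm guessing at `ibm-ap-south-prod`), and the `DRY_RUN` switch is my addition so the script can be exercised without real clusters:

```shell
#!/bin/sh
# Apply a manifest to each cluster with the context spelled out on
# every call, so the script never depends on the caller's current
# context. Context names are assumed for this sketch.
CLUSTERS="ibm-us-east-prod ibm-eu-west-prod ibm-ap-south-prod"

deploy_all() {
  manifest="$1"
  for ctx in $CLUSTERS; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Print the command instead of running it, for testing.
      echo "kubectl apply -f $manifest --context $ctx"
    else
      kubectl apply -f "$manifest" --context "$ctx"
      kubectl rollout status deployment/key-manager \
        --context "$ctx" --timeout=120s
    fi
  done
}
```

Running `DRY_RUN=1 deploy_all manifests/deployment.yaml` prints the three commands so you can eyeball exactly what CI would do.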

Cluster Federation vs. Multi-Cluster Tooling

In 2019, Kubernetes Federation (KubeFed v2) was available but genuinely rough. The idea is appealing: define a FederatedDeployment resource once, and the federation control plane syncs it across member clusters, handling propagation and overrides per cluster. In practice, v2 was still stabilizing, the CRDs were complex, and the operational overhead of running the federation control plane felt like a liability rather than an asset.
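For context on what that looks like, here's a rough sketch of a KubeFed v2 FederatedDeployment — the cluster names are made up for illustration, and the template is elided:

```yaml
apiVersion: types.kubefed.io/v1beta1
kind: FederatedDeployment
metadata:
  name: key-manager
  namespace: key-manager
spec:
  template:
    # a full Deployment spec goes here
  placement:
    clusters:
      - name: us-east-prod
      - name: eu-west-prod
  overrides:
    # per-cluster tweaks without forking the template
    - clusterName: eu-west-prod
      clusterOverrides:
        - path: "/spec/replicas"
          value: 3
```

One resource, synced to every cluster in the placement list, with JSON-pointer overrides per cluster. Appealing on paper — but you're now running and debugging the control plane that does the syncing.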

We looked at it seriously and decided against it. The alternative tools in that space — Admiralty, Liqo — were even earlier in maturity. What we actually needed wasn't dynamic workload migration across clusters; we needed consistent deployments and consistent policy. For that, GitOps with ArgoCD was a cleaner answer.

What We Actually Did: GitOps with ArgoCD

ArgoCD had just hit a point of reasonable production stability. The model is: a Git repository is the source of truth, ArgoCD watches it and reconciles each cluster's state to match. We set up one ArgoCD instance per cluster — not a hub-and-spoke model, just independent instances all watching the same Git repo but deploying different Application resources scoped to their cluster.

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: key-manager
  namespace: argocd
spec:
  project: storage-services
  source:
    repoURL: https://github.com/ibm-storage/k8s-manifests
    targetRevision: main
    path: services/key-manager/overlays/us-east-prod
  destination:
    server: https://kubernetes.default.svc
    namespace: key-manager
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

Environment-specific configuration lived in Kustomize overlays. The base had the shared Deployment and Service, the overlays had replica counts, resource limits, and image tags tuned for each environment. This meant a promotion from staging to production was a pull request changing the image tag in the production overlay. Auditable, reviewable, rollback is a git revert.
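A production overlay's kustomization.yaml was small; here's a sketch with hypothetical tag and patch filenames (2019-era kustomize used `bases:`, newer versions fold this into `resources:`):

```yaml
# services/key-manager/overlays/us-east-prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
  - ../../base            # shared Deployment and Service
images:
  - name: key-manager     # image name in the base Deployment (assumed)
    newTag: "1.4.2"       # hypothetical tag; promotion PRs change this line
patchesStrategicMerge:
  - replicas.yaml         # per-environment replica count and resource limits
```

The `newTag` line is the entire diff of a typical promotion PR, which is exactly what you want reviewers looking at.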

Namespace-per-Environment vs. Cluster-per-Environment

This is a real tradeoff and the right answer depends on your team's constraints. Namespace-per-environment on a single cluster is cheaper and simpler to operate. You get logical isolation, separate RBAC, separate resource quotas. The downside is that a cluster-level failure or a noisy-neighbor problem at the node pool level affects all environments. A bad DaemonSet rollout can consume node resources across namespaces.
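The resource-quota side of that logical isolation is worth showing, because it's the main guardrail you have in the shared-cluster model. A sketch, with numbers invented for illustration:

```yaml
# Cap what the staging namespace can request, so one environment
# can't starve the others on a shared cluster. Values are made up.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: staging-quota
  namespace: staging
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 64Gi
    pods: "100"
```

Note what this doesn't protect you from: quotas are per-namespace accounting, so a DaemonSet that degrades every node still hits all environments at once.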

Cluster-per-environment gives you true blast radius isolation. An experiment that exhausts a dev cluster's node pool doesn't touch staging. API server load in production doesn't slow down developers. The cost is real: more control planes to run, more operational surface area. At IBM, for the storage division's production workloads, the isolation was worth it. For smaller projects, I'd start with namespaces and migrate when the pain is real rather than theoretical.

Cross-Cluster Service Discovery

This was the hardest part. When Service A in the US-East cluster needs to call Service B in EU-West, you need a way to resolve that. The options in 2019 were roughly:

  1. External load balancers with stable DNS names — dumb but reliable. Each service that needed cross-cluster access got an AWS NLB with a stable DNS name. Not Kubernetes-native, but it worked.
  2. DNS federation — CoreDNS can be configured to forward specific zones to DNS servers in other clusters. This is elegant in theory and annoying to operate.
  3. Service mesh multi-cluster — Istio supports multi-cluster, but in 2019 the multi-cluster Istio setup was genuinely painful. Certificate management across clusters, the control plane federation configuration, the debugging story when something went wrong. We evaluated it and decided the operational complexity wasn't justified for the number of cross-cluster calls we actually had.
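For a feel of option 2, the CoreDNS side is a stub-domain forward. This fragment is entirely hypothetical — the zone name and upstream IP are made up, and in practice you also need network reachability to the other cluster's DNS endpoint, which is where the operational annoyance lives:

```
# Corefile fragment: forward lookups for an EU-West stub domain
# to that cluster's exposed DNS endpoint (names and IP invented).
eu-west.internal:53 {
    errors
    cache 30
    forward . 10.64.0.10
}
```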

We went with option 1 for anything cross-cluster. If a service was truly internal to one cluster, it used ClusterIP and standard Kubernetes DNS. Cross-cluster calls went through NLBs. Simple, boring, and our on-call rotation understood it.
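One trick that makes option 1 feel slightly more Kubernetes-native, if you want it: an ExternalName Service gives callers a cluster-local DNS name that resolves (via CNAME) to the NLB. A sketch with invented names — note ExternalName works purely at the DNS level, so there's no port remapping or load balancing on the local side:

```yaml
# Local alias for a remote service: pods call
# service-b.key-manager.svc.cluster.local and DNS hands back a
# CNAME to the NLB. Hostname is hypothetical.
apiVersion: v1
kind: Service
metadata:
  name: service-b
  namespace: key-manager
spec:
  type: ExternalName
  externalName: service-b-eu-west.example.ibm.com
```

The nice property is that if the remote service ever moves into the local cluster, you swap the ExternalName for a normal Service and no caller changes.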

RBAC Consistency Across Clusters

Keeping RBAC policies consistent across six clusters by hand is how you end up with security drift. We used Ansible. Not glamorous, but it worked. The RBAC manifests — ClusterRoles, ClusterRoleBindings, RoleBindings in specific namespaces — lived in the same Git repo as everything else. An Ansible playbook iterated over the cluster inventory and applied them:

- name: Apply RBAC manifests
  hosts: localhost
  vars:
    clusters:
      - name: ibm-us-east-prod
        kubeconfig: "{{ lookup('env', 'HOME') }}/.kube/ibm-us-east.yaml"
      - name: ibm-eu-west-prod
        kubeconfig: "{{ lookup('env', 'HOME') }}/.kube/ibm-eu-west.yaml"
  tasks:
    - name: Apply cluster roles
      kubernetes.core.k8s:
        kubeconfig: "{{ item.kubeconfig }}"
        state: present
        src: "{{ playbook_dir }}/rbac/cluster-roles.yaml"
      loop: "{{ clusters }}"

This ran in CI on any change to the RBAC directory. Every cluster got the same policies within minutes of a merge.

A Note on Scale

None of this is magic. The patterns here — GitOps for deployment, explicit context flags in scripts, Ansible for policy consistency, external DNS for cross-cluster routing — are all conservative choices. At the time I wanted to use KubeFed or Istio multi-cluster more aggressively, but the maturity wasn't there and the blast radius of a misconfigured federation control plane on production storage infrastructure wasn't a risk worth taking. Sometimes the boring solution is the right one.