Cilium and eBPF: The Next Generation of Kubernetes Networking
Kubernetes networking has a dirty secret: for most clusters, the entire data plane runs through iptables rules that were designed in the late 1990s for a completely different problem. As your cluster grows, you start to feel it. Let me explain why, and why Cilium backed by eBPF is the answer I've landed on.
Updated March 2026: Cilium reached graduated status within the CNCF in October 2023, and it's now the default dataplane in GKE (Dataplane V2 is built on Cilium), the default CNI in EKS Anywhere, and an option in several other managed Kubernetes offerings. eBPF-based networking has gone from "advanced option" to "recommended path" for most production clusters. The ecosystem around Hubble has also matured significantly, with enterprise offerings and deeper integrations with observability platforms.
The iptables Problem at Scale
When you have 10 nodes and 50 pods, iptables works fine. The rules are manageable, the performance is acceptable, and you can still read the output of iptables-save without weeping.
At 500 nodes and 5,000 pods, you're in a different world. Every new Service in Kubernetes generates more iptables rules — kube-proxy maintains DNAT rules for every endpoint behind every Service. In a cluster with 1,000 Services averaging 3 pods each, you're looking at tens of thousands of rules that must be traversed sequentially whenever a new connection is set up.
The performance hit is real: iptables uses a linear rule traversal model. No indexing, no hashing. Processing latency scales with rule count. I've seen clusters where Service connection latency is measurable in single-digit milliseconds just from iptables overhead. For latency-sensitive workloads, that matters.
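To put rough numbers on the rule growth, here's a back-of-the-envelope sketch. The per-Service and per-endpoint rule counts are illustrative assumptions, not exact kube-proxy internals (the real figures vary by kube-proxy mode and version):

```python
# Back-of-the-envelope estimate of kube-proxy iptables rule growth.
# RULES_PER_SERVICE and RULES_PER_ENDPOINT are assumed constants for
# illustration, not exact kube-proxy figures.
RULES_PER_SERVICE = 4    # e.g. KUBE-SERVICES entry plus KUBE-SVC chain (assumed)
RULES_PER_ENDPOINT = 3   # e.g. KUBE-SEP chain: mark, DNAT, probability jump (assumed)

def estimated_rules(services: int, endpoints_per_service: int) -> int:
    """Rough total rule count for a cluster of this shape."""
    return services * (RULES_PER_SERVICE
                       + endpoints_per_service * RULES_PER_ENDPOINT)

print(estimated_rules(50, 2))      # small cluster: a few hundred rules
print(estimated_rules(1000, 3))    # 1,000 Services x 3 pods each: ~13,000 rules
```

Even with conservative assumed constants, the total lands in the tens of thousands for a cluster of the size described above, and every one of those rules sits in chains that are walked linearly.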
The second problem is observability. iptables gives you zero visibility into what's actually happening. You can see that rules exist and count matches, but you can't ask "which services is my payment processor pod talking to right now?" without reaching for tcpdump or adding a service mesh. The tooling simply wasn't designed for this.
eBPF: Kernel-Level Programmability
eBPF (extended Berkeley Packet Filter) lets you load verified programs into the Linux kernel that run in response to events — network packets, system calls, tracepoints — without modifying the kernel itself and without the overhead of a kernel module.
The key word is verified. Before an eBPF program runs, the kernel's verifier checks it for safety: no unbounded loops, no invalid memory accesses, guaranteed termination in bounded time. This is what makes it safe to use in production even though you're running custom code in kernel space.
For networking specifically, eBPF programs can attach at XDP (eXpress Data Path, before sk_buff allocation) or at TC (traffic control). This lets you make forwarding decisions at the earliest possible point, before the kernel's normal network stack even sees the packet.
Instead of maintaining iptables rules, eBPF-based networking uses BPF maps — efficient hash tables and arrays in kernel memory that can be updated atomically from userspace. Adding a new Service endpoint doesn't mean rebuilding a rule chain; it means updating a map entry. Lookup is O(1) regardless of cluster size.
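The difference between the two lookup models can be sketched in a few lines of Python. This is a toy model, not Cilium's actual data structures: the iptables path scans an ordered rule list until something matches, while a BPF-map-style lookup is a single hash probe.

```python
# Toy model of the two lookup strategies -- not Cilium internals, just the
# complexity difference between iptables rule chains and BPF maps.

# iptables-style: an ordered rule list, scanned linearly per lookup.
rule_chain = [((f"10.0.{i // 256}.{i % 256}", 80), f"backend-{i}")
              for i in range(10_000)]

def linear_lookup(dst):
    for match, target in rule_chain:   # O(n): walk every rule until a hit
        if match == dst:
            return target
    return None                        # no rule matched

# BPF-map-style: a hash table keyed by (ip, port), one probe per lookup.
service_map = dict(rule_chain)         # O(1) average-case lookup

def map_lookup(dst):
    return service_map.get(dst)

# Entry 9999 is the worst case for the linear scan; the map doesn't care.
dst = ("10.0.39.15", 80)
assert linear_lookup(dst) == map_lookup(dst) == "backend-9999"
```

The linear scan degrades as the rule list grows, while the hash lookup cost is flat, which is exactly the scaling property the BPF-map design buys you.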
What Cilium Does Differently
Cilium replaces your CNI plugin (Flannel, Calico, Weave) and, optionally, kube-proxy entirely. Its data plane is eBPF programs and BPF maps. The control plane is a Kubernetes operator (cilium-operator) plus a per-node DaemonSet agent (cilium-agent) that manages the BPF programs and maps on each node.
The networking model uses direct routing between pods where possible. With native routing enabled and BGP or direct routes configured, Cilium can forward pod-to-pod traffic without encapsulation overhead. This isn't unique to Cilium, but Cilium's implementation is clean.
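As a sketch, native routing in recent Cilium releases is driven by a handful of Helm values. The value names below assume Cilium 1.14 or newer, and the CIDR is a placeholder for your actual pod network:

```yaml
# Hedged sketch of Helm values enabling native (non-encapsulated) routing.
# Value names assume a recent Cilium release; 10.0.0.0/8 is a placeholder.
routingMode: native                 # no VXLAN/Geneve encapsulation
ipv4NativeRoutingCIDR: 10.0.0.0/8   # pod CIDR reachable without tunneling
autoDirectNodeRoutes: true          # install direct node routes (nodes must share an L2 segment)
```

With these set, pod-to-pod packets leave the node carrying their real pod IPs, so the underlying network (or BGP) must know how to route the pod CIDR.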
What is genuinely different is L7-aware network policy. Traditional Kubernetes NetworkPolicy operates at L3/L4 — you allow or deny based on IP and port. Cilium extends this to L7: you can write policy that says "this pod may only make HTTP GET requests to /api/v1/users, not POST, not to /admin". L3/L4 enforcement happens directly in the kernel via eBPF; for L7 rules, Cilium uses eBPF to redirect the matched traffic to a node-local Envoy proxy, so there's still no per-pod sidecar.
Here's a comparison. A standard Kubernetes NetworkPolicy:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
spec:
  podSelector:
    matchLabels:
      app: backend
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
This works, but "allow TCP on 8080" is a wide open door. The equivalent Cilium policy with L7 enforcement:
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-frontend-to-backend
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: GET
          path: "/api/.*"
Now the backend is only reachable from frontend pods making GET requests to /api/ paths. That's a material security improvement with no application changes required.
Installing Cilium as a CNI Replacement
If you're standing up a new cluster, install Cilium first before anything else reaches the network. With Helm:
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium --version 1.16.0 \
  --namespace kube-system \
  --set kubeProxyReplacement=true \
  --set k8sServiceHost=YOUR_API_SERVER_IP \
  --set k8sServicePort=6443
Setting kubeProxyReplacement=true tells Cilium to replace kube-proxy entirely (older releases called this mode "strict"). This is the mode I run in new clusters. You do need to ensure kube-proxy is not running alongside it — in managed Kubernetes offerings this varies by provider.
For migrating an existing cluster running Flannel or Calico, the path is real but plan for disruption. The conservative version: drain nodes, remove the existing CNI, deploy Cilium, uncordon. Recent Cilium releases document a per-node migration procedure that reduces the blast radius, but test it outside production before you trust it.
Verify the installation worked:
cilium status
cilium connectivity test
The connectivity test is thorough — it spins up test pods and validates L3, L4, and L7 connectivity scenarios. Run it after installation and after major cluster changes.
Hubble: Observability That Should Have Existed All Along
Hubble is the observability layer built on top of Cilium. It hooks into the same eBPF infrastructure to provide per-flow visibility across the entire cluster.
Enable it during install:
helm install cilium cilium/cilium \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true \
  ...
Once running, hubble observe gives you a real-time stream of network flows:
TIMESTAMP            SOURCE                   DESTINATION              TYPE   VERDICT
Sep 8 10:14:23.001   default/frontend:52341   default/backend:8080     HTTP   FORWARDED
Sep 8 10:14:23.004   default/backend:8080     default/frontend:52341   HTTP   FORWARDED
Sep 8 10:14:24.101   kube-system/coredns      8.8.8.8:53               DNS    FORWARDED
The Hubble UI generates service maps automatically from observed traffic. For a cluster where nobody fully documented which services call which, this is immediately useful — you can see the actual call graph, not the one you think exists.
From a security posture standpoint, Hubble is what makes "deny by default" NetworkPolicy actually auditable. You can see what's being dropped and why, without chasing down application logs.
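A minimal default-deny starting point is just a standard NetworkPolicy with an empty pod selector; Hubble then shows you every flow it drops:

```yaml
# Deny all ingress to every pod in this namespace; traffic is then
# re-allowed with targeted policies like the ones above.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}        # selects every pod in the namespace
  policyTypes:
  - Ingress
```

After applying it, hubble observe --verdict DROPPED shows exactly which flows the policy is blocking, which turns tightening the allow rules into an iterative, observable process instead of guesswork.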
Is the Overhead Worth It?
The operational investment is real. eBPF requires a modern kernel (recent Cilium releases document a 4.19 minimum, with 5.x kernels needed for the full feature set). Debugging BPF programs when something goes wrong requires different skills than debugging iptables. Your team needs to build familiarity with the cilium CLI tooling.
But the performance at scale is not marginal. Removing kube-proxy and iptables from the data path measurably reduces P99 latency for Service-to-Service calls in large clusters. The observability story with Hubble is simply better than anything you can get with iptables-based CNIs without bolting on a full service mesh.
For new clusters above a few dozen nodes, I default to Cilium now. The ecosystem is mature, the Helm chart is well-maintained, and the operational model makes sense once you've spent time with it.