March 21, 2018 Marie H.

Kubernetes Resource Limits and Requests — Why They Matter



I've cleaned up more than one cluster that was falling over because nobody set resource limits. It's one of those things that feels optional until a memory leak in one pod takes down three other services. Let me save you that particular 2am experience.

Requests vs. limits — the actual difference

These two things are often confused, and the distinction matters.

Requests are what the scheduler uses. When you say a container requests 256Mi of memory and 250m CPU, the scheduler will only place that pod on a node that has at least that much unallocated capacity. Requests don't cap anything — they're a reservation for scheduling purposes.

Limits are what the runtime enforces. If a container exceeds its memory limit, it gets OOMKilled. If it exceeds its CPU limit, it gets throttled (CPU is compressible; memory is not). Limits are the actual ceiling.

The practical consequence: a container can request 256Mi but use 1Gi if no limit is set, because limits are what get enforced. Without limits you have no real control over what happens on your nodes.
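If you want to watch the enforcement happen, a throwaway pod like this is enough (a sketch; the polinux/stress image is an assumption here, and any container that deliberately over-allocates memory will do). The stress process tries to grab 250M against a 100Mi limit, so the container is OOMKilled almost immediately:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: oom-demo
spec:
  containers:
  - name: stress
    image: polinux/stress      # assumed image; any memory hog works
    resources:
      requests:
        memory: "50Mi"
      limits:
        memory: "100Mi"        # the ceiling the runtime enforces
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "250M", "--vm-hang", "1"]
```

kubectl describe pod oom-demo afterwards shows Reason: OOMKilled and exit code 137.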

What happens without them

The noisy neighbor problem is real. Imagine a node with 4Gi allocatable memory. You have 10 pods, each requesting 256Mi (total: 2.5Gi, fits fine). But two of those pods have no memory limit and start consuming 1Gi each under load. Actual usage is now right at 4Gi (eight pods at 256Mi plus 2Gi from the two runaways), and the next allocation pushes the node over the edge: the kernel OOM killer starts picking victims. The victims are usually not the pods causing the problem; they're the small, well-behaved pods that were just unlucky.

$ kubectl describe pod crashed-service-xyz
...
Last State: Terminated
  Reason: OOMKilled
  Exit Code: 137

Exit code 137 (128 + 9, where 9 is SIGKILL) on a pod you didn't touch is almost always memory limits, or the lack thereof.
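You can reproduce that arithmetic without a cluster; any POSIX shell reports 128 plus the signal number when a child is killed by a signal:

```shell
# A process terminated by a signal exits with status 128 + signal number.
# SIGKILL is signal 9, so a SIGKILLed process reports 128 + 9 = 137.
sh -c 'kill -KILL $$'
echo "exit status: $?"    # prints: exit status: 137
```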

Setting requests and limits

The resources block goes inside the container spec:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-service
  template:
    metadata:
      labels:
        app: api-service
    spec:
      containers:
      - name: api-service
        image: myorg/api-service:1.4.2
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"

CPU is specified in millicores — 250m is a quarter of a CPU core. Memory uses the standard suffixes: Mi for mebibytes, Gi for gibibytes.

My rule of thumb: set your request to what the container uses at normal load, and your limit to the maximum you'd ever want it to use. If you're not sure, kubectl top pods (with metrics-server installed) is your friend.
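With metrics-server installed, both of these are worth knowing (the namespace here is illustrative); the --containers flag matters for multi-container pods:

```shell
# Current CPU and memory usage per pod
kubectl top pods -n production

# Same data broken down per container
kubectl top pods -n production --containers
```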

LimitRange — defaults for a namespace

Setting limits on every container manually gets old. A LimitRange lets you set namespace-wide defaults so containers that don't specify anything get sensible values:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - type: Container
    default:
      memory: "256Mi"
      cpu: "200m"
    defaultRequest:
      memory: "128Mi"
      cpu: "100m"
    max:
      memory: "2Gi"
      cpu: "2"
    min:
      memory: "64Mi"
      cpu: "50m"

This sets defaults for containers that don't specify anything, and also enforces a max so nobody can accidentally request a 32Gi container in the namespace.
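To check what the LimitRange actually injects, start a pod with no resources block and read back the defaulted values (the pod name and image here are illustrative):

```shell
# Create a pod that specifies no resources at all
kubectl run limits-test --image=nginx -n production

# The admission controller has filled in the LimitRange defaults
kubectl get pod limits-test -n production \
  -o jsonpath='{.spec.containers[0].resources}'
```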

ResourceQuota — caps for the whole namespace

LimitRange is per-container. ResourceQuota caps aggregate consumption across the entire namespace:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "50"

Once you hit the quota, the API server rejects new pods outright at admission time (a Forbidden error), rather than leaving them pending. One gotcha: once a quota covers requests or limits for a resource, every new pod in the namespace must specify values for it (explicitly or via LimitRange defaults) or it will be rejected. This is useful in shared clusters where you want to ensure one team can't consume all available capacity.
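To see how much headroom is left against the quota, describe it; the output lists current usage next to the hard cap for every tracked resource:

```shell
kubectl describe resourcequota production-quota -n production
```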

Reading node capacity

To understand whether your requests will actually fit, kubectl describe node shows you the reality:

$ kubectl describe node ip-10-0-1-45.ec2.internal
...
Capacity:
  cpu:     4
  memory:  15956292Ki
Allocatable:
  cpu:     3920m
  memory:  15337668Ki
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource  Requests      Limits
  --------  --------      ------
  cpu       2350m (59%)   4200m (107%)
  memory    5120Mi (34%)  9216Mi (61%)

Two things to notice: "Allocatable" is less than "Capacity" because Kubernetes and the OS reserve some resources for themselves. And CPU limits totaling over 100% are normal: CPU is overcommittable since it's throttled, not killed. Memory limits over 100% are where you start risking OOMKills.
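kubectl reports memory in Ki, which is hard to eyeball; one division converts it (1Gi = 1048576Ki). For the allocatable figure above:

```shell
# 15337668 Ki divided by 1048576 (Ki per Gi) gives the node's
# allocatable memory in Gi.
awk 'BEGIN { printf "%.1f Gi\n", 15337668 / 1048576 }'   # prints: 14.6 Gi
```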

Practical recommendations

Set requests and limits on everything. If you don't know the right values yet, start conservative and adjust based on actual kubectl top data. Don't set limits dramatically higher than requests: a request of 64Mi and a limit of 8Gi is not useful; it just guarantees your scheduler thinks a pod is cheap when it might not be.

For CPU: it's tempting to set high limits and low requests because throttling is less visible than OOMKill. Resist this — heavy CPU throttling will make your latency tail absolutely terrible and it won't show up obviously in your monitoring.
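Throttling does leave a trace if you know where to look: the CPU cgroup statistics inside the container (a sketch; the exact path depends on whether the node runs cgroup v1 or v2, so this checks both):

```shell
# cgroup v2: look for nr_throttled and throttled_usec
cat /sys/fs/cgroup/cpu.stat 2>/dev/null

# cgroup v1: look for nr_throttled and throttled_time (nanoseconds)
cat /sys/fs/cgroup/cpu/cpu.stat 2>/dev/null
```

An nr_throttled counter that climbs steadily under load is the signature of a CPU limit set too low.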

Put a LimitRange in every namespace. Non-negotiable. It's three minutes of work that has saved me hours of debugging.