Building a Slack-Triggered Deploy API Inside Kubernetes
The deploy workflow I described in my earlier post — `deploykube <image> <deployment>` — worked fine if you were the one running it. But we were a distributed team, and when three people can deploy, you end up with questions like "did anyone deploy the auth service in the last hour?" answered by looking at `/tmp/deploy.log` on whoever's laptop last ran the script. That's not a system. That's chaos with logging.
The fix was obvious: move the deploy mechanism out of individual laptops and into a central place everyone could see. We were already living in Slack. So I built a small HTTP API that ran inside the Kubernetes cluster and wired it to a Slack slash command. Type `/deploy myservice 1.4` in Slack, the API runs the equivalent of `kubectl set image`, and everyone in the channel sees what happened and who triggered it.
Architecture
The service was a Flask app — a few hundred lines of Python — running as a Deployment in the kube-system namespace. It exposed a single POST endpoint. Slack slash commands send a form-encoded POST to a URL you configure, so the surface area was small.
Slack sends something like this when you run `/deploy myservice 1.4`:
```
token=xxxxxxxxxxxxxxxxxxx
team_domain=doublehorn
user_name=marie
command=/deploy
text=myservice 1.4
response_url=https://hooks.slack.com/commands/...
```
The API validates the `token`, parses `text` into a service name and version, runs the `kubectl` command, and responds.
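To make that parsing concrete, here's a stdlib-only sketch of pulling the service and version out of the form body above, outside of Flask (the token and response URL values are placeholders):

```python
from urllib.parse import parse_qs

# The raw form-encoded body Slack sends for /deploy myservice 1.4
# (token and response_url values here are placeholders)
body = (
    "token=xxxxxxxxxxxxxxxxxxx&team_domain=doublehorn&user_name=marie"
    "&command=%2Fdeploy&text=myservice+1.4"
    "&response_url=https%3A%2F%2Fhooks.slack.com%2Fcommands%2F..."
)

# parse_qs maps every field to a list of values; take the first of each
form = {k: v[0] for k, v in parse_qs(body).items()}

service, version = form["text"].split()
print(service, version)   # myservice 1.4
print(form["command"])    # /deploy
```

Flask does this decoding for you and exposes the result as `request.form`, which is what the handler below uses.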
The full request flow:
- Someone types `/deploy myservice 1.4` in Slack
- Slack POSTs to `http://<nodeport-ip>:<port>/deploy`
- Flask validates the `token` field against the expected value stored in a Kubernetes Secret
- Flask parses `text`: `service = "myservice"`, `version = "1.4"`
- Flask runs `kubectl set image deployment/myservice myservice=<ecr-base>/myservice:1.4` via `subprocess`
- Flask sends an immediate `200 OK` with an acknowledgment back to Slack (within 3 seconds, or Slack times out)
- Flask posts the full kubectl output asynchronously to `response_url`
That last part — the async response — is important. Slack requires a response within 3 seconds of the POST, but `kubectl` can take longer than that if the cluster is under load. The pattern is: acknowledge immediately, then POST the real result to `response_url` after the subprocess finishes.
The Flask handler
```python
import os
import subprocess
import threading

import requests
from flask import Flask, request, jsonify

app = Flask(__name__)

ECR_BASE = os.environ["ECR_BASE"]
SLACK_TOKEN = os.environ["SLACK_TOKEN"]


def run_deploy(service, version, response_url):
    """Run kubectl set image and post the result to Slack asynchronously."""
    image = f"{ECR_BASE}/{service}:{version}"
    cmd = [
        "kubectl", "set", "image",
        f"deployment/{service}",
        f"{service}={image}",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        msg = f"Deployed `{service}:{version}` successfully.\n```{result.stdout.strip()}```"
    else:
        msg = f"Deploy failed for `{service}:{version}`.\n```{result.stderr.strip()}```"
    # Post back to Slack via the response_url (valid for 30 minutes)
    requests.post(response_url, json={"text": msg, "response_type": "in_channel"})


@app.route("/deploy", methods=["POST"])
def deploy():
    # Validate the Slack verification token
    token = request.form.get("token")
    if token != SLACK_TOKEN:
        return jsonify({"text": "Invalid token."}), 403

    # Only accept requests from our workspace
    team_domain = request.form.get("team_domain")
    if team_domain != "doublehorn":
        return jsonify({"text": "Unauthorized workspace."}), 403

    text = request.form.get("text", "").strip()
    user = request.form.get("user_name", "unknown")
    response_url = request.form.get("response_url")

    parts = text.split()
    if len(parts) != 2:
        return jsonify({
            "text": "Usage: `/deploy <service> <version>`",
            "response_type": "ephemeral"
        })
    service, version = parts

    # Acknowledge immediately — Slack will time out after 3 seconds.
    # The actual deploy runs in a background thread.
    threading.Thread(
        target=run_deploy,
        args=(service, version, response_url)
    ).start()

    return jsonify({
        "text": f"Deploying `{service}:{version}` — triggered by {user}...",
        "response_type": "in_channel"
    })
```
The Kubernetes setup
The Deployment spec was straightforward. The interesting parts were the ServiceAccount and the Secret.
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: deploy-api
  namespace: kube-system
---
apiVersion: v1
kind: Secret
metadata:
  name: deploy-api-secrets
  namespace: kube-system
type: Opaque
stringData:
  SLACK_TOKEN: "your-slack-verification-token"
  ECR_BASE: "123456789.dkr.ecr.us-east-1.amazonaws.com"
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: deploy-api
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: deploy-api
    spec:
      serviceAccountName: deploy-api
      containers:
      - name: deploy-api
        image: 123456789.dkr.ecr.us-east-1.amazonaws.com/deploy-api:latest
        ports:
        - containerPort: 5000
        envFrom:
        - secretRef:
            name: deploy-api-secrets
---
apiVersion: v1
kind: Service
metadata:
  name: deploy-api
  namespace: kube-system
spec:
  type: NodePort
  ports:
  - port: 5000
    nodePort: 30500
  selector:
    name: deploy-api
```
The ServiceAccount needed permissions to patch Deployments across all namespaces. At the time, our RBAC configuration was not fine-grained — we were on Kubernetes 1.3/1.4 and hadn't locked things down yet. The ClusterRole looked like this:
```yaml
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: deploy-api-role
rules:
- apiGroups: ["extensions", "apps"]
  resources: ["deployments"]
  verbs: ["get", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: deploy-api-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: deploy-api-role
subjects:
- kind: ServiceAccount
  name: deploy-api
  namespace: kube-system
```
That's scoped to just get, patch, and update on deployments. Today I'd restrict it further — specific namespaces, specific resource names — but for a small trusted team this was a reasonable starting point.
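For reference, a tighter version under those same constraints would swap the ClusterRole for a namespaced Role pinned to specific resource names. A sketch, where the namespace and service names are hypothetical:

```yaml
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  name: deploy-api-role
  namespace: default   # only the namespace the deployable services live in
rules:
- apiGroups: ["extensions", "apps"]
  resources: ["deployments"]
  resourceNames: ["myservice", "authservice"]   # hypothetical allowlist
  verbs: ["get", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: deploy-api-binding
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: deploy-api-role
subjects:
- kind: ServiceAccount
  name: deploy-api
  namespace: kube-system
```

With `resourceNames` in place, a typo in the slash command fails at the API server instead of patching something it shouldn't.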
Why kube-system
I put it there because at the time it felt like the right namespace for cluster infrastructure tooling. Application workloads were in default or service-specific namespaces; kube-system was already the home for things like kube-dns, heapster, and the dashboard. The deploy API wasn't an application, it was ops infrastructure.
In retrospect, I'd put it in a dedicated ops-tools namespace. kube-system is for Kubernetes components, and mixing your own tooling in there makes it harder to reason about what belongs there and what doesn't. It also means your tooling gets the same level of implicit trust as cluster DNS, which is probably more than it needs.
Security model
The Slack verification token was stored as a Kubernetes Secret and injected into the pod's environment via `envFrom`. Every incoming request was validated against it before doing anything. We also checked the `team_domain` field to confirm the request came from our Slack workspace.
The Slack verification token model is older than the current signing secrets approach (which signs the entire request payload with HMAC-SHA256). What we had was effectively a shared password — if the token leaked, anyone who had it could trigger deploys. Acceptable for an internal tool on a trusted network, but I'd use signing secrets today.
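For comparison, the signing-secret check is only a few lines of standard library. This is a sketch of Slack's documented v0 signing scheme; the function name and parameters are my own:

```python
import hashlib
import hmac
import time

def verify_slack_signature(signing_secret: str, timestamp: str, body: str,
                           signature: str, max_age: int = 300) -> bool:
    """Check an X-Slack-Signature header against the raw request body."""
    # Reject replays: the X-Slack-Request-Timestamp header must be recent
    if abs(time.time() - int(timestamp)) > max_age:
        return False
    # Slack signs "v0:<timestamp>:<raw body>" with HMAC-SHA256
    basestring = f"v0:{timestamp}:{body}".encode()
    expected = "v0=" + hmac.new(signing_secret.encode(), basestring,
                                hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid timing leaks
    return hmac.compare_digest(expected, signature)
```

In the Flask handler, this would replace the token check: pass it the raw body from `request.get_data()` plus the `X-Slack-Request-Timestamp` and `X-Slack-Signature` headers. Unlike the shared token, a signature covers the entire payload, so a leaked request can't be replayed later or altered.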
NodePort exposed the service on a fixed port on every node's external IP. Not ideal — it meant knowing a node IP to configure the Slack slash command URL, and node IPs can change. A proper ingress would have been cleaner. We didn't have one set up at the time.
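Had we set one up, the ingress itself would have been small. A sketch using the extensions/v1beta1 Ingress of that era, with a hypothetical internal hostname:

```yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: deploy-api
  namespace: kube-system
spec:
  rules:
  - host: deploy-api.internal.example.com   # hypothetical hostname
    http:
      paths:
      - path: /deploy
        backend:
          serviceName: deploy-api
          servicePort: 5000
```

The Service could then be a plain ClusterIP, and the Slack slash command URL would point at a stable hostname instead of a node IP.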
What this taught me
Building this thing forced me to actually understand several parts of Kubernetes I'd been glossing over. ServiceAccounts and how pod identity works in the API. How the in-cluster credentials (the service account token and CA certificate that kubectl picks up automatically) are mounted at `/var/run/secrets/kubernetes.io/serviceaccount/`. How cluster DNS resolved `deploy-api.kube-system.svc.cluster.local` from inside other pods. How RBAC rules composed. I learned more about the platform from this one project than from weeks of reading the docs.
The service ran for a few months before we replaced it with a more robust ChatOps setup. The concept is still valid — a small purpose-built API inside your cluster that your tooling can call. Today you'd do something similar with Argo Workflows or Tekton triggers and a proper Slack bot using OAuth instead of verification tokens. The primitives are cleaner and the security model is better. But the pattern is the same: your cluster is the right place to run tools that talk to your cluster.