Building a Slack-Triggered Deploy API Inside Kubernetes
The deploy workflow I described in my earlier post — `deploykube <image> <deployment>` — worked fine if you were the one running it. But we were a distributed team, and when three people can deploy, you end up with questions like "did anyone deploy the auth service in the last hour?" answered by looking at `/tmp/deploy.log` on whoever's laptop last ran the script. That's not a system. That's chaos with logging.
The fix was obvious: move the deploy mechanism out of individual laptops and into a central place everyone could see. We were already living in Slack. So I built a small HTTP API that ran inside the Kubernetes cluster and wired it to a Slack slash command. Type `/deploy myservice 1.4` in Slack, the API runs the equivalent of `kubectl set image`, and everyone in the channel sees what happened and who triggered it.
Architecture
The service was a Flask app — a few hundred lines of Python — running as a Deployment in the kube-system namespace. It exposed a single POST endpoint. Slack slash commands send a form-encoded POST to a URL you configure, so the surface area was small.
Slack sends something like this when you run `/deploy myservice 1.4`:
```
token=xxxxxxxxxxxxxxxxxxx
team_domain=doublehorn
user_name=marie
command=/deploy
text=myservice 1.4
response_url=https://hooks.slack.com/commands/...
```
The API validates the `token`, parses `text` into a service name and version, runs the `kubectl` command, and responds.
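To make that parsing concrete, here's a stdlib-only sketch of pulling the service and version out of the form body above, outside of Flask (the token and response URL values are placeholders):

```python
from urllib.parse import parse_qs

# The raw form-encoded body Slack sends for /deploy myservice 1.4
# (token and response_url values here are placeholders)
body = (
    "token=xxxxxxxxxxxxxxxxxxx&team_domain=doublehorn&user_name=marie"
    "&command=%2Fdeploy&text=myservice+1.4"
    "&response_url=https%3A%2F%2Fhooks.slack.com%2Fcommands%2F..."
)

# parse_qs maps every field to a list of values; take the first of each
form = {k: v[0] for k, v in parse_qs(body).items()}

service, version = form["text"].split()
print(service, version)   # myservice 1.4
print(form["command"])    # /deploy
```

Flask does this decoding for you and exposes the result as `request.form`, which is what the handler below uses.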
The full request flow:
- Someone types `/deploy myservice 1.4` in Slack
- Slack POSTs to `http://<nodeport-ip>:<port>/deploy`
- Flask validates the `token` field against the expected value stored in a Kubernetes Secret
- Flask parses `text`: `service = "myservice"`, `version = "1.4"`
- Flask runs `kubectl set image deployment/myservice myservice=<ecr-base>/myservice:1.4` via `subprocess`
- Flask sends an immediate `200 OK` with an acknowledgment back to Slack (within 3 seconds, or Slack times out)
- Flask posts the full kubectl output asynchronously to `response_url`
That last part — the async response — is important. Slack requires a response within 3 seconds of the POST, but `kubectl` can take longer than that if the cluster is under load. The pattern is: acknowledge immediately, then POST the real result to `response_url` after the subprocess finishes.
The Flask handler
```python
import os
import subprocess
import threading

import requests
from flask import Flask, request, jsonify

app = Flask(__name__)

ECR_BASE = os.environ["ECR_BASE"]
SLACK_TOKEN = os.environ["SLACK_TOKEN"]


def run_deploy(service, version, response_url):
    """Run kubectl set image and post the result to Slack asynchronously."""
    image = f"{ECR_BASE}/{service}:{version}"
    cmd = [
        "kubectl", "set", "image",
        f"deployment/{service}",
        f"{service}={image}",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        msg = f"Deployed `{service}:{version}` successfully.\n```{result.stdout.strip()}```"
    else:
        msg = f"Deploy failed for `{service}:{version}`.\n```{result.stderr.strip()}```"
    # Post back to Slack via the response_url (valid for 30 minutes)
    requests.post(response_url, json={"text": msg, "response_type": "in_channel"})


@app.route("/deploy", methods=["POST"])
def deploy():
    # Validate the Slack verification token
    token = request.form.get("token")
    if token != SLACK_TOKEN:
        return jsonify({"text": "Invalid token."}), 403

    # Only accept requests from our workspace
    team_domain = request.form.get("team_domain")
    if team_domain != "doublehorn":
        return jsonify({"text": "Unauthorized workspace."}), 403

    text = request.form.get("text", "").strip()
    user = request.form.get("user_name", "unknown")
    response_url = request.form.get("response_url")

    parts = text.split()
    if len(parts) != 2:
        return jsonify({
            "text": "Usage: `/deploy <service> <version>`",
            "response_type": "ephemeral"
        })
    service, version = parts

    # Acknowledge immediately — Slack will time out after 3 seconds.
    # The actual deploy runs in a background thread.
    threading.Thread(
        target=run_deploy,
        args=(service, version, response_url)
    ).start()

    return jsonify({
        "text": f"Deploying `{service}:{version}` — triggered by {user}...",
        "response_type": "in_channel"
    })
```
The Kubernetes setup
The Deployment spec was straightforward. The interesting parts were the ServiceAccount and the Secret.
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: deploy-api
  namespace: kube-system
---
apiVersion: v1
kind: Secret
metadata:
  name: deploy-api-secrets
  namespace: kube-system
type: Opaque
stringData:
  SLACK_TOKEN: "your-slack-verification-token"
  ECR_BASE: "123456789.dkr.ecr.us-east-1.amazonaws.com"
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: deploy-api
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: deploy-api
    spec:
      serviceAccountName: deploy-api
      containers:
      - name: deploy-api
        image: 123456789.dkr.ecr.us-east-1.amazonaws.com/deploy-api:latest
        ports:
        - containerPort: 5000
        envFrom:
        - secretRef:
            name: deploy-api-secrets
---
apiVersion: v1
kind: Service
metadata:
  name: deploy-api
  namespace: kube-system
spec:
  type: NodePort
  ports:
  - port: 5000
    nodePort: 30500
  selector:
    name: deploy-api
```
The ServiceAccount needed permissions to patch Deployments across all namespaces. At the time, our RBAC configuration was not fine-grained — we were on Kubernetes 1.3/1.4 and hadn't locked things down yet. The ClusterRole looked like this:
```yaml
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: deploy-api-role
rules:
- apiGroups: ["extensions", "apps"]
  resources: ["deployments"]
  verbs: ["get", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: deploy-api-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: deploy-api-role
subjects:
- kind: ServiceAccount
  name: deploy-api
  namespace: kube-system
```
That's scoped to just get, patch, and update on deployments. Today I'd restrict it further — specific namespaces, specific resource names — but for a small trusted team this was a reasonable starting point.
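For reference, a tighter version under those same constraints would swap the ClusterRole for a namespaced Role pinned to specific resource names. A sketch, where the namespace and service names are hypothetical:

```yaml
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  name: deploy-api-role
  namespace: default   # only the namespace the deployable services live in
rules:
- apiGroups: ["extensions", "apps"]
  resources: ["deployments"]
  resourceNames: ["myservice", "authservice"]   # hypothetical allowlist
  verbs: ["get", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: deploy-api-binding
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: deploy-api-role
subjects:
- kind: ServiceAccount
  name: deploy-api
  namespace: kube-system
```

With `resourceNames` in place, a typo in the slash command fails at the API server instead of patching something it shouldn't.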
Why kube-system
I put it there because at the time it felt like the right namespace for cluster infrastructure tooling. Application workloads were in default or service-specific namespaces; kube-system was already the home for things like kube-dns, heapster, and the dashboard. The deploy API wasn't an application, it was ops infrastructure.
In retrospect, I'd put it in a dedicated ops-tools namespace. kube-system is for Kubernetes components, and mixing your own tooling in there makes it harder to reason about what belongs there and what doesn't. It also means your tooling gets the same level of implicit trust as cluster DNS, which is probably more than it needs.
Security model
The Slack verification token was stored as a Kubernetes Secret and injected into the pod's environment via `envFrom`. Every incoming request was validated against it before doing anything. We also checked the `team_domain` field to confirm the request came from our Slack workspace.
The Slack verification token model is older than the current signing secrets approach (which signs the entire request payload with HMAC-SHA256). What we had was effectively a shared password — if the token leaked, anyone who had it could trigger deploys. Acceptable for an internal tool on a trusted network, but I'd use signing secrets today.
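For comparison, the signing-secret check is only a few lines of standard library. This is a sketch of Slack's documented v0 signing scheme; the function name and parameters are my own:

```python
import hashlib
import hmac
import time

def verify_slack_signature(signing_secret: str, timestamp: str, body: str,
                           signature: str, max_age: int = 300) -> bool:
    """Check an X-Slack-Signature header against the raw request body."""
    # Reject replays: the X-Slack-Request-Timestamp header must be recent
    if abs(time.time() - int(timestamp)) > max_age:
        return False
    # Slack signs "v0:<timestamp>:<raw body>" with HMAC-SHA256
    basestring = f"v0:{timestamp}:{body}".encode()
    expected = "v0=" + hmac.new(signing_secret.encode(), basestring,
                                hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid timing leaks
    return hmac.compare_digest(expected, signature)
```

In the Flask handler, this would replace the token check: pass it the raw body from `request.get_data()` plus the `X-Slack-Request-Timestamp` and `X-Slack-Signature` headers. Unlike the shared token, a signature covers the entire payload, so a leaked request can't be replayed later or altered.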
NodePort exposed the service on a fixed port on every node's external IP. Not ideal — it meant knowing a node IP to configure the Slack slash command URL, and node IPs can change. A proper ingress would have been cleaner. We didn't have one set up at the time.
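Had we set one up, the ingress itself would have been small. A sketch using the extensions/v1beta1 Ingress of that era, with a hypothetical internal hostname:

```yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: deploy-api
  namespace: kube-system
spec:
  rules:
  - host: deploy-api.internal.example.com   # hypothetical hostname
    http:
      paths:
      - path: /deploy
        backend:
          serviceName: deploy-api
          servicePort: 5000
```

The Service could then be a plain ClusterIP, and the Slack slash command URL would point at a stable hostname instead of a node IP.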
What this taught me
Building this thing forced me to actually understand several parts of Kubernetes I'd been glossing over. ServiceAccounts and how pod identity works in the API. How the in-cluster credentials (the service account token and CA certificate that kubectl picks up automatically) are mounted at `/var/run/secrets/kubernetes.io/serviceaccount/`. How cluster DNS resolved `deploy-api.kube-system.svc.cluster.local` from inside other pods. How RBAC rules composed. I learned more about the platform from this one project than from weeks of reading the docs.
The service ran for a few months before we replaced it with a more robust ChatOps setup. The concept is still valid — a small purpose-built API inside your cluster that your tooling can call. Today you'd do something similar with Argo Workflows or Tekton triggers and a proper Slack bot using OAuth instead of verification tokens. The primitives are cleaner and the security model is better. But the pattern is the same: your cluster is the right place to run tools that talk to your cluster.