Platform Engineering: What It Actually Means
I've been in a lot of conversations lately where "DevOps" and "platform engineering" get used interchangeably. They're not the same thing, and conflating them leads to teams being organized poorly and expectations being set wrong. Let me be direct about the distinction and what platform engineering actually requires.
DevOps Is Culture, Not a Team
DevOps is a set of practices and a cultural philosophy. "You build it, you run it." Developers own their services in production. Fast feedback loops. Shared responsibility between development and operations. When an organization says "we're doing DevOps," they mean development and ops are collaborating instead of throwing software over a wall.
The problem is that "DevOps" became a job title, and then it became a team. Now you have "the DevOps team" that owns CI/CD, and everyone else throws their build configs over a wall to them. That's not DevOps. That's the same silo problem with different labels.
Platform engineering is something different. It's not a cultural stance — it's a product discipline.
Platform Engineering Is Product Work
A platform team builds and maintains the internal infrastructure that other engineering teams use. The key word is product. The platform is a product. The customers are internal developers. Every decision a platform team makes should be evaluated the same way a product team evaluates decisions: does this solve a real problem for our users, is it usable without a manual, and does it reduce friction rather than add it?
This reframe matters enormously. When a platform team thinks of itself as "the infrastructure team," it tends to build things that are powerful but opaque. When it thinks of itself as a product team, it focuses on the developer experience. Onboarding time. Error messages that explain what went wrong. Sensible defaults. Escape hatches when the defaults don't fit.
The Golden Path
The concept I've found most useful is the golden path (sometimes called the paved road). A golden path is the recommended, well-supported way to do something on your platform. It's not mandatory — teams can deviate — but it's the path that has:
- Pre-configured CI/CD pipelines
- Security defaults baked in
- Observability (metrics, logs, traces) wired up automatically
- Deployment targets that are already hardened
- Secret management that works without developers writing Vault clients
The golden path doesn't prevent teams from doing things differently. It just makes the right thing the easy thing. If a team wants to go off the path, they take on the responsibility for whatever the path was handling for them.
This is a better model than mandates. Mandates create resentment and workarounds. A good golden path creates adoption because it genuinely saves time.
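To make "the easy thing" concrete, here's a minimal sketch of a golden-path scaffold: a script that stamps out a new service from a platform-owned template. The template directory, placeholder token, and service name are all hypothetical; in a real platform the template would carry the pipeline config, chart, and observability defaults so a new service starts on the paved road by default.

```shell
#!/bin/sh
# Hypothetical golden-path scaffold: copy a service template and fill in the name.
# In a real setup the template repo would be cloned, not created inline.
set -eu

SERVICE_NAME="payments-api"

# Simulate a checked-out template repo (normally: git clone <template-repo>).
mkdir -p service-template
printf 'service: __SERVICE_NAME__\nteam: __SERVICE_NAME__-owners\n' > service-template/service.yaml

# Stamp out the new service from the template.
cp -r service-template "$SERVICE_NAME"
sed -i.bak "s/__SERVICE_NAME__/$SERVICE_NAME/g" "$SERVICE_NAME/service.yaml"
rm "$SERVICE_NAME/service.yaml.bak"

cat "$SERVICE_NAME/service.yaml"
```

The point isn't the script; it's that going on-path is one command, while going off-path means building all of this yourself.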
What a Platform Team Actually Builds
Here's a concrete list of what my team owns or is building toward:
CI/CD templates. We maintain reusable pipeline templates (currently in Jenkins, migrating to GitHub Actions) that teams import rather than write from scratch. They include security scanning, test gates, and deployment steps. A team starting a new service shouldn't spend two weeks figuring out how to get their build working.
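As a sketch of what "import rather than write from scratch" can look like on the GitHub Actions side: the service repo's entire CI config is a few lines that call a platform-owned reusable workflow. The org, repo, and workflow names below are hypothetical placeholders, not our actual template.

```shell
#!/bin/sh
# Sketch: a service repo's CI config when the pipeline is imported from a
# platform-owned reusable workflow (org/repo/workflow names are hypothetical).
set -eu
mkdir -p .github/workflows
cat > .github/workflows/ci.yml <<'EOF'
name: service-ci
on: [push]
jobs:
  build:
    # Security scanning, test gates, and deploy steps all live in the
    # shared template; the service repo only passes parameters.
    uses: example-org/platform-workflows/.github/workflows/go-service.yml@v1
    with:
      service-name: payments-api
EOF
cat .github/workflows/ci.yml
```

When the platform team improves the shared workflow, every service picks it up on the next pinned-version bump instead of fifty repos each patching their own copy.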
Kubernetes abstractions. We don't expose raw Kubernetes to every team. We have opinionated Helm chart templates and a thin deployment abstraction that handles resource requests, health checks, and pod disruption budgets. Teams specify what they need, not how Kubernetes works.
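A sketch of the "what, not how" idea. The team-facing spec below is hypothetical, but the shape is the point: teams declare intent, and the platform's chart templates expand it into a full Deployment with resource requests, liveness/readiness probes, and a pod disruption budget.

```shell
#!/bin/sh
# Sketch of a team-facing deployment spec (the schema is hypothetical).
# The platform expands this into full Kubernetes manifests via the
# opinionated Helm chart; teams never touch raw Deployment YAML.
set -eu
cat > deploy.yaml <<'EOF'
service: payments-api
team: payments
port: 8080
replicas: 3
resources: medium        # platform maps t-shirt sizes to requests/limits
healthcheck: /healthz    # wired into liveness and readiness probes
EOF
cat deploy.yaml
```

The t-shirt-size indirection is deliberate: it gives the platform team room to retune actual requests and limits fleet-wide without touching every service's config.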
Secrets management. This is Vault-backed, with the Vault Agent Injector handling injection so apps don't write Vault clients. I've written about this separately.
Observability defaults. Every service that uses our golden path automatically emits metrics to Prometheus and logs in structured JSON to our log aggregator. The dashboards are pre-built. Teams can add custom metrics, but they get a useful baseline without any work.
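For illustration, the kind of log line a golden-path service emits by default: one JSON object per line, with the fields the pre-built dashboards key on. The field names here are illustrative, not our exact schema.

```shell
#!/bin/sh
# Sketch: the structured JSON log shape that golden-path services emit by
# default (field names are illustrative, not our exact schema).
set -eu
printf '{"ts":"%s","level":"info","service":"payments-api","msg":"request handled","status":200}\n' \
  "$(date -u +%Y-%m-%dT%H:%M:%SZ)" > app.log
cat app.log
```

Because every service shares this shape, one set of dashboards and log queries works across the fleet; custom fields are additive, never replacements.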
Developer portal. We're evaluating Spotify's Backstage right now. It's new — just open-sourced in 2020 — and rough around the edges, but the concept is right: a single place where a developer can see all the services they own, their health, their documentation, and how to create a new one. Watch this space.
Security defaults in the developer workflow. One of the highest-leverage things a platform team can do is put security where developers can't accidentally miss it — in the commit workflow rather than in a manual review. We use detect-secrets as a pre-commit hook across all service repos. Every commit runs a scan for hardcoded credentials, API keys, private keys, and high-entropy strings before it hits the remote. When a developer accidentally adds a secret, they find out immediately at commit time, not three days later in a security review.
```shell
pip install pre-commit detect-secrets
pre-commit install
```
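pre-commit install expects a .pre-commit-config.yaml at the repo root; ours looks roughly like this. The pinned rev below is illustrative — pin whatever release you've vetted.

```shell
#!/bin/sh
# Sketch: the repo-root config that wires Yelp's detect-secrets into
# pre-commit. The pinned rev is illustrative.
set -eu
cat > .pre-commit-config.yaml <<'EOF'
repos:
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.4.0
    hooks:
      - id: detect-secrets
        args: ['--baseline', '.secrets.baseline']
EOF
cat .pre-commit-config.yaml
```

With the baseline argument in place, only secrets that aren't already in .secrets.baseline fail the commit, so the hook stays quiet until something actually new shows up.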
The baseline file (.secrets.baseline) is committed to the repo and updated intentionally when you add something that looks like a secret but isn't. The Docker-based version keeps the toolchain consistent across machines:
```shell
docker run --rm -v "$(pwd)":/code ibmcom/detect-secrets:latest scan --update .secrets.baseline
```
Private module registry. For Go services, we proxy all dependencies through an internal Artifactory instance. Developers set GONOSUMCHECK and GOPROXY to point to the internal proxy, which caches public modules and also serves internal packages from github.ibm.com/ that would never be found via proxy.golang.org. Multi-arch container builds (amd64 + s390x for IBM Z) are handled by the CI pipeline — teams don't configure this themselves.
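Concretely, the per-developer setup is just environment configuration. The Artifactory URL below is a placeholder, not our real endpoint.

```shell
#!/bin/sh
# Sketch: pointing the Go toolchain at the internal module proxy.
# The URL is a placeholder for our Artifactory Go registry.
set -eu
export GOPROXY="https://artifactory.internal.example/api/go/go"
export GONOSUMCHECK=1   # internal modules aren't in the public checksum DB

# Record the effective settings (normally these live in shell profile / go env).
{
  echo "GOPROXY=$GOPROXY"
  echo "GONOSUMCHECK=$GONOSUMCHECK"
} > goenv.txt
cat goenv.txt
```

Once set, go get resolves everything — public and internal — through the one proxy, so builds behave identically on laptops and in CI.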
The "You Build It You Run It" Tension
There's a real tension here that I don't want to paper over. "You build it you run it" is the right philosophy for ownership and accountability. But it assumes developers have the time, expertise, and tooling to run services in production. For most teams, that assumption isn't fully true.
Platform engineering is partly about making "you run it" realistic. If I hand a developer team a raw Kubernetes cluster and say "good luck," the cognitive overhead is enormous. If I give them abstractions, dashboards, runbooks, and alerting defaults, the cognitive overhead is much lower. The platform team absorbs the undifferentiated infrastructure work so product teams can focus on the product.
This is different from taking ownership away from product teams. The platform team does not own your service's incidents. You do. But we make them easier to own.
Platform Teams Succeed When They Treat Developers as Customers
This is my strong opinion: platform teams fail when they become gatekeepers, and succeed when they become enablers.
A gatekeeper platform team controls access to infrastructure, reviews every deployment, and becomes a bottleneck. It optimizes for safety and control. Teams route around it when they can.
An enabler platform team ships self-service tooling, provides clear documentation, has a fast feedback loop for requests, and measures success by developer satisfaction and time-to-deploy. It treats "developers had to ask us for help" as a product failure, not a sign of importance.
The practical test: if a developer can take a new service from zero to production without filing a ticket to your team, you're doing it right. If they need to go through you at every step, you've built a bottleneck with good intentions.
I'm not there yet. We're building toward it. But the mental model matters, because it shapes every decision about what to build and what to document and what to make self-service.
