Blog
July 22, 2019 Marie H.

Custom IAM Roles for Cloud SQL in Terraform

Custom IAM Roles for Cloud SQL in Terraform

Photo by <a href="https://unsplash.com/@ozym?utm_source=cloudista&utm_medium=referral" target="_blank" rel="noopener">Osmany M Leyva Aldana</a> on <a href="https://unsplash.com/?utm_source=cloudista&utm_medium=referral" target="_blank" rel="noopener">Unsplash</a>

Custom IAM Roles for Cloud SQL in Terraform

GCP's predefined IAM roles are convenient and almost always too permissive for production service accounts. The one I run into most often is roles/cloudsql.admin. If you're building a service that needs to manage Cloud SQL users — create them, rotate passwords, delete old ones — it's tempting to reach for cloudsql.admin because it obviously covers what you need. It covers a lot more than what you need.

roles/cloudsql.admin grants the ability to create and delete Cloud SQL instances, import and export data, modify instance settings, manage SSL certificates, and more. If the service account running your user rotation job gets compromised, an attacker can delete your database instances with it. That's not a theoretical risk to accept for a service whose only legitimate operation is touching user records.

The fix is a custom role. Here's how we built one at a healthcare tech company I was working with in 2019.

The Permissions You Actually Need

For a service that only manages Cloud SQL users across multiple projects, the required permissions are:

  • cloudsql.instances.get and cloudsql.instances.list — to locate the instance the user belongs to
  • cloudsql.users.create, cloudsql.users.delete, cloudsql.users.list, cloudsql.users.update — the actual user management operations
  • resourcemanager.projects.get — to resolve project references when making API calls

That's seven permissions. roles/cloudsql.admin has dozens. The gap between "what the role grants" and "what the service actually needs" is your blast radius if something goes wrong. Custom roles close that gap.

Terraform for Multi-Project Custom Roles

Our setup managed Cloud SQL users across separate dev, staging, and production GCP projects. The same role definition needed to exist in all three. The Terraform for this:

resource "google_service_account" "cloud_sql_user_mgmt" {
  account_id   = "cloud-sql-user-mgmt"
  display_name = "Cloud SQL User Management"
  description  = "Rotates Cloud SQL database users across managed projects. Custom role grants user management only."
  project      = var.host_project
}

resource "google_project_iam_custom_role" "cloudsql_user_admin" {
  count       = length(var.managed_projects)
  role_id     = "cloudsqlUserAdmin"
  title       = "Cloud SQL User Admin"
  project     = var.managed_projects[count.index]
  permissions = [
    "cloudsql.instances.get",
    "cloudsql.instances.list",
    "cloudsql.users.create",
    "cloudsql.users.delete",
    "cloudsql.users.list",
    "cloudsql.users.update",
    "resourcemanager.projects.get"
  ]
}

resource "google_project_iam_member" "cloud_sql_user_mgmt" {
  count   = length(var.managed_projects)
  project = var.managed_projects[count.index]
  role    = google_project_iam_custom_role.cloudsql_user_admin[count.index].id
  member  = "serviceAccount:${google_service_account.cloud_sql_user_mgmt.email}"
}

With var.managed_projects = ["my-project-dev", "my-project-staging", "my-project-prod"], this creates the custom role in each project and binds the service account to it in each project. One resource block, consistent behavior everywhere.

The count.index on the iam_member resource references the custom role in the same project by index — the role in managed_projects[0] gets bound in managed_projects[0]. This keeps the pairing correct across all projects.

Service Account Hygiene

Use the description field on service accounts. When you have twenty-plus service accounts in a project — and in a mature GCP setup you will — a description that explains what the SA does and why it has the permissions it has is worth more than you'd expect. Six months later when you're auditing IAM, you'll want to know why svc-acct-07@project.iam.gserviceaccount.com has access to Cloud SQL in three projects. The description field is searchable in the console and comes back in gcloud iam service-accounts describe output.

Don't reuse service accounts across unrelated workloads. The user rotation service gets its own SA. The service that reads from Cloud SQL for the application gets a different SA with roles/cloudsql.client. Shared SAs mean shared blast radius.

The User Rotation Service

The service account above was used by a Python script running on Cloud Run, triggered by Cloud Scheduler on a weekly cadence. The script used the google-cloud-sqladmin client library:

from googleapiclient import discovery
from google.auth import default
import secrets
import string

def rotate_user_password(project_id, instance_name, username):
    credentials, _ = default()
    service = discovery.build('sqladmin', 'v1beta4', credentials=credentials)

    alphabet = string.ascii_letters + string.digits + string.punctuation
    new_password = ''.join(secrets.choice(alphabet) for _ in range(32))

    body = {'password': new_password}
    request = service.users().update(
        project=project_id,
        instance=instance_name,
        name=username,
        body=body
    )
    request.execute()
    return new_password

The script authenticated via the Cloud Run service identity — no JSON key file, no GOOGLE_APPLICATION_CREDENTIALS, just google.auth.default() picking up the metadata server credentials. The new passwords were written to Secret Manager immediately after rotation, and the applications pulled from Secret Manager on their next restart.

This whole flow only worked because the Cloud Run service identity was the cloud-sql-user-mgmt SA with the custom role. If we'd used a broader role, the rotation service would have had capabilities that made the security team uncomfortable. With the custom role, they could read the permission list, verify it matched what the service description said it did, and sign off on it.

Why Not Org-Level Custom Roles

GCP lets you define custom roles at the organization level, which would let you define cloudsqlUserAdmin once and use it across all projects. I've done this in other contexts and it has legitimate uses, but for this pattern I prefer project-level roles.

The reason is auditability. When a security auditor (or your own future self) looks at the IAM policy for a project, they can see the full picture: what roles exist in this project, what those roles contain, and who they're bound to. Organization-level custom roles require an extra lookup step — you see the role reference but not the permissions without checking the org-level role definition. Project-level roles are self-contained. The IAM policy for the project tells the whole story.

The Terraform count pattern eliminates the main operational argument for org-level roles (not wanting to define the same thing multiple times). You define it once in Terraform and it deploys to all managed projects.

The Broader Principle

For every service account, the question is: what is the minimum permission set this SA needs to accomplish its specific job? Build the custom role around the answer.

GCP's predefined roles are designed to cover common use cases broadly. cloudsql.admin, storage.admin, compute.admin — these are appropriate for a human administrator working interactively across a project. They're not appropriate for a service with a narrow, defined function. A service that reads objects from one GCS bucket doesn't need storage.objectAdmin on the project; it needs storage.objectViewer on that specific bucket.

The pattern is the same everywhere: identify the exact API calls your service makes, find the permissions that cover those calls, create a custom role with exactly those permissions, bind it as narrowly as possible (resource level when feasible, project level when not). It's more Terraform to write upfront. It's a smaller problem to deal with when something goes wrong.