May 16, 2019 Marie H.

Ansible for GCP Infrastructure Management


Photo by Kevin Ache on Unsplash


At Privia Health we used both Terraform and Ansible for GCP work, and the question of which to reach for was something we had to work out deliberately. The short version: Terraform for infrastructure state, Ansible for everything that happens inside (and on top of) that infrastructure. Here's how we used Ansible for GCP specifically.

The google.cloud Collection

The google.cloud Ansible collection provides modules that talk to GCP APIs. Install it:

ansible-galaxy collection install google.cloud
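For reproducible setups, the collection can also be pinned in a requirements.yml (the version constraint here is illustrative):

```yaml
# requirements.yml — pin the collection so every machine installs the same version
collections:
  - name: google.cloud
    version: ">=1.0.0"
```

Then install everything at once with ansible-galaxy collection install -r requirements.yml.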

The core modules:

  • google.cloud.gcp_compute_instance — GCE VM instances
  • google.cloud.gcp_compute_address — static IP addresses
  • google.cloud.gcp_compute_firewall — firewall rules
  • google.cloud.gcp_storage_bucket — GCS buckets
  • google.cloud.gcp_sql_instance — Cloud SQL instances
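
Most of these also have a read-only *_info counterpart for querying existing resources without changing them. A quick sketch (project and zone values are illustrative):

```yaml
# Query existing GCE instances without modifying anything;
# the result lands in the registered variable for later tasks.
- name: List instances in a zone
  google.cloud.gcp_compute_instance_info:
    zone: us-east4-a
    project: privia-health-prod
    auth_kind: serviceaccount
    service_account_file: "{{ gcp_service_account_file }}"
  register: instance_list
```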

Authentication

Two options: a service account JSON file, or Application Default Credentials.

Service account JSON file (works anywhere, explicit):

- name: Create GCS bucket
  google.cloud.gcp_storage_bucket:
    name: privia-app-assets
    project: privia-health-prod
    auth_kind: serviceaccount
    service_account_file: "{{ gcp_service_account_file }}"
    state: present

Application Default Credentials (works on GCE instances with the right service account attached, or where gcloud auth application-default login has been run):

- name: Create GCS bucket
  google.cloud.gcp_storage_bucket:
    name: privia-app-assets
    project: privia-health-prod
    auth_kind: application
    state: present

In CI/CD pipelines we used the service account file approach with the path stored in an Ansible vault-encrypted variable. On GCE VMs running Ansible against other GCP resources, we used auth_kind: application with the VM's attached service account.
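A minimal sketch of that wiring, with illustrative paths and variable names:

```yaml
# group_vars/all/vault.yml — encrypted with ansible-vault,
# holds only the path to the key file on the CI runner
vault_gcp_service_account_file: /opt/ansible/keys/privia-ci-sa.json
```

The playbook then sets gcp_service_account_file: "{{ vault_gcp_service_account_file }}" so tasks never reference the vaulted name directly.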

Dynamic Inventory with gcp_compute

Static inventory files don't work well when VMs come and go with autoscaling. The gcp_compute inventory plugin solves this:

# inventory/gcp.yml
plugin: google.cloud.gcp_compute
projects:
  - privia-health-prod
regions:
  - us-east4
filters:
  - status = RUNNING
keyed_groups:
  - key: labels.role
    prefix: role
  - key: labels.environment
    prefix: env
hostnames:
  - name
auth_kind: serviceaccount
service_account_file: "{{ lookup('env', 'GCP_SERVICE_ACCOUNT_FILE') }}"

With this config, a VM labeled role: app-server and environment: production shows up in inventory groups role_app_server and env_production. Run a playbook against all production app servers:

ansible-playbook -i inventory/gcp.yml deploy_app.yml --limit 'env_production:&role_app_server'

The & is Ansible's "intersection" operator, matching hosts that are in both groups; the quotes keep the shell from treating & as its background operator. This pattern meant we never had to maintain a hosts file for the GCP environment.
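
Before running playbooks, it's worth sanity-checking what the plugin discovered. Assuming the inventory file above (these commands need valid GCP credentials to return results):

```shell
# Show the group tree the gcp_compute plugin built from labels
ansible-inventory -i inventory/gcp.yml --graph

# Dump every variable Ansible gathered for a single host
ansible-inventory -i inventory/gcp.yml --host privia-app-01
```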

Provisioning a GCE Instance with Firewall and Static IP

---
- name: Provision app server on GCP
  hosts: localhost
  gather_facts: false
  vars:
    project_id: privia-health-prod
    region: us-east4
    zone: us-east4-a
    auth_kind: serviceaccount
    service_account_file: "{{ gcp_service_account_file }}"

  tasks:
    - name: Create static IP address
      google.cloud.gcp_compute_address:
        name: privia-app-01-ip
        region: "{{ region }}"
        project: "{{ project_id }}"
        auth_kind: "{{ auth_kind }}"
        service_account_file: "{{ service_account_file }}"
        state: present
      register: app_ip

    - name: Create app server instance
      google.cloud.gcp_compute_instance:
        name: privia-app-01
        machine_type: n1-standard-2
        zone: "{{ zone }}"
        project: "{{ project_id }}"
        auth_kind: "{{ auth_kind }}"
        service_account_file: "{{ service_account_file }}"
        disks:
          - auto_delete: true
            boot: true
            initialize_params:
              source_image: projects/debian-cloud/global/images/family/debian-11
              disk_size_gb: 50
              disk_type: pd-ssd
        network_interfaces:
          - network:
              selfLink: "global/networks/privia-vpc"
            subnetwork:
              selfLink: "regions/{{ region }}/subnetworks/privia-app-subnet"
            access_configs:
              - name: External NAT
                nat_ip: "{{ app_ip }}"
                type: ONE_TO_ONE_NAT
        labels:
          role: app-server
          environment: production
          managed_by: ansible
        tags:
          items:
            - app-server
        state: present
      register: app_instance

    - name: Create firewall rule for app traffic
      google.cloud.gcp_compute_firewall:
        name: allow-app-https
        network:
          selfLink: "global/networks/privia-vpc"
        project: "{{ project_id }}"
        auth_kind: "{{ auth_kind }}"
        service_account_file: "{{ service_account_file }}"
        allowed:
          - ip_protocol: tcp
            ports:
              - "443"
              - "80"
        target_tags:
          - app-server
        source_ranges:
          - 0.0.0.0/0
        state: present

    - name: Print instance details
      debug:
        msg: "Instance {{ app_instance.name }} created with IP {{ app_ip.address }}"

Ansible vs. Terraform: Different Tools for Different Problems

We got this question a lot: "why use both?" The distinction I settled on:

Terraform is a state machine. It knows what resources exist (from the state file), computes a diff against what you've declared, and applies changes to converge toward the desired state. It's the right tool for infrastructure that has a lifecycle — things you create, modify, and eventually destroy. Drift detection and the plan/apply workflow are real advantages.

Ansible is idempotent task execution. It has no long-term state store — each run looks at the current state of the world and makes it match what you declared. It's the right tool for configuration management, application deployment, and operational tasks that run repeatedly against running systems.

Where they overlap is GCP provisioning — both can create a GCE instance. We chose Terraform for that because the state management is better for infrastructure. Ansible handled everything that happened after the VM existed: OS configuration, package installation, application deployment, service setup.

The gcloud CLI for Gap Coverage

When no dedicated Ansible module existed for a GCP operation, we used the command module with gcloud:

- name: Enable Cloud SQL API
  command: >
    gcloud services enable sqladmin.googleapis.com
    --project={{ project_id }}
  register: result
  changed_when: "'already enabled' not in result.stderr"

Not elegant, but it works. The changed_when makes it idempotent — Ansible won't report a change if the API was already enabled.
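
The same pattern extends to read operations: gcloud's --format=json output parses cleanly with the from_json filter. A sketch, reusing the project_id variable from earlier:

```yaml
# Read-only gcloud call — changed_when: false keeps it from
# showing up as a change in every run.
- name: List enabled services as JSON
  command: >
    gcloud services list --enabled
    --project={{ project_id }} --format=json
  register: services_raw
  changed_when: false

- name: Parse into a usable variable
  set_fact:
    enabled_services: "{{ services_raw.stdout | from_json }}"
```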

Encrypting the Service Account JSON with ansible-vault

The service account JSON key file contains private key material and should never be committed to git in plaintext.

Encrypt the file:

ansible-vault encrypt privia-terraform-sa.json

Reference it in a playbook. One caveat: the google.cloud modules read the key file directly from disk, so pointing service_account_file at a vault-encrypted file hands the module ciphertext. The file lookup, by contrast, transparently decrypts vaulted files, so pass the decrypted contents through the service_account_contents parameter instead:

vars:
  gcp_sa_contents: "{{ lookup('file', 'privia-terraform-sa.json') }}"

Then set service_account_contents: "{{ gcp_sa_contents }}" (in place of service_account_file) on each task.

Store the vault password in a file excluded from git (or use a vault password manager). Run playbooks with:

ansible-playbook -i inventory/gcp.yml provision.yml --vault-password-file ~/.vault_pass

We stored vault-encrypted files in git alongside the playbooks, and managed the vault password separately through our secrets manager. That way the repo was self-contained and auditable, without any cleartext credentials.