May 9, 2022 Marie H.

Ansible Roles: Structure and Best Practices in 2022

I've written and maintained Ansible roles across more environments than I can easily count at this point — GCP instances, AWS EC2, on-prem VMware, bare metal. The roles that have held up over years of maintenance all share the same structural habits. The ones that become liabilities tend to violate the same patterns in predictable ways. This is what I actually do when I write roles now.

The Directory Structure and What Goes Where

A full role looks like this:

roles/my-role/
├── tasks/
│   ├── main.yml
│   ├── Debian.yml
│   └── RedHat.yml
├── handlers/
│   └── main.yml
├── defaults/
│   └── main.yml
├── vars/
│   └── main.yml
├── templates/
│   └── my-config.conf.j2
├── files/
│   └── static-script.sh
├── meta/
│   └── main.yml
└── molecule/
    └── default/
        ├── converge.yml
        └── verify.yml

tasks/main.yml is the entry point for task execution. handlers/main.yml contains handlers — tasks that run once at the end of a play when notified. templates/ holds Jinja2 templates rendered with variable substitution, while files/ holds static files copied verbatim. Both defaults/ and vars/ hold variable definitions; the difference between them is precedence, which matters enough to get its own section.

defaults vs vars: Get This Right

This is the one that causes the most confusion for people new to role development, and getting it wrong breaks the override semantics.

defaults/main.yml has the lowest priority in Ansible's variable precedence. Nearly everything else (host variables, group variables, play variables, extra vars) overrides it. This is where you put everything that a role user might legitimately want to change:

# defaults/main.yml
nginx_worker_processes: auto
nginx_worker_connections: 1024
nginx_server_name: "{{ inventory_hostname }}"
nginx_ssl_enabled: false

vars/main.yml has much higher priority — it overrides inventory and play variables, though extra vars still win. This is for internal role constants that are not part of the public interface. If your role needs a hardcoded path to a binary or an internal mapping of OS families to package names, that's a vars/ variable. Users shouldn't be changing it, and the high priority means an inventory variable won't silently override an internal constant.

If you're unsure which to use, put it in defaults. Erring toward overridability is almost always the right call.
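To make the split concrete, here is what a matching vars/main.yml might hold for the nginx example above (the path and package names are illustrative, not from a real role):

```yaml
# vars/main.yml — internal constants; users should not override these
nginx_binary_path: /usr/sbin/nginx          # hardcoded binary path (assumed)
nginx_package_name_map:                     # OS family -> package name (assumed)
  Debian: nginx-full
  RedHat: nginx
```

Nothing here is tunable by design: if an inventory defined nginx_package_name_map, the role's vars/ copy would still win.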

meta/main.yml and Role Dependencies

The meta/main.yml file handles Galaxy metadata (author, license, platforms) and, most consequentially, role dependencies:

# meta/main.yml
galaxy_info:
  author: Marie H.
  description: Install and configure nginx
  license: MIT
  min_ansible_version: "2.12"
  platforms:
    - name: Ubuntu
      versions: ["focal", "jammy"]
    - name: EL
      versions: ["8", "9"]

dependencies:
  - role: common-base
    vars:
      base_packages_extra: ["curl", "ca-certificates"]

Dependencies declared here are applied automatically before the role runs. This seems convenient until you're debugging a playbook and can't figure out why a task you didn't explicitly call is running. I use role dependencies in meta/ sparingly. For most cases I'd rather have explicit include_role calls in a playbook than implicit dependencies. Explicit is easier to trace and easier to audit.
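The explicit alternative in a playbook, reusing the common-base role and variable from the meta/ example above, might look like this (the playbook and host group names are assumptions):

```yaml
# site.yml — explicit ordering instead of a meta/ dependency
- hosts: webservers
  tasks:
    - name: Apply the base role first, visibly
      ansible.builtin.include_role:
        name: common-base
      vars:
        base_packages_extra: ["curl", "ca-certificates"]

    - name: Then apply the nginx role
      ansible.builtin.include_role:
        name: my-role
```

Anyone reading site.yml now sees exactly which roles run and in what order, with no hidden pull from meta/main.yml.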

Idempotency Is the Contract

Every task in a role should be safe to run repeatedly. This isn't just a nice property — it's the contract that makes Ansible useful for configuration management rather than just provisioning. If you can't re-run a playbook without fear of side effects, you can't use it for ongoing drift correction.

The built-in modules are generally idempotent. ansible.builtin.copy checks the file hash before copying; ansible.builtin.template does the same; ansible.builtin.package only acts when the package state differs. The usual place idempotency breaks is in shell and command tasks.

Bad:

- name: Initialize the database
  ansible.builtin.shell: /usr/bin/db-init

This runs every time. Better:

- name: Check if database is initialized
  ansible.builtin.stat:
    path: /var/db/.initialized
  register: db_initialized

- name: Initialize the database
  ansible.builtin.shell: /usr/bin/db-init && touch /var/db/.initialized
  when: not db_initialized.stat.exists

Or better still, use a module that handles the idempotency natively. If shell or command is necessary, always pair it with a creates parameter or a when guard based on a prior stat check.
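With creates, the two-task stat pattern above collapses into one task (the marker file is the same one the shell line touches):

```yaml
- name: Initialize the database
  ansible.builtin.shell: /usr/bin/db-init && touch /var/db/.initialized
  args:
    # skip the task entirely if the marker file already exists
    creates: /var/db/.initialized
```

Ansible reports the task as "ok" rather than "changed" on every run after the first, which also keeps change counts honest.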

For services, use state: started, not state: restarted. started is idempotent: it does nothing if the service is already running. restarted bounces the service on every run, causing unnecessary downtime.
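A minimal idempotent service task looks like this (enabled: true is an addition here, covering start-on-boot as well):

```yaml
- name: Ensure nginx is running
  ansible.builtin.service:
    name: nginx
    state: started    # no-op if already running
    enabled: true     # no-op if already enabled at boot
```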

Handler Patterns

Handlers are tasks that run once at the end of a play, regardless of how many times they were notified during the play. This is perfect for service restarts:

# tasks/main.yml
- name: Deploy nginx configuration
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
  notify: Reload nginx

- name: Deploy nginx virtual host
  ansible.builtin.template:
    src: vhost.conf.j2
    dest: /etc/nginx/sites-enabled/myapp.conf
  notify: Reload nginx

# handlers/main.yml
- name: Reload nginx
  ansible.builtin.service:
    name: nginx
    state: reloaded

Even if both templates change in the same play, nginx only reloads once at the end. This is the correct behavior.

The flush_handlers meta task breaks the "at end of play" rule intentionally:

- name: Deploy application config
  ansible.builtin.template:
    src: app.conf.j2
    dest: /etc/myapp/app.conf
  notify: Restart myapp

- name: Flush handlers
  ansible.builtin.meta: flush_handlers

- name: Run database migration
  ansible.builtin.command: /opt/myapp/bin/migrate

Here I need the application to restart before the migration runs. flush_handlers executes all pending handlers immediately. Use it when you have a genuine ordering dependency between a handler and a subsequent task.

Error Handling With block/rescue/always

Not every task needs error handling, but operations that acquire resources or modify external state can leave things in a bad state if they fail partway through. block/rescue/always handles this:

- name: Mount and configure external volume
  block:
    - name: Mount the volume
      ansible.builtin.mount:
        path: /mnt/data
        src: /dev/sdb1
        fstype: ext4
        state: mounted

    - name: Run data migration
      ansible.builtin.command: /opt/bin/migrate-data --source /mnt/data

  rescue:
    - name: Unmount volume on failure
      ansible.builtin.mount:
        path: /mnt/data
        state: unmounted
      ignore_errors: true

    - name: Log failure
      ansible.builtin.debug:
        msg: "Data migration failed, volume unmounted"

  always:
    - name: Record attempt in audit log
      ansible.builtin.lineinfile:
        path: /var/log/migration-audit.log
        line: "Migration attempted at {{ ansible_date_time.iso8601 }}"
        create: true

rescue runs only on failure, always runs regardless. I reach for this pattern for operations that need cleanup on failure — mounts, temp directories, partial writes — not for routine tasks.

Tagging

Tags let you run subsets of a playbook. I use functional tags consistently:

- name: Install nginx packages
  ansible.builtin.package:
    name: nginx
    state: present
  tags: [packages]

- name: Deploy nginx configuration
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
  tags: [configuration]

- name: Harden nginx configuration
  ansible.builtin.lineinfile:
    path: /etc/nginx/nginx.conf
    line: "server_tokens off;"
  tags: [security, configuration]

Running ansible-playbook site.yml --tags configuration applies only config tasks. Running --tags security applies only security-tagged tasks. This is useful for partial runs during incidents and for CI pipelines that only need to verify a specific concern.

The never tag is special — tasks tagged never are skipped unless explicitly called:

- name: Reset application data (destructive)
  ansible.builtin.file:
    path: /var/myapp/data
    state: absent
  tags: [never, reset]

This task never runs in a normal playbook run. You have to explicitly pass --tags reset to trigger it. It's a good pattern for destructive or one-off operations you want to keep in the role definition but never run accidentally.

Cross-Cloud Role Design

When a role needs to work on both Ubuntu (my GCP and AWS instances) and RHEL (on-prem VMware), I split the OS-specific logic into separate task files:

# tasks/main.yml
- name: Include OS-specific tasks
  ansible.builtin.include_tasks: "{{ ansible_os_family }}.yml"

- name: Common configuration (all OS families)
  ansible.builtin.template:
    src: myapp.conf.j2
    dest: /etc/myapp/myapp.conf
  notify: Restart myapp

# tasks/Debian.yml
- name: Install packages (Debian/Ubuntu)
  ansible.builtin.apt:
    name: "{{ myapp_packages }}"
    state: present
    update_cache: true

# tasks/RedHat.yml
- name: Install packages (RHEL/CentOS)
  ansible.builtin.dnf:
    name: "{{ myapp_packages }}"
    state: present

ansible_os_family is a fact that Ansible sets automatically — Debian for Ubuntu and Debian systems, RedHat for RHEL, CentOS, and Fedora. The include_tasks call resolves to Debian.yml or RedHat.yml at runtime. Common tasks that work on both OSes go in main.yml after the include.
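The per-OS task files above share the myapp_packages variable; one way to define it is a single map in vars/main.yml keyed by the same fact (the package names here are placeholders, not from the original):

```yaml
# vars/main.yml — internal map, keyed by ansible_os_family
_myapp_package_map:
  Debian: ["myapp", "myapp-common"]
  RedHat: ["myapp"]

# resolved at runtime to the right list for the target host
myapp_packages: "{{ _myapp_package_map[ansible_os_family] }}"
```

Keeping the map in vars/ rather than defaults/ is deliberate: it is an internal constant, and an inventory variable should not be able to silently swap package names.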

Testing With Molecule

A role without tests is a role you're afraid to change. Molecule is the standard tool for role testing.

molecule init role my-role
cd my-role
molecule test

The default converge.yml applies the role against a test container. The verify.yml runs assertions:

# molecule/default/verify.yml
- name: Verify
  hosts: all
  tasks:
    - name: Check nginx is running
      ansible.builtin.service_facts:

    - name: Assert nginx is active
      ansible.builtin.assert:
        that:
          # on systemd hosts, service_facts keys carry the unit suffix
          # ("nginx.service"); on sysvinit systems the key is plain "nginx"
          - "'nginx.service' in services"
          - "services['nginx.service'].state == 'running'"

    - name: Check nginx config is valid
      ansible.builtin.command: nginx -t
      changed_when: false

I use the Docker driver for local development: fast feedback, no cloud resources needed. In CI, the same molecule test command runs on every PR to the role repository. The full molecule test lifecycle runs create, converge, idempotence, verify, and destroy, leaving nothing behind. The idempotence step re-applies the role and fails if any task reports changed, which mechanically enforces the contract from earlier.
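The driver choice lives in molecule/default/molecule.yml; a minimal Docker setup might look like this (the instance name and image are assumptions):

```yaml
# molecule/default/molecule.yml
driver:
  name: docker
platforms:
  - name: instance
    image: ubuntu:22.04   # pick an image matching a supported platform
provisioner:
  name: ansible
verifier:
  name: ansible           # runs verify.yml as a plain Ansible playbook
```

Swapping this file for one with a cloud driver lets the same converge.yml and verify.yml run against real instances when a container is not representative enough.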

The time investment in Molecule pays off every time you refactor a role and the tests catch a regression before it hits any real infrastructure.