November 14, 2019 Marie H.

PowerShell Fleet Automation from a Linux Bastion

I have a confession: the first thing I do when I sit down at a Windows box is install Cygwin. I grew up on Linux — it's where I think clearly, where my muscle memory lives, where I reach instinctively when something needs to be automated. A Windows desktop without a proper terminal feels like trying to work in oven mitts.

This creates an interesting problem when your job is automating a few hundred Windows servers in a customer's VMware environment with no existing configuration management, no Ansible Tower, and no budget or timeline for building one. The Windows admins are comfortable in PowerShell. I'm comfortable in bash. The servers aren't reachable from the internet. The deadline is not comfortable.

My solution: spin up a Linux VM in their VMware environment, use it as the automation bastion, and drive everything from there via WinRM and Ansible's Windows modules. Write as much of the logic as possible in bash and Python on the Linux side; push only what has to be PowerShell down to the Windows hosts.

This post is about how that worked in practice.

The Setup: Linux Bastion in a Windows World

The VMware environment had several clusters with Windows Server 2012–2016 VMs across multiple subnets. WinRM (Windows Remote Management) was already enabled on most hosts — it's on by default in newer Windows Server versions and was something the Windows team had enabled years earlier for ad-hoc remote management. That was the wire I needed.

I deployed an Ubuntu 20.04 VM into their management VLAN. Network access to the Windows hosts on WinRM ports (5985 HTTP, 5986 HTTPS) was already permitted between management and workload VLANs. I installed Ansible, Python, and a handful of utilities, pointed it at a dynamic inventory built from their CMDB export, and had a working orchestration platform in an afternoon.
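Before pointing Ansible at anything, I like to confirm from the bastion that the WinRM ports are actually open. A minimal TCP reachability sketch (this helper is mine for illustration, not part of the engagement's tooling):

```python
#!/usr/bin/env python3
"""Quick TCP reachability check for WinRM ports (illustrative helper)."""
import socket


def winrm_reachable(host, port=5985, timeout=3):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Example: check the local machine (will be False on a Linux bastion)
    print(winrm_reachable("127.0.0.1", port=5985, timeout=1))
```

It only proves the port answers, not that authentication will work, but it separates firewall problems from credential problems before the first Ansible run.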

The Linux bastion gave me:
- Ansible with ansible.windows for structured automation with retries, inventory management, and idempotent state
- Python for pre/post processing, report generation, and anything needing complex logic
- Bash for orchestrating multi-step workflows, parallel execution, and the kind of glue code that would be awkward in PowerShell
- Git for version-controlling everything, including the PowerShell scripts we pushed to hosts
- pssh / parallel-ssh for the rare cases where I needed raw parallel shell access without Ansible overhead

The Windows hosts ran PowerShell. That was their job — execute what the bastion told them to execute, report results, exit cleanly.

Inventory from the CMDB

The customer had a CMDB that was loosely accurate. I wrote a Python script to pull a CSV export and generate an Ansible inventory:

#!/usr/bin/env python3
import csv
import json
import sys

def cmdb_to_inventory(csv_path):
    inventory = {
        "_meta": {"hostvars": {}},
        "windows": {"hosts": [], "vars": {
            "ansible_connection": "winrm",
            "ansible_winrm_transport": "ntlm",
            "ansible_winrm_server_cert_validation": "ignore",
            "ansible_port": 5985
        }},
        "windows_iis": {"hosts": []},
        "windows_sql": {"hosts": []},
    }

    with open(csv_path) as f:
        for row in csv.DictReader(f):
            if row['os_type'] != 'Windows':
                continue
            hostname = row['hostname'].strip().lower()
            ip = row['ip_address'].strip()

            inventory['windows']['hosts'].append(hostname)
            inventory['_meta']['hostvars'][hostname] = {
                "ansible_host": ip,
                "env": row.get('environment', 'unknown'),
                "role": row.get('role', 'unknown'),
            }

            if 'IIS' in row.get('role', ''):
                inventory['windows_iis']['hosts'].append(hostname)
            if 'SQL' in row.get('role', ''):
                inventory['windows_sql']['hosts'].append(hostname)

    return inventory

if __name__ == '__main__':
    print(json.dumps(cmdb_to_inventory(sys.argv[1]), indent=2))

Generating the inventory and sanity-checking connectivity:

python3 cmdb_to_inventory.py servers.csv > inventory.json
ansible -i inventory.json windows -m win_ping

The win_ping sweep was always the first thing I ran against a new environment. It told me which hosts were actually reachable and which CMDB entries were stale. In this environment, about 12% of the inventory was gone — decommissioned servers that nobody had removed from the CMDB. Better to find that out with win_ping than mid-deployment.
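One low-effort way to turn the sweep into a stale-host list is to run it through Ansible's built-in json stdout callback (`ANSIBLE_STDOUT_CALLBACK=json ansible -i inventory.json windows -m win_ping > sweep.json`) and post-process on the bastion. A sketch — the hostnames here are invented:

```python
import json


def stale_hosts(json_callback_output):
    """Given json-callback output from a win_ping sweep, return the
    hosts that were unreachable (candidates for CMDB cleanup)."""
    data = json.loads(json_callback_output)
    return sorted(
        host for host, stats in data["stats"].items()
        if stats.get("unreachable", 0) > 0
    )


# Trimmed, illustrative sample of the callback's "stats" section
sample = json.dumps({
    "plays": [],
    "stats": {
        "web01": {"ok": 1, "unreachable": 0, "failures": 0},
        "old-dc02": {"ok": 0, "unreachable": 1, "failures": 0},
    },
})
print(stale_hosts(sample))
```

The resulting list goes straight back to whoever owns the CMDB as a cleanup ticket.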

WinRM Authentication

The customer used NTLM authentication (domain environment, no Kerberos configured for remote management, which is common in mid-sized shops). The Ansible winrm connection with ntlm transport worked fine from Linux once I had pywinrm and requests_ntlm installed:

pip3 install pywinrm requests-ntlm

For the credentials themselves, I used Ansible Vault rather than plaintext variables:

ansible-vault create group_vars/windows/vault.yml
# ansible_user: DOMAIN\svcaccount
# ansible_password: <password>

Then the playbooks use --ask-vault-pass or a vault password file. This is basic hygiene but worth stating: storing domain credentials in plaintext in a repo is a category of mistake that haunts environments for years.

The Core Pattern: Ansible Wrapper, PowerShell Payload

Most tasks followed the same structure: Ansible handles targeting, retry logic, and result collection; PowerShell does the Windows-specific work.

A typical playbook for collecting system state across the fleet:

---
- name: Collect Windows host inventory
  hosts: windows
  gather_facts: false

  tasks:
    - name: Get OS and hardware info
      ansible.windows.win_powershell:
        script: |
          $os = Get-CimInstance Win32_OperatingSystem
          $cs = Get-CimInstance Win32_ComputerSystem
          $disks = Get-CimInstance Win32_LogicalDisk | Where-Object {$_.DriveType -eq 3}

          @{
            hostname     = $env:COMPUTERNAME
            os_name      = $os.Caption
            os_build     = $os.BuildNumber
            total_ram_gb = [math]::Round($cs.TotalPhysicalMemory / 1GB, 1)
            cpu_cores    = $cs.NumberOfLogicalProcessors
            disks        = $disks | ForEach-Object {
              @{
                drive       = $_.DeviceID
                size_gb     = [math]::Round($_.Size / 1GB, 1)
                free_gb     = [math]::Round($_.FreeSpace / 1GB, 1)
                free_pct    = [math]::Round(($_.FreeSpace / $_.Size) * 100, 1)
              }
            }
          } | ConvertTo-Json -Depth 3
      register: host_info

    - name: Save result
      ansible.builtin.copy:
        content: "{{ host_info.output[0] }}"
        dest: "/tmp/inventory/{{ inventory_hostname }}.json"
      delegate_to: localhost

Running this across 200 hosts took about 4 minutes with forks = 50 in ansible.cfg. The results landed in /tmp/inventory/ as JSON files on the Linux bastion, where I could process them with jq, Python, or feed them into reports.
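The relevant ansible.cfg was minimal. Only the forks setting comes from the numbers above; the inventory path is an assumption for illustration:

```ini
[defaults]
forks = 50
inventory = inventory.json
```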

This is the pattern I kept coming back to: collect JSON from PowerShell, process it on Linux. PowerShell's ConvertTo-Json is genuinely good. The Linux side is where I'm faster and where the tooling for data processing is richer.
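As a concrete example of that Linux-side processing, here's the kind of sketch I'd use to flag low-disk hosts from the per-host JSON files (a hypothetical helper, not the engagement's actual report code). One real gotcha it has to tolerate: PowerShell's ConvertTo-Json collapses a single-element array into a bare object, so `disks` can arrive as a dict instead of a list:

```python
import glob
import json


def low_disk_report(host_records, threshold_pct=10.0):
    """Return (hostname, drive, free_pct) tuples below the threshold."""
    flagged = []
    for rec in host_records:
        disks = rec.get("disks") or []
        if isinstance(disks, dict):  # ConvertTo-Json collapsed a 1-item array
            disks = [disks]
        for d in disks:
            if d["free_pct"] < threshold_pct:
                flagged.append((rec["hostname"], d["drive"], d["free_pct"]))
    return sorted(flagged)


def load_inventory_dir(path="/tmp/inventory"):
    """Load every per-host JSON file the playbook wrote to the bastion."""
    return [json.load(open(p)) for p in sorted(glob.glob(f"{path}/*.json"))]


if __name__ == "__main__":
    for host, drive, pct in low_disk_report(load_inventory_dir()):
        print(f"{host} {drive} {pct}% free")
```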

Parallel Execution for Operational Tasks

For tasks that needed to run fast across large groups — patching checks, service restarts, disk space alerts — I used Ansible's async + poll: 0 to fire-and-forget and then gather results:

- name: Trigger Windows Update check on all hosts
  hosts: windows
  gather_facts: false

  tasks:
    - name: Check for available updates (async)
      ansible.windows.win_powershell:
        script: |
          $session = New-Object -ComObject Microsoft.Update.Session
          $searcher = $session.CreateUpdateSearcher()
          $result = $searcher.Search("IsInstalled=0 and Type='Software'")
          @{
            pending_updates = $result.Updates.Count
            titles = $result.Updates | ForEach-Object { $_.Title }
          } | ConvertTo-Json
      async: 300
      poll: 0
      register: update_check_job

    - name: Wait for update checks to complete
      ansible.builtin.async_status:
        jid: "{{ update_check_job.ansible_job_id }}"
      register: job_result
      until: job_result.finished
      retries: 30
      delay: 10

The Windows Update COM object is slow — checking for updates on a single host can take 30–90 seconds. Running it synchronously across 200 hosts would serialize into an hours-long operation. Async brought it down to roughly the time it took the slowest host.
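The back-of-envelope arithmetic, using the slow end of that range:

```python
hosts = 200
secs_per_host = 90                             # slow end of the 30-90 s check
serial_hours = hosts * secs_per_host / 3600    # one host at a time
async_secs = secs_per_host                     # fan-out bounded by slowest host
print(f"{serial_hours:.0f} hours serial vs ~{async_secs} s with async fan-out")
```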

Bash Orchestration for Multi-Stage Workflows

Some workflows were too procedural for a single Ansible play — things like "drain this server from the load balancer, apply updates, verify it came back up, add it back." I wrote these as bash scripts on the bastion that called Ansible as a subprocess:

#!/bin/bash
# rolling_patch.sh — apply patches to a group with LB drain/restore

set -euo pipefail

HOSTS_FILE=$1
BATCH_SIZE=${2:-5}

if [[ ! -f "$HOSTS_FILE" ]]; then
  echo "Usage: $0 <hosts_file> [batch_size]"
  exit 1
fi

mapfile -t ALL_HOSTS < "$HOSTS_FILE"
TOTAL=${#ALL_HOSTS[@]}
echo "Starting rolling patch: $TOTAL hosts, batch size $BATCH_SIZE"

for ((i=0; i<TOTAL; i+=BATCH_SIZE)); do
  BATCH=("${ALL_HOSTS[@]:$i:$BATCH_SIZE}")
  BATCH_STR=$(IFS=,; echo "${BATCH[*]}")
  echo ""
  echo "── Batch $((i/BATCH_SIZE + 1)): ${BATCH[*]}"

  echo "  Draining from load balancer..."
  ansible -i inventory.json "$BATCH_STR" \
    -m ansible.windows.win_powershell \
    -a "script='Set-HostLBState -Hostname \$env:COMPUTERNAME -State Drain'"

  echo "  Applying patches..."
  ansible-playbook -i inventory.json apply_patches.yml \
    --limit "$BATCH_STR" \
    --extra-vars "reboot_after=true"

  echo "  Verifying health..."
  ansible -i inventory.json "$BATCH_STR" \
    -m ansible.windows.win_powershell \
    -a "script='Test-ServiceHealth'" \
    | grep -E "(FAILED|unreachable|ok)" || true

  echo "  Restoring to load balancer..."
  ansible -i inventory.json "$BATCH_STR" \
    -m ansible.windows.win_powershell \
    -a "script='Set-HostLBState -Hostname \$env:COMPUTERNAME -State Active'"

  echo "  Batch complete. Sleeping 30s before next batch..."
  sleep 30
done

echo ""
echo "Rolling patch complete."

This is exactly the kind of script that's natural to write in bash but awkward in PowerShell — it orchestrates Ansible runs, captures exit codes, handles batching, logs progress to stdout where I can watch it in a tmux session. The actual Windows work happens inside Ansible; the bash is just the conductor.

When I Had to Go Full PowerShell

Some things genuinely required native PowerShell with no good Ansible wrapper — usually anything involving COM objects, registry depth, or Windows-specific APIs that ansible.windows hadn't abstracted. For those I'd write a .ps1 file on the bastion, push it to the host, run it, and collect the results:

- name: Push and run diagnostic script
  block:
    - name: Copy script to host
      ansible.windows.win_copy:
        src: scripts/collect_wmi_deep.ps1
        dest: C:\temp\collect_wmi_deep.ps1

    - name: Execute script
      ansible.windows.win_shell: |
        C:\temp\collect_wmi_deep.ps1 | Out-File -Encoding utf8 C:\temp\wmi_results.json
      register: script_run

    - name: Fetch results
      ansible.builtin.fetch:
        src: C:\temp\wmi_results.json
        dest: /tmp/results/{{ inventory_hostname }}_wmi.json
        flat: true

    - name: Cleanup
      ansible.windows.win_file:
        path: C:\temp\collect_wmi_deep.ps1
        state: absent

The fetch-and-process-on-Linux pattern meant I didn't have to write data analysis in PowerShell. Push the data collection logic to Windows, pull the JSON back to Linux, process with Python. Each side does what it's best at.

The Cygwin Confession

I mentioned Cygwin at the top, and it's worth saying more. Part of why the Linux bastion approach worked so well here was that it encoded a philosophy I've always operated with: the tools you're fastest with are the right tools, and it's worth investing in getting your preferred tools into whatever environment you're working in rather than adapting to the environment's defaults.

On Windows desktops that I had to work from directly — jumping onto a server for manual diagnosis — I'd install Cygwin and be functional in minutes. grep, awk, curl, ssh, proper tab completion, a sane shell. The Windows team found this either impressive or baffling depending on who you asked. One of the senior Windows admins watched me grep through an IIS log with a regex from a bash prompt on his own server and asked if I'd broken something.

The Linux bastion was just the same instinct at infrastructure scale: rather than adapt my entire workflow to Windows PowerShell (which is genuinely capable, but not where my fluency is), I moved the environment closer to how I work. The Windows servers ran their workloads. The Linux bastion ran the automation. That division of responsibility was clean, and it worked.

What I'd Do Differently

WinRM over HTTP on port 5985 is fine inside a secure management VLAN. I'd push harder for HTTPS (5986) with certificate auth in environments where the management traffic traverses less-trusted segments. NTLM over cleartext is not something I'd accept outside a tightly controlled VLAN.

The CMDB-generated inventory was a constant source of friction — stale entries, incorrect IPs, missing role tags. In a longer engagement I'd invest more in building a live inventory from Active Directory or from the hypervisor's own API (vSphere has a good one) rather than a periodic CSV export.

The bash orchestration scripts grew organically and got messy. A proper Ansible role structure with task files and proper variable management would have been cleaner for anything that ran more than a few times. Bash is great for glue; it doesn't scale well as the primary automation layer once the workflow complexity grows.

But the core pattern — Linux bastion, Ansible with win_powershell, JSON in and out, process results on Linux — I'd use again without hesitation. It got a fleet of Windows servers into a managed state in a fraction of the time it would have taken to build native PowerShell remoting infrastructure, and it let me work in the environment where I'm most effective.