PowerShell Automation for Windows Server Fleets

This engagement was the most Windows-heavy environment I'd worked in: 50+ Windows Server 2016 and 2019 machines spread across US manufacturing facilities and several international sites. No Linux to speak of. If I wanted to automate anything — and I needed to automate a lot — it was going to be PowerShell remoting and Ansible's Windows modules. Here's what I learned doing it at scale.

PowerShell Remoting: The Foundation

Everything runs over PowerShell remoting. Before any remote command works, you need it enabled on each target machine:

Enable-PSRemoting -Force

On a domain machine this usually just works. For non-domain machines (common in manufacturing environments, where plant-floor servers often aren't domain-joined) you have a TrustedHosts problem. Without Kerberos, the WinRM client has no way to verify the target's identity, so it refuses to send credentials over HTTP unless the source machine explicitly trusts the target:

Set-Item WSMan:\localhost\Client\TrustedHosts -Value "server-01,server-02,10.10.1.*" -Force

You can use a wildcard (*) to trust everything, but that's only acceptable on an isolated management network. Do it right and specify the list. Be aware that Set-Item replaces the existing TrustedHosts value rather than appending to it.
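
If the list already has entries, a minimal sketch for inspecting and extending it might look like the following. The host name server-03 is a placeholder; -Concatenate is the WSMan provider's switch for appending to the current value. Both commands assume an elevated session on the source machine.

```powershell
# Show what the client currently trusts (an empty value means nothing yet)
(Get-Item WSMan:\localhost\Client\TrustedHosts).Value

# Append one host (placeholder name) to the existing list instead of replacing it
Set-Item WSMan:\localhost\Client\TrustedHosts -Value 'server-03' -Concatenate -Force
```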

On transport: WinRM defaults to HTTP on port 5985. That's fine on an internal management network, but if your remoting traffic crosses any boundary you care about, configure HTTPS (port 5986) or switch to SSH transport (available since PowerShell 6). For this engagement — non-domain machines, US and international sites — NTLM over HTTPS was the pragmatic choice. Kerberos requires a domain. SSH requires OpenSSH to be deployed everywhere. NTLM over HTTPS requires neither.
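
As a rough sketch of what the HTTPS setup can look like with a self-signed certificate: the first half runs elevated on the target, the second half on the management machine. The host name server-01 and the firewall rule name are placeholders, and skipping the CA and CN checks is an assumption that only makes sense because the certificate is self-signed; with a real PKI you would drop those switches.

```powershell
# --- On the target server (elevated) ---
# Create a self-signed certificate for this machine's name
$cert = New-SelfSignedCertificate -DnsName $env:COMPUTERNAME `
    -CertStoreLocation Cert:\LocalMachine\My

# Bind an HTTPS listener to that certificate
New-Item -Path WSMan:\localhost\Listener -Transport HTTPS -Address * `
    -CertificateThumbPrint $cert.Thumbprint -Force

# Open the WinRM HTTPS port (rule name is a placeholder)
New-NetFirewallRule -DisplayName 'WinRM HTTPS' -Direction Inbound `
    -Protocol TCP -LocalPort 5986 -Action Allow

# --- On the management machine ---
# The self-signed certificate cannot be validated, so skip the CA and CN checks
$opt  = New-PSSessionOption -SkipCACheck -SkipCNCheck
$cred = Get-Credential

# Negotiate falls back to NTLM when no domain (Kerberos) is available
Invoke-Command -ComputerName 'server-01' -UseSSL -Authentication Negotiate `
    -Credential $cred -SessionOption $opt -ScriptBlock { $env:COMPUTERNAME }
```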

Running Commands Across a Fleet

Invoke-Command with a -ComputerName list runs your script block against all targets in parallel, up to the ThrottleLimit (default 32):

$servers = Get-Content servers.txt

$results = Invoke-Command -ComputerName $servers -ScriptBlock {
    [PSCustomObject]@{
        Hostname = $env:COMPUTERNAME
        Disks    = Get-Disk | Select-Object Number, @{N='SizeGB';E={[math]::Round($_.Size / 1GB, 2)}}, OperationalStatus
        Services = Get-Service | Where-Object Status -eq 'Running' | Select-Object Name, StartType
    }
}

$results | ConvertTo-Json -Depth 5 | Out-File "facts\fleet-snapshot.json"

The parallelism matters at scale. Running this against 50 servers takes not much longer than running it against 5; with the default ThrottleLimit of 32, targets beyond the first 32 start as earlier connections finish. The results come back as a flat list of objects, each with a PSComputerName property so you know which server produced which result.
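
In a fleet run, a few hosts are usually unreachable. One way to raise the throttle and still see which servers never answered is sketched below. The value 64 is an arbitrary choice, and the variable names $remoteErrors, $responded, and $missing are illustrative.

```powershell
$servers = Get-Content servers.txt

# Raise the throttle so every target can start at once, and keep going past
# unreachable hosts instead of stopping at the first connection failure
$results = Invoke-Command -ComputerName $servers -ThrottleLimit 64 `
    -ErrorAction SilentlyContinue -ErrorVariable remoteErrors -ScriptBlock {
        [PSCustomObject]@{ Hostname = $env:COMPUTERNAME }
    }

# Any server that returned nothing is one to investigate
$responded = @($results | Select-Object -ExpandProperty PSComputerName -Unique)
$missing   = @($servers | Where-Object { $_ -notin $responded })

"Responded: $($responded.Count) of $(@($servers).Count)"
$missing
$remoteErrors | ForEach-Object { $_.Exception.Message }
```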

For per-server output files, group the results by PSComputerName after the fact:

$results | Group-Object PSComputerName | ForEach-Object {
    $_.Group | ConvertTo-Json -Depth 5 | Out-File "facts\$($_.Name).json"
}

This pattern — run remote discovery, collect results locally, write JSON files — was the basis for all my pre-migration documentation. After running it I had a directory of per-server JSON files I could process with Python to produce the DR runbook spreadsheet.
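
One caveat with this pattern: remoting serializes returned objects to a limited depth, so nested properties can come back flattened to strings. If that shows up in your output, one option is to build the JSON on the remote side and ship back plain text. This is a sketch of that variation, not what the original scripts did, and it only covers disks to stay short.

```powershell
# Serialize on the remote side so nested objects survive the trip intact
$docs = Invoke-Command -ComputerName $servers -ScriptBlock {
    [PSCustomObject]@{
        Hostname = $env:COMPUTERNAME
        Disks    = @(Get-Disk | Select-Object Number,
                       @{N='SizeGB';E={[math]::Round($_.Size / 1GB, 2)}},
                       OperationalStatus)
    } | ConvertTo-Json -Depth 5
}

# Each returned string still carries PSComputerName, so use it for the file name
$docs | ForEach-Object {
    Set-Content -Path "facts\$($_.PSComputerName).json" -Value ([string]$_)
}
```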

Hardware Inventory with Get-CimInstance

For disk inventory, Get-CimInstance is the modern replacement for Get-WmiObject. The legacy class for physical disks is Win32_DiskDrive; the newer one is MSFT_Disk in the root\Microsoft\Windows\Storage namespace, which is what the Storage module's Get-Disk wraps. The Get-Disk/Get-Partition/Get-Volume stack gives the full picture:

$diskInfo = Invoke-Command -ComputerName $servers -ScriptBlock {
    Get-Disk | ForEach-Object {
        $disk = $_
        $partitions = $disk | Get-Partition | Where-Object { $_.DriveLetter }
        [PSCustomObject]@{
            DiskNumber        = $disk.Number
            SizeGB            = [math]::Round($disk.Size / 1GB, 2)
            OperationalStatus = $disk.OperationalStatus
            DriveLetters      = ($partitions | Select-Object -ExpandProperty DriveLetter) -join ','
        }
    }
}

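The size conversion, [math]::Round($disk.Size / 1GB, 2), comes up constantly. Disk sizes come back in bytes. 1GB is not an operator but a numeric literal: PowerShell understands the KB, MB, GB, TB, and PB suffixes as binary multipliers (1GB is 1,073,741,824 bytes), so dividing by it converts cleanly. The binary multiplier is also why a drive sold as 500 GB reports as roughly 465.66.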
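
For the legacy details that Get-Disk doesn't surface in the same shape, a Get-CimInstance query against Win32_DiskDrive is a reasonable complement. This is a minimal sketch; the set of properties picked here is my choice, not something the original inventory necessarily recorded.

```powershell
$physicalDisks = Invoke-Command -ComputerName $servers -ScriptBlock {
    # Win32_DiskDrive lives in the default root\cimv2 namespace
    Get-CimInstance -ClassName Win32_DiskDrive | ForEach-Object {
        [PSCustomObject]@{
            Index         = $_.Index
            Model         = $_.Model
            SerialNumber  = $_.SerialNumber
            InterfaceType = $_.InterfaceType
            SizeGB        = [math]::Round($_.Size / 1GB, 2)
        }
    }
}
```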

Firewall Enumeration

Before making infrastructure changes, I documented the firewall state on every machine. This becomes your reference when you're building AWS security groups to match:

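$fwRules = Invoke-Command -ComputerName $servers -ScriptBlock {
    Get-NetFirewallRule | Where-Object Enabled -eq True | ForEach-Object {
        $rule = $_
        # Port details live on a separate filter object associated with each rule
        $portFilter = $rule | Get-NetFirewallPortFilter
        [PSCustomObject]@{
            Name      = $rule.DisplayName
            # Cast the enum values to strings so they read as names
            # (Inbound, Allow) rather than numbers once serialized
            Direction = [string]$rule.Direction
            Action    = [string]$rule.Action
            Protocol  = $portFilter.Protocol
            LocalPort = $portFilter.LocalPort
        }
    }
}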

Get-NetFirewallRule alone gives you rule metadata. You have to pipe each rule through Get-NetFirewallPortFilter to get the actual port numbers. This is slower than you'd expect — on a server with 200+ rules it takes a few seconds per machine — but you're only doing it once for documentation purposes.
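
To turn that raw dump into something closer to a security group worksheet, one approach is to keep only inbound allow rules with explicit ports and emit one row per port. The sketch below assumes $fwRules from the block above; the output file name is hypothetical, and values are stringified before comparing so it works whether the properties arrived as names or as deserialized enums.

```powershell
$fwRules |
    Where-Object {
        "$($_.Direction)" -eq 'Inbound' -and
        "$($_.Action)" -eq 'Allow' -and
        "$($_.LocalPort)" -notin '', 'Any'
    } |
    ForEach-Object {
        $rule = $_
        # A rule can list several ports; emit one row for each
        foreach ($port in @($rule.LocalPort)) {
            [PSCustomObject]@{
                Server   = $rule.PSComputerName
                Protocol = $rule.Protocol
                Port     = $port
                Rule     = $rule.Name
            }
        }
    } |
    Sort-Object Server, Protocol, Port |
    Export-Csv "facts\inbound-allow-ports.csv" -NoTypeInformation
```

Port values can also be ranges or keywords such as RPC, so the Port column is best treated as text when you review it.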

Error Handling

PowerShell's default error behavior will surprise you if you're coming from bash. Non-terminating errors, the kind most cmdlets emit when something goes wrong, don't stop execution and don't set a useful exit code. Your script will happily continue past a failed step and exit 0.

Fix this at the top of every script:

$ErrorActionPreference = 'Stop'

This converts non-terminating errors to terminating errors, meaning a failed cmdlet throws an exception. Then wrap the meaningful sections in try/catch:

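$ErrorActionPreference = 'Stop'

try {
    # $server holds a single hostname here
    $result = Invoke-Command -ComputerName $server -ScriptBlock {
        Get-Disk | Select-Object Number, OperationalStatus
    }
} catch {
    # -ErrorAction Continue is required: under 'Stop', Write-Error would itself
    # throw and the exit 1 below would never run
    Write-Error "Failed on $server`: $_" -ErrorAction Continue
    exit 1
}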

Without $ErrorActionPreference = 'Stop', a failed Invoke-Command due to a WinRM timeout writes to the error stream but returns nothing to $result — and your script keeps going as if that's fine.
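
One gap to keep in mind: in Windows PowerShell 5.1, $ErrorActionPreference only governs cmdlet errors. A native executable that fails just sets $LASTEXITCODE, so installers and command-line tools need an explicit check. A minimal illustration, using cmd.exe as a stand-in for any failing executable:

```powershell
$ErrorActionPreference = 'Stop'

# The native command exits with code 3, but nothing is thrown
& cmd.exe /c "exit 3"

# Turn the nonzero exit code into a terminating error ourselves
if ($LASTEXITCODE -ne 0) {
    throw "cmd.exe exited with code $LASTEXITCODE"
}
```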

Ansible's win_powershell Module

For the agent installation and configuration work, I drove everything from Ansible. The ansible.windows.win_powershell module is how you run arbitrary PowerShell from a playbook:

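- name: Download and install replication agent
  ansible.windows.win_powershell:
    script: |
      $ErrorActionPreference = 'Stop'
      $url = "{{ installer_url }}"
      # -UseBasicParsing avoids the Internet Explorer dependency in Windows PowerShell 5.1
      Invoke-WebRequest -Uri $url -OutFile "C:\Temp\installer.exe" -UseBasicParsing
      & "C:\Temp\installer.exe" --region {{ aws_region }} --no-prompt
      # Native executables ignore $ErrorActionPreference; this assumes the
      # installer signals failure with a nonzero exit code
      if ($LASTEXITCODE -ne 0) { throw "Installer exited with code $LASTEXITCODE" }
  register: install_output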

The registered variable gets back an object whose most useful keys are output (the objects the script wrote to the output stream, as a list), error (a list of error records), and result (whatever you assigned to $Ansible.Result). Anything written directly to the host lands in host_out and host_err. For debugging a failed run, check install_output.error.

One thing that trips people up: the script block runs in a new PowerShell process each time. Variables you set in one win_powershell task don't persist to the next. If you need to pass data between tasks, write it to a file or use set_fact on the Ansible side based on the output you collected.

Elevation with become

Many tasks — service control, software installation, firewall changes — need Administrator privileges. In Ansible, this is the become pattern for Windows:

- name: Install Windows feature
  ansible.windows.win_feature:
    name: Web-Server
    state: present
  become: yes
  become_method: runas
  become_user: Administrator

This requires the WinRM connection user to have the right to run as Administrator. In practice, on these non-domain machines, I was connecting as a local administrator account and becoming Administrator — which worked but required SeImpersonatePrivilege. Test this early. The error message when it fails isn't always obvious about why.

WinRM Credentials

The credential handling problem deserves direct treatment. WinRM basic auth over plain HTTP sends credentials base64-encoded, which is encoding, not encryption. Don't use it in production. Your options in roughly increasing order of setup complexity:

HTTPS + basic auth: Generate a self-signed cert, configure WinRM to use it, set ansible_winrm_server_cert_validation: ignore in your inventory. Simple, works for non-domain machines.
HTTPS + NTLM: Add ansible_winrm_transport: ntlm. Works without a domain. The challenge-response exchange means the password itself never crosses the wire, a modest improvement over basic auth.
Kerberos: The right answer for domain environments. More setup (install pywinrm with its kerberos extra on the Ansible control node, configure /etc/krb5.conf), but the password is never sent to the target.

For this engagement — non-domain machines, mixed US and international sites — NTLM over HTTPS was what we used throughout. Kerberos would have been ideal if the machines were domain-joined, but they weren't.

What the Automation Covered

By the end of the engagement, the playbooks and scripts covered: full disk and partition inventory per server, running service and startup type snapshot, firewall rule enumeration, agent installation, agent health verification, and firewall rule updates. Running the full discovery suite across all 50+ servers took about 20 minutes. Doing it by hand — logging into each machine, running commands, copying output — would have taken days and produced inconsistent results.

PowerShell remoting is genuinely good infrastructure tooling once you get past the WinRM configuration friction. $ErrorActionPreference = 'Stop' first, always.