The Problem: Six Interfaces for One Question # “Is anything broken in my homelab?”
Answering that question used to mean: SSH into Proxmox to check guest status. Curl the Pi-hole API for DNS health. Open Grafana to scan Prometheus alerts. Check Graylog for error spikes. Look at Semaphore for failed automation runs. Glance at Caddy logs for 502s.
The 2 AM Wake-Up Call # I woke up to find my CI/CD platform had been down for 8 hours. Semaphore, the Ansible automation engine that manages my entire homelab, was stuck in a crash loop:
1 2 3 /usr/local/bin/server-wrapper: line 295: syntax error: unexpected "&&" /usr/local/bin/server-wrapper: line 295: syntax error: unexpected "&&" /usr/local/bin/server-wrapper: line 295: syntax error: unexpected "&&" The same error, repeating every few seconds. The container would start, hit the broken entrypoint script, crash, and restart. Endlessly.
The Problem # My Caddy reverse proxy runs as an HA pair – two nodes behind a keepalived VIP. Every service in the homelab gets its traffic through this pair. The setup works great, except for one recurring failure mode: config drift.
The deployment process was manual: edit the Caddy site config in git, SCP it to both nodes, validate, reload. The “both nodes” part is where things break down. It’s easy to deploy to caddy1, test it, see it working, and then forget caddy2 exists. Until keepalived fails over and suddenly half your sites return 502s because the backup node has last week’s config.
The Problem # My PAN-OS firewall (GlobalProtect VPN portal at vpn.mareoxlan.com) needs a valid TLS certificate. I had a dedicated LXC (30122) running acme.sh with a Cloudflare DNS-01 challenge to issue a wildcard cert, then a PAN-OS deploy hook to push it to the firewall via the XML API. It worked, but it was a single-purpose VM doing the same job my Caddy reverse proxy already does – Caddy auto-renews *.mareoxlan.com via the same Cloudflare DNS-01 mechanism.
The Goal # Add all 4 Proxmox VE cluster nodes (pve-mini2, pve-mini3, pve-mini5, pve-mini6) to the existing Prometheus/Grafana stack on LXC 30194. The monitoring stack already covered Graylog, Windows desktop, and PAN-OS firewall metrics – Proxmox was the last major gap.
Approach: pve-exporter vs node_exporter # I evaluated two options:
The Problem # Watchtower had been my go-to for automatic Docker container updates across 8+ services. It worked… mostly. But I kept running into issues:
Opt-out model is dangerous - Watchtower watches ALL containers by default. I had to remember to add com.centurylinklabs.watchtower.enable=false to containers I didn’t want updated. Forgetting meant surprise updates.
No visibility - Updates happened silently at 4 AM. I only knew something updated when it broke. No dashboard, no easy way to see pending updates.
Overview # I migrated my internet speed monitoring from Speedtest Tracker to MySpeed after learning that Speedtest Tracker was deprecating native Discord notifications. Rather than adding an Apprise sidecar container for notifications, I opted for MySpeed which has built-in Discord support.
Why Migrate? # Factor Speedtest Tracker MySpeed Stack Laravel/PHP Node.js Discord Deprecated (needs Apprise) Native support Complexity nginx + php-fpm + SQLite Single Docker container Updates Manual WUD opt-in monitoring Architecture Comparison #