The Wake-Up Call
On April 9, Gilfoyle (my AI network admin) posted this at midnight:
The cert flap resolved itself within hours. Gilfoyle posted the recovery notice, and ccode closed the escalation. No lasting impact.
The Problem: Nobody’s Watching at 3 AM
My homelab runs 47 guests across 4 Proxmox nodes, with HA pairs for DNS and reverse proxy, a Wazuh XDR deployment, centralized logging in Graylog, and CI/CD automation through Semaphore. It’s a lot of infrastructure for one person to monitor.
I had alerts. Grafana fires when RAM hits 75%. Wazuh flags suspicious file changes. n8n emails me when workflows fail. But alerts are reactive. They tell you something broke. They don’t tell you something is about to break.
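One way to make the jump from "something broke" to "something is about to break" is to extrapolate a metric's recent trend. The sketch below is only an illustration, not the tooling described in this post: it fits a straight line to recent samples and estimates how long until the 75% RAM threshold would be crossed. The function name `hours_until_threshold` and the sample format are assumptions made for the example.

```python
from typing import Optional, Sequence, Tuple


def hours_until_threshold(
    samples: Sequence[Tuple[float, float]], threshold: float
) -> Optional[float]:
    """Estimate hours until a metric crosses `threshold`.

    `samples` holds (hours, value) pairs, oldest first (hypothetical format).
    Returns None when there is too little data or the trend is flat or falling.
    """
    n = len(samples)
    if n < 2:
        return None

    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples share one timestamp, no trend to fit

    # Ordinary least-squares slope and intercept.
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    if slope <= 0:
        return None  # usage is not growing
    intercept = mean_v - slope * mean_t

    crossing = (threshold - intercept) / slope
    last_t = samples[-1][0]
    # Clamp to zero if the fitted line has already crossed the threshold.
    return max(0.0, crossing - last_t)


if __name__ == "__main__":
    ram = [(0.0, 60.0), (1.0, 62.0), (2.0, 64.0)]
    print(hours_until_threshold(ram, 75.0))
```

A linear fit is a deliberately crude choice. It is easy to reason about, but it will mislead on bursty or periodic workloads.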
The Problem: Six Interfaces for One Question
“Is anything broken in my homelab?”
Answering that question used to mean: SSH into Proxmox to check guest status. Curl the Pi-hole API for DNS health. Open Grafana to scan Prometheus alerts. Check Graylog for error spikes. Look at Semaphore for failed automation runs. Glance at Caddy logs for 502s.
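The six manual steps above can be folded into one pass that runs every check and reports only what failed. This is a minimal sketch of that pattern, not the actual implementation: `run_checks`, `summarize`, and the stub checks are hypothetical, and a real check would call the Proxmox, Pi-hole, Grafana, Graylog, Semaphore, or Caddy interface in place of the stubs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    name: str
    healthy: bool
    detail: str


def run_checks(checks: Dict[str, Callable[[], str]]) -> List[CheckResult]:
    """Run every check; a check signals failure by raising an exception."""
    results = []
    for name, check in checks.items():
        try:
            results.append(CheckResult(name, True, check()))
        except Exception as exc:  # one broken check must not hide the others
            results.append(CheckResult(name, False, str(exc)))
    return results


def summarize(results: List[CheckResult]) -> str:
    """Collapse all results into a single answer to 'is anything broken?'."""
    failed = [r for r in results if not r.healthy]
    if not failed:
        return f"All {len(results)} checks passed"
    lines = [f"{len(failed)} of {len(results)} checks failed:"]
    lines += [f"  {r.name}: {r.detail}" for r in failed]
    return "\n".join(lines)


if __name__ == "__main__":
    # Stub checks standing in for real API calls (hypothetical).
    def dns_ok() -> str:
        return "resolving"

    def proxmox_guests() -> str:
        raise RuntimeError("3 guests stopped")

    print(summarize(run_checks({"dns": dns_ok, "proxmox": proxmox_guests})))
```

Treating an exception as the failure signal keeps each check small: it either returns a short status string or raises with the reason.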
The Problem
My Caddy reverse proxy runs as an HA pair: two nodes behind a keepalived VIP. Every service in the homelab gets its traffic through this pair. The setup works great, except for one recurring failure mode: config drift.
The deployment process was manual: edit the Caddy site config in git, SCP it to both nodes, validate, reload. The “both nodes” part is where things break down. It’s easy to deploy to caddy1, test it, see it working, and then forget caddy2 exists. Until keepalived fails over and suddenly half your sites return 502s because the backup node has last week’s config.
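Drift of this kind is cheap to detect before a failover exposes it: hash the config that git says should be live and compare it with what each node actually has. The sketch below shows only the comparison. The `find_drift` helper is hypothetical, and how the deployed bytes are fetched from each node (SSH, an API, or something else) is left out as an assumption.

```python
import hashlib
from typing import Dict, List


def config_digest(content: bytes) -> str:
    """Return the SHA-256 hex digest of a config file's bytes."""
    return hashlib.sha256(content).hexdigest()


def find_drift(expected: bytes, deployed: Dict[str, bytes]) -> List[str]:
    """Return the names of nodes whose config differs from `expected`.

    `deployed` maps a node name to the config bytes read from that node.
    """
    want = config_digest(expected)
    return sorted(
        node for node, content in deployed.items()
        if config_digest(content) != want
    )


if __name__ == "__main__":
    in_git = b"example.test {\n    reverse_proxy 10.0.0.10:8080\n}\n"
    nodes = {
        "caddy1": in_git,
        "caddy2": b"example.test {\n    reverse_proxy 10.0.0.9:8080\n}\n",
    }
    print(find_drift(in_git, nodes))
```

Comparing digests instead of full files keeps the check fast and means the report never has to echo config contents.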
The Problem
My PAN-OS firewall (GlobalProtect VPN portal at vpn.mareoxlan.com) needs a valid TLS certificate. I had a dedicated LXC (30122) running acme.sh with a Cloudflare DNS-01 challenge to issue a wildcard cert, then a PAN-OS deploy hook to push it to the firewall via the XML API. It worked, but it was a single-purpose container doing the same job my Caddy reverse proxy already does: Caddy auto-renews *.mareoxlan.com via the same Cloudflare DNS-01 mechanism.
Overview
After multiple outages caused by configuration drift between two HA Caddy reverse proxy nodes, I built a GitOps pipeline that automatically deploys configs to both nodes whenever changes are pushed to the main branch. Because both nodes always receive the same commit, config drift between them is ruled out by design.
The problem: Two Caddy nodes in a keepalived HA pair need identical configs. Forgetting to deploy to the second node after editing a site config caused service outages — twice in the same week.
The Problem
After months of building Claude Code extensions (agents, skills, commands, hooks, MCP servers), I had a growing collection of powerful tools with no coherent entry point. Want to pull all repos? Run a shell script. Want to check infrastructure health? Ask Claude and hope it knows which command to use. Want to automate a browser task? Figure out whether to use the MCP plugin or write a script.
Overview
Migrated the Prometheus + Grafana monitoring stack from a shared Docker VM to a dedicated LXC container. The shared VM hosted multiple stacks (pgAdmin, Portainer, monitoring), which created resource contention and made lifecycle management messy. Moving monitoring to its own LXC follows the homelab pattern of one service per container for cleaner isolation, backups, and management.
Overview
When running a self-hosted password manager like Vaultwarden, accurate client IP logging is critical for security alerts. The “New Device Login” email should show the actual IP address of whoever just accessed your vault, not your reverse proxy’s internal IP.
This becomes tricky when you have multiple traffic paths: external users coming through Cloudflare Tunnel, and internal users coming through your local reverse proxy. Each path uses different mechanisms to communicate the real client IP.
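To make the two paths concrete, here is a minimal sketch of the decision a backend has to make. It assumes that Cloudflare-delivered requests carry the `CF-Connecting-IP` header (which Cloudflare does set), that the local reverse proxy appends to `X-Forwarded-For`, and that header names arrive lowercased. The function name, the trusted-proxy list, and the addresses are hypothetical, and this is not how Vaultwarden itself is configured.

```python
import ipaddress
from typing import Iterable, Mapping


def _is_trusted(ip: str, trusted: Iterable[str]) -> bool:
    """True if `ip` falls inside any trusted proxy network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(net) for net in trusted)


def _valid_ip(value: str) -> bool:
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def resolve_client_ip(
    headers: Mapping[str, str], peer_ip: str, trusted: Iterable[str]
) -> str:
    """Pick the real client IP; headers count only when sent by a trusted proxy."""
    trusted = list(trusted)
    if not _is_trusted(peer_ip, trusted):
        return peer_ip  # direct connection: forwarding headers could be forged

    # External path: Cloudflare reports the visitor in CF-Connecting-IP.
    cf_ip = headers.get("cf-connecting-ip", "").strip()
    if _valid_ip(cf_ip):
        return cf_ip

    # Internal path: walk X-Forwarded-For right to left, skipping our proxies.
    raw = headers.get("x-forwarded-for", "")
    hops = [h.strip() for h in raw.split(",") if h.strip()]
    for hop in reversed(hops):
        if _valid_ip(hop) and not _is_trusted(hop, trusted):
            return hop
    return peer_ip


if __name__ == "__main__":
    proxies = ["10.0.0.0/24"]
    print(resolve_client_ip({"cf-connecting-ip": "203.0.113.7"}, "10.0.0.5", proxies))
    print(resolve_client_ip({"x-forwarded-for": "192.168.1.50"}, "10.0.0.5", proxies))
```

The key design choice is the first check: forwarding headers are honored only when the TCP peer is a known proxy, since any client can send these headers itself.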
Overview
This site uses a hybrid content structure that combines the best of wikis and blogs. Instead of choosing between “everything chronological” or “everything by topic,” we get both:
Wiki sections for evergreen reference content (organized by topic)
Blog posts for journey updates and lessons learned (organized by date)
Tutorials for step-by-step guides (standalone, searchable)
Series for multi-part deep dives (linked learning paths)
Content Architecture