Most homelab monitoring templates on n8n do one thing: ping a URL and check if it's up. Uptime Kuma vibes. That's fine if all you care about is "is port 443 responding," but it tells you nothing about whether your host is slowly running out of disk, your Docker containers are restart-looping, or your memory has been at 94% for three days straight.
I wanted something that actually SSHes into the machine, pulls real system metrics, feeds them to an LLM for trend analysis, and sends me a morning digest I can glance at over coffee. So I built it.
The Architecture in Two Sentences
One n8n workflow, two execution paths. A daily digest runs at 7 AM and does the full collection-analysis-report pipeline. A critical alert path runs every five minutes, checks a handful of thresholds, and screams at me on Discord if something is on fire.
32 nodes total: 24 functional nodes plus 8 sticky notes (Creator Hub spec). The daily path is the interesting one — the critical alert path is basically a glorified threshold check with a pretty Discord embed.
Starting Point: The Config Node
I follow the n8n Creator Hub pattern of putting all user-configurable values in a single Edit Fields node at the top of the workflow. SSH host, SSH credential reference, Discord webhook URL, Google Sheets ID for history, threshold values. One node to edit, and the workflow runs against your server.
This took five minutes.
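For the curious, the Config node carries roughly this shape (field names here are illustrative, not necessarily what's in the template):

```json
{
  "sshHost": "192.168.1.50",
  "discordWebhookUrl": "https://discord.com/api/webhooks/your-webhook",
  "googleSheetId": "your-sheet-id",
  "diskThreshold": 95,
  "memoryThreshold": 95
}
```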
SSH Commands: Where the Real Data Lives
The SSH node is doing the heavy lifting. I'm running a compound command that collects everything in one shot — hostname, uptime, load averages, memory breakdown, disk usage, inode counts, swap, CPU temperature, core count, and then a full Docker survey: container status, stats (CPU/memory per container), health check results, and restart counts.
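A trimmed sketch of that command, to give you the shape (the real one collects more, and the section markers are just my illustration of how to keep the output parseable):

```bash
# One SSH session, one compound command, section markers for the parser
echo "=== HOST ===";   hostname; uptime
echo "=== MEMORY ==="; free -m
echo "=== DISK ===";   df -h /; df -i /
echo "=== DOCKER ==="; docker ps -a --format "{{.Names}}\t{{.Status}}"
docker stats --no-stream --format "{{.Names}}\t{{.CPUPerc}}\t{{.MemPerc}}"
```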
The Docker metrics required some careful quoting. The docker stats --no-stream --format flag takes a Go template, and Go templates use {{.Names}} syntax. This is where the first real problem showed up, but I'll get to that.
The raw output from SSH is a wall of text. Newline-separated, mixed formats, some fields tab-delimited, others space-delimited. You cannot send this directly to an LLM and expect useful analysis. It needs parsing.
The 80-Line Code Node (Parsing and Normalization)
This was the most time-consuming node to write, and I'm not sure it needed to be. The Code node takes the raw SSH output and produces a clean JSON object with typed fields — memory as percentages, disk as percentages, each Docker container as an object with name/status/cpu/memory/health/restarts.
80 lines of JavaScript. String splitting, regex extraction, parseInt and parseFloat everywhere, defensive checks for missing fields. The kind of code that's tedious to write and critical to get right.
I briefly considered throwing the raw output at GPT-4o-mini and letting it parse everything. I decided against it for two reasons: token cost (raw SSH output is verbose and repetitive), and reliability (the LLM shouldn't be guessing which column in "Mem: 7821 1024 3842 12 2955 6484" is used and which is available). Deterministic parsing for structured data, LLM for interpretation. That split matters.
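A condensed sketch of the approach, not the actual 80 lines (field names and regexes are illustrative):

```javascript
// Code node, mode: runOnceForEachItem
const raw = $json.stdout || '';

// free -m: "Mem:  7821  1024  ..." where the columns are total, used, ...
const mem = raw.match(/Mem\s*:\s+(\d+)\s+(\d+)/);
const memPercent = mem
  ? Math.round((parseInt(mem[2], 10) / parseInt(mem[1], 10)) * 100)
  : null;

// df -h: grab the use% for the root filesystem
const disk = raw.match(/(\d+)%\s+\/\s*$/m);
const diskPercent = disk ? parseInt(disk[1], 10) : null;

// docker stats lines: "name\tCPU%\tMEM%"
const containers = raw
  .split('\n')
  .filter(line => line.split('\t').length === 3)
  .map(line => {
    const [name, cpu, memPct] = line.split('\t');
    return { name, cpu: parseFloat(cpu), memory: parseFloat(memPct) };
  });

return { json: { memPercent, diskPercent, containers } };
```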
A trap I almost fell into: the Code node in n8n has two modes — runOnceForAllItems and runOnceForEachItem. The default is runOnceForAllItems, which gives you $input.all() but does NOT expose $json. I was referencing $json.stdout and getting undefined. Switching to runOnceForEachItem fixed it immediately, but I burned ten minutes staring at the node wondering why it was producing empty output. Silent failures.
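For contrast, the same read in both modes (two separate node configurations, shown together):

```javascript
// Mode: runOnceForAllItems. $json is not available; pull items explicitly
const rawAll = $input.all()[0].json.stdout;

// Mode: runOnceForEachItem. $json refers to the current item
const rawEach = $json.stdout;
```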
Google Sheets for Historical Trends
The daily digest doesn't just report current state — it compares against the last 7 days of readings. I'm using Google Sheets as the trend store because it's free, it's visual (I can open the sheet and see charts if I want), and n8n has a native Google Sheets node.
Each run appends a row: timestamp, hostname, CPU load, memory percent, disk percent, top container by CPU, top container by memory, health score. Then the LLM prompt includes the last 7 rows so it can say things like "memory usage has increased 8% over the past week" or "disk is filling at roughly 2% per day — you have about 5 days before it crosses 90%."
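Flattening those rows into the prompt is a few lines in the prompt-builder Code node (column names here are hypothetical):

```javascript
// Items from the Google Sheets read node, one per historical row
const history = $input.all().map(item => item.json);

const trendBlock = history
  .slice(-7) // last 7 readings
  .map(r => `${r.timestamp}: load ${r.cpuLoad}, mem ${r.memoryPercent}%, disk ${r.diskPercent}%`)
  .join('\n');

return { json: { trendBlock } };
```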
The Google Sheets node bit me once. The range field — which tells the node which cells to read — was supposed to be a top-level parameter, not nested inside options. I had it inside options initially, which meant the node was silently reading... nothing. No error, just empty results. I only caught it because the LLM prompt was producing analysis that said "no historical data available" when I knew there were 5 rows in the sheet. One validation cycle with the n8n MCP tools flagged the misplaced parameter.
(Sidebar: I know Sheets as a database is cursed. I know. But for a 7-day rolling window with one row per day per server, it's perfectly fine. If you're monitoring 50 servers, use Postgres. For a homelab with 1-3 machines, a spreadsheet is the pragmatic choice, and it gives you free visualization.)
The LLM Prompt: Getting GPT-4o-mini to Produce Useful Analysis
This is the part where most AI-monitoring projects produce garbage. You throw metrics at an LLM and get back "Your system appears to be functioning normally. CPU usage is within acceptable parameters." Thanks for nothing.
The prompt took three iterations to get right.
Iteration 1 was a basic "analyze these metrics." The output was generic, restated the numbers without interpretation, and missed the trends entirely.
Iteration 2 added the historical data and explicit instructions: "Compare today's values against the 7-day trend. Identify anything that is worsening. Predict when thresholds will be breached at current rates." Better, but it was still hedging everything. "Memory usage may potentially be trending slightly upward."
Iteration 3 — the version I shipped — includes a scoring rubric and explicit output format. The prompt defines the 100-point health score system: start at 100, deduct for disk >80% (-10) or >90% (-25), memory >85% (-10) or >95% (-25), CPU sustained overload (-15), inode >80% (-10) or >90% (-20), container restarts in the last 24h (-5 each), containers in non-running state (-10 each). The LLM computes the score and formats everything into sections: Overall Health, System Resources, Docker Fleet, Trend Analysis, Action Items.
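Condensed heavily, the structure of the shipped prompt looks something like this (paraphrased, not verbatim):

```text
You are analyzing daily health metrics for a homelab server.

Health score: start at 100. Deduct: disk >80% (-10), >90% (-25);
memory >85% (-10), >95% (-25); sustained CPU overload (-15);
inode >80% (-10), >90% (-20); -5 per container restart in 24h;
-10 per container not running.

Compare today's values against the 7-day history below. Identify
anything that is worsening. Predict when thresholds will be breached
at current rates.

Output these sections in order: Overall Health, System Resources,
Docker Fleet, Trend Analysis, Action Items.
```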
I genuinely did not expect GPT-4o-mini to handle the scoring rubric as reliably as it does. Every test run so far has produced a correctly computed score.
The Discord Embed: Color-Coded Health at a Glance
The Discord webhook receives a rich embed, not a wall of text. Color is based on the health score: green (0x2ECC71) for scores 80 and above, yellow (0xF39C12) for 50-79, red (0xE74C3C) below 50.
Fields are organized into inline embed fields — System Resources, Docker Status, Trend Summary, Action Items. The health score is the embed title in large text. You can glance at your phone, see green and "92/100," and move on with your day. Or see red and "43/100" and know you need to check something.
Building the embed JSON was straightforward. The HTTP Request node posts to the Discord webhook URL with the embed payload. Nothing clever here.
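For reference, the color mapping and payload assembly are a handful of lines (the $json field names are mine, standing in for whatever your upstream nodes output):

```javascript
// Pick the embed color from the health score thresholds above
const score = $json.healthScore;
const color = score >= 80 ? 0x2ECC71 : score >= 50 ? 0xF39C12 : 0xE74C3C;

const payload = {
  embeds: [{
    title: `Health Score: ${score}/100`,
    color,
    fields: [
      { name: 'System Resources', value: $json.systemSummary, inline: true },
      { name: 'Docker Status', value: $json.dockerSummary, inline: true },
      { name: 'Trend Summary', value: $json.trendSummary, inline: false },
      { name: 'Action Items', value: $json.actionItems, inline: false },
    ],
    timestamp: new Date().toISOString(),
  }],
};

return { json: payload };
```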
The Critical Alert Path: Every Five Minutes
The second execution path is simpler. Schedule Trigger every 5 minutes, SSH into the host with a lighter command (just disk, memory, CPU, and Docker container status — no history, no full stats), run it through a Code node that extracts the key numbers, and hit an IF node.
The IF node. This is where I spent an unreasonable amount of time on what should have been simple.
n8n's IF node v2.2 has a baffling validation failure. Configure the conditions correctly — say, {{ $json.diskPercent }} greater than 95 — and the workflow saves fine, but validation throws "Missing required field conditions.options.version." The node schema requires a version field inside conditions.options, and it must be set to 2. This isn't documented anywhere I could find.
I had to upgrade to IF node typeVersion 2.3 and add "version": 2 inside the conditions options object. It looked something like:
```json
{
  "conditions": {
    "options": {
      "version": 2,
      "caseSensitive": true,
      "leftValue": ""
    },
    "combinator": "or",
    "conditions": [
      {
        "leftValue": "={{ $json.diskPercent }}",
        "rightValue": 95,
        "operator": { "type": "number", "operation": "gt" }
      }
    ]
  }
}
```
I was ready to rip this node out entirely and use a Code node with a simple if statement. The only reason I didn't is that Code nodes aren't as readable on the workflow canvas — an IF node named "Is system critical?" makes the flow self-documenting.
The dual-trigger problem. Both execution paths — daily digest and critical alerts — live in one workflow. Ideally they'd share the Config node so thresholds are defined once. But n8n Schedule Triggers each start their own execution, and a Config node downstream of Trigger A doesn't execute when Trigger B fires. You can't share a Set Fields node across two triggers in the same workflow.
My solution: the daily digest path uses the Config node. The critical alert path hardcodes its thresholds directly in the Code node with comments explaining where the values come from. Not beautiful, but explicit. And the critical alert path only checks three things (disk >95%, memory >95%, any container down), so hardcoding three numbers with a comment is actually clearer than adding a second Config node or splitting into two workflows.
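Roughly what that Code node looks like (field names are illustrative):

```javascript
// Parsed upstream from the lighter SSH command
const { diskPercent, memPercent, containersDown } = $json;

// Hardcoded on purpose: this path has its own Schedule Trigger and
// never passes through the Config node on the daily-digest branch.
// If you change the Config thresholds, change these too.
const DISK_CRITICAL = 95; // %
const MEM_CRITICAL = 95;  // %

return {
  json: {
    diskPercent,
    memPercent,
    containersDown,
    isCritical:
      diskPercent > DISK_CRITICAL ||
      memPercent > MEM_CRITICAL ||
      containersDown > 0,
  },
};
```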
Docker Go Templates vs. n8n Expression Detection
This one was funny. The SSH command to get Docker container stats uses Go template formatting:
```bash
docker stats --no-stream --format "{{.Names}}\t{{.CPUPerc}}\t{{.MemPerc}}"
```
n8n's workflow validator sees {{ and }} and thinks it's an n8n expression. Which means it tries to evaluate .Names as a variable reference and throws an "expression format error." The command works perfectly fine at runtime — SSH doesn't care about n8n's expression parser — but the validator flags it as a warning every time.
There is no clean fix. You can't escape the double braces in a way that satisfies both n8n's validator and the actual shell command. I left it as-is with a comment in the SSH node explaining that the validation warning is a false positive. The workflow runs correctly; the validator is just wrong about this one.
(This is the kind of thing that makes me wish n8n had a "raw string" mode for SSH command parameters. The expression detection is helpful 95% of the time and actively misleading for the other 5%.)
Sticky Notes and Creator Hub Compliance
Eight sticky notes total:
- Yellow overview (top-left): workflow name, how it works in 5 numbered steps, setup steps, customization tips. About 200 words.
- Gray section: Configuration — covers the Config node
- Gray section: System metrics collection — covers the SSH and parse nodes
- Gray section: AI analysis — covers the Sheets read, prompt builder, and GPT node
- Gray section: Daily digest delivery — covers the Discord embed and Sheets append
- Gray section: Critical alerts — covers the 5-minute path
- Gray section: First-time setup — covers the auto-create Sheet path
- Purple warning on the GPT-4o-mini node: "Requires OpenAI API key. Estimated cost: ~$0.002-0.01 per daily run."
The cost estimate is real. GPT-4o-mini with the structured prompt and 7 days of history runs about 1,500-2,000 tokens input, 400-600 tokens output. At current pricing that's fractions of a cent.
The Health Score System
The 100-point scoring rubric lives in the LLM prompt, but here's the actual deduction table:
| Condition | Deduction |
|---|---|
| Disk usage > 80% | -10 |
| Disk usage > 90% | -25 |
| Memory usage > 85% | -10 |
| Memory usage > 95% | -25 |
| CPU load > cores (sustained) | -15 |
| Inode usage > 80% | -10 |
| Inode usage > 90% | -20 |
| Container restarts (24h) | -5 each |
| Container not running | -10 each |
| CPU utilization > 80% | -20 |
| CPU utilization > 50% | -10 |
| Swap usage > 80% | -10 |
| Swap usage > 50% | -5 |
| Zombie processes > 0 | -5 |
| Failed systemd services > 0 | -10 |
| Docker reclaimable > 50% | -5 |
A healthy single-server homelab typically scores 88-100. You'll see it dip into the 70s during heavy workloads. Below 50 means multiple things are wrong simultaneously. It worked.
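If you'd rather not trust an LLM with arithmetic, the rubric is trivial to compute deterministically in a Code node. A sketch covering the top of the table (the remaining rows follow the same pattern; metric names are mine):

```javascript
function healthScore(m) {
  let score = 100;
  if (m.diskPercent > 90) score -= 25;
  else if (m.diskPercent > 80) score -= 10;
  if (m.memPercent > 95) score -= 25;
  else if (m.memPercent > 85) score -= 10;
  if (m.loadAvg > m.cores) score -= 15; // sustained CPU overload
  if (m.inodePercent > 90) score -= 20;
  else if (m.inodePercent > 80) score -= 10;
  score -= 5 * m.containerRestarts24h;
  score -= 10 * m.containersNotRunning;
  return Math.max(0, score);
}
```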
Testing: The Boring Part That Matters
I tested the daily digest path by running it manually against my own server. The first run surfaced a parsing bug — the memory regex matched "Mem:" but not the "Mem :" (space before the colon) that some distros produce. Fixed the regex to handle both.
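The change was small (simplified here; the real regex captures the numbers too):

```javascript
// Before: only matches "Mem:"
const memLine = raw.match(/^Mem:/m);

// After: tolerates optional whitespace before the colon ("Mem :")
const memLineFixed = raw.match(/^Mem\s*:/m);
```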
The critical alert path I tested by temporarily setting the disk threshold to 1% — which obviously triggers — and confirming the Discord alert fires with the right formatting. Then set it back to 95%.
End-to-end, the daily digest takes about 8 seconds to execute. Most of that is the SSH connection and the LLM call. The critical alert path runs in under 3 seconds.
What I'd Do Differently
Update: Several of these were addressed in a later redesign. The workflow now auto-creates a formatted Google Sheet via a first-time setup trigger (fixing the Sheets friction), the parse node was split into cleaner sections with dedicated handling for each metric category, and the AI now returns structured JSON with severity-tagged findings instead of free-form text.
I'd skip Google Sheets and use a JSON file. The Sheets integration adds a credential requirement (Google OAuth), adds latency, and adds a failure mode (Google API rate limits). A simple JSON file on disk — or even n8n's built-in static data — would work for a 7-day rolling window. I chose Sheets because I wanted the visual spreadsheet for debugging during development, but for a template other people will use, it's unnecessary friction.
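The static data route would look something like this in a Code node (note: static data only persists on production executions, not manual test runs):

```javascript
// Rolling 7-day history in workflow static data instead of Google Sheets
const staticData = $getWorkflowStaticData('global');
staticData.history = staticData.history || [];

staticData.history.push({
  timestamp: new Date().toISOString(),
  memPercent: $json.memPercent,
  diskPercent: $json.diskPercent,
  healthScore: $json.healthScore,
});

// Keep only the last 7 readings
staticData.history = staticData.history.slice(-7);

return { json: { history: staticData.history } };
```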
I'd split the dual-trigger into two workflows. One for the daily digest, one for critical alerts. Share the SSH credential, duplicate the three threshold values. The architectural cleanliness of "one workflow, one concern" outweighs the minor duplication. The shared-workflow approach creates the dual-trigger problem I described above and makes the workflow harder to reason about.
I'd add a Prometheus endpoint instead of Discord-only. The health score, memory percentage, disk percentage — these are numbers that belong in a time-series database. A second output path that exposes metrics at an HTTP endpoint (n8n can do this with a Webhook node in "respond" mode) would let people feed this into Grafana alongside their other dashboards. Discord is great for notifications, but graphs are better for trends than a 7-day summary paragraph from an LLM.
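Prometheus's exposition format is plain text, so the response body is a few template strings in a Code node feeding the webhook response (metric names are hypothetical):

```javascript
// Build a Prometheus exposition-format body from the latest metrics
const host = $json.hostname;
const lines = [
  '# TYPE homelab_health_score gauge',
  `homelab_health_score{host="${host}"} ${$json.healthScore}`,
  '# TYPE homelab_memory_percent gauge',
  `homelab_memory_percent{host="${host}"} ${$json.memPercent}`,
  '# TYPE homelab_disk_percent gauge',
  `homelab_disk_percent{host="${host}"} ${$json.diskPercent}`,
];

return { json: { body: lines.join('\n') } };
```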
The 80-line parse node should be two nodes. One for system metrics, one for Docker metrics. I combined them because I was in flow and didn't want to break the momentum, but it makes the workflow harder to modify. If someone only has Docker on one of their three servers, they'd have to carefully extract the Docker parsing from a single monolithic Code node instead of just disconnecting a Docker-specific node.
None of these are blockers. The workflow works. But if I were starting from scratch today, those four changes would make it a better template.