How to Monitor Your Homelab Server with n8n and AI (Step-by-Step Setup)
My morning homelab check used to be: SSH in, run htop, glance at disk, close the terminal, wonder if I missed anything. The answer, eventually, was always yes. I missed the container that had been restart-looping for three days. I missed memory trending from 65% to 88% over a week because any single snapshot looked fine.
This workflow fixes that. It SSHes into your server every morning, collects full system and Docker metrics, pulls 7 days of history from a Google Sheet, sends everything to GPT-4o-mini for trend analysis, and drops a color-coded health digest in Discord. A second path runs every 5 minutes, checks critical thresholds with no AI, and fires an immediate alert if anything is actually on fire.
32 nodes total across three execution paths. Under a dollar a month in API costs. Nothing to install on the monitored server.
What You're Building
Three execution paths in one workflow, plus 8 sticky notes for documentation:
Daily digest (13 nodes) — runs at 7 AM:
Schedule Trigger → Config → SSH: system metrics → SSH: Docker stats
→ Code: parse + health score → Google Sheets: read history
→ Code: build prompt → LLM Chain (GPT-4o-mini) → Code: format embed
→ HTTP Request: Discord → Google Sheets: append row
Collects 30+ metrics per run: real CPU utilization % (via /proc/stat delta sampling, not just load average), all mounted filesystems, network I/O, top 5 processes by CPU, zombie process count, failed systemd services, Docker ecosystem health (disk waste, dangling images, reclaimable space), and the usual memory/swap/container stats. Logs 16 columns to Google Sheets for trend history.
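The /proc/stat delta sampling is worth understanding, since it's what separates real CPU % from load average. Here's a sketch of the math in plain JavaScript — the function name and sample values are illustrative, not the workflow's actual node code; the workflow does the two reads over SSH with a 1-second sleep between them:

```javascript
// Compute real CPU utilization % from two snapshots of /proc/stat's first line.
// Field order after the "cpu" label: user nice system idle iowait irq softirq steal.
function cpuPercent(sampleA, sampleB) {
  const parse = (line) => line.trim().split(/\s+/).slice(1).map(Number);
  const total = (v) => v.reduce((sum, n) => sum + n, 0);
  const a = parse(sampleA);
  const b = parse(sampleB);
  // Idle time = idle + iowait (indices 3 and 4 after dropping the "cpu" label)
  const idleDelta = (b[3] + b[4]) - (a[3] + a[4]);
  const totalDelta = total(b) - total(a);
  return totalDelta > 0 ? ((totalDelta - idleDelta) / totalDelta) * 100 : 0;
}
```

Two snapshots a second apart tell you what the CPU actually did in that second; load average only tells you how long the run queue was.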
Critical alerts (7 nodes) — runs every 5 minutes:
Schedule Trigger → SSH: quick check → Code: threshold check
→ IF: is critical? → Code: format alert → HTTP Request: Discord
First-time setup (4 nodes) — run once manually:
Manual Trigger → Config → Code: build sheet schema
→ Google Sheets: create formatted sheet
A single run creates a fully formatted Google Sheet with all 16 column headers, a dark header row, a frozen top row, and conditional formatting gradients for health score, CPU%, memory%, disk%, inode%, and swap%. Run this once, copy the Sheet ID, and you never touch the sheet structure again.
The critical alert path has no AI. Fast, cheap, and direct. The setup path runs once and saves you the manual spreadsheet work.
Prerequisites
Before importing the workflow, have these ready:
- n8n (self-hosted, v1.60+). Works on Docker, bare metal, or n8n Cloud.
- SSH access to your target server with private key authentication. Password auth is not supported by the SSH node.
- OpenAI API key — GPT-4o-mini at roughly $0.01-0.03 per daily digest. Set up billing on your OpenAI account; the free tier limits are too low.
- Discord webhook URL — Server Settings → Integrations → Webhooks → New Webhook. Takes two minutes.
- Google Sheets API credentials — OAuth2 or a Service Account. A Service Account is less annoying to maintain.
Step 1: Import the Workflow
In n8n, go to Workflows → Import from File and select homelab-health-dashboard.json.
The workflow opens with both execution paths visible on the canvas, labeled with gray section sticky notes. The yellow sticky note in the top-left is the overview.
Don't activate it yet. Finish credentials and configuration first — an active workflow with missing credentials will fail silently on schedule.
Step 2: SSH Credentials
In n8n, go to Settings → Credentials → New → SSH.
Set the authentication type to Private Key. Paste your private key. Set the host to your server's IP or hostname.
Two things that will trip you up:
The SSH user needs Docker access. The workflow runs docker stats and docker ps over SSH. If the user isn't in the docker group, those commands return a permission error that the parse Code node can't handle gracefully. Fix:
sudo usermod -aG docker your-ssh-user
# User must log out and back in for the group to take effect
Non-interactive shells may not have Docker in PATH. The SSH node opens a non-interactive shell that doesn't source .bashrc. If docker works when you SSH in manually but fails in the workflow, the binary isn't on the default PATH. Prefix the Docker commands with:
export PATH=$PATH:/usr/bin:/usr/local/bin && docker stats --no-stream ...
Or use the full path: /usr/bin/docker instead of docker.
Step 3: OpenAI Credentials
Go to Settings → Credentials → New → OpenAI API. Paste your API key.
GPT-4o-mini handles the analysis. The prompt sends current metrics plus 7 days of history — typically 1,500-2,000 tokens in, 400-600 tokens out. At current pricing, that's fractions of a cent per run.
You can swap to Claude or a local Ollama model after initial setup. The CUSTOMIZATION.md in the product kit covers this.
Step 4: Google Sheets Credentials and Auto-Create the Sheet
Add credentials first.
Go to Settings → Credentials → New → Google Sheets API. Choose either OAuth2 or Service Account.
For Service Account: download the JSON key from Google Cloud Console, paste the private key and email into n8n. Then share your spreadsheet with the service account email ([email protected]). This is the step people forget — the service account can't access a sheet it hasn't been explicitly shared with.
Run the first-time setup path.
Instead of manually creating columns and formatting, the workflow does it for you. Click the Manual Trigger node on the first-time setup path and run it. This creates a fully formatted metrics sheet with all 16 column headers:
date | hostname | cpu_percent | cpu_load | memory_percent | disk_percent
inode_percent | swap_percent | health_score | containers_running
containers_total | net_rx_bytes | net_tx_bytes | docker_reclaimable_gb
zombie_count | connections
The setup path also applies dark header formatting, freezes the top row, and adds conditional formatting gradients — green-to-red scales on health score, CPU%, memory%, disk%, inode%, and swap% columns. You get a sheet that looks like a proper dashboard from the first row of data.
Copy the Sheet ID from the new spreadsheet's URL — it's the long string between /d/ and /edit:
https://docs.google.com/spreadsheets/d/THIS_IS_YOUR_SHEET_ID/edit
You'll need this ID in the next step.
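The Sheet ID always sits in that same URL position, so a one-line regex can pull it out. Purely illustrative — in practice you just copy it from the address bar:

```javascript
// Extract the Sheet ID from a Google Sheets URL (the segment between /d/ and /edit).
function sheetIdFromUrl(url) {
  const match = url.match(/\/d\/([a-zA-Z0-9_-]+)/);
  return match ? match[1] : null;
}
```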
The sheet name metrics matters. The workflow's Google Sheets nodes reference it by name. If you rename it to Metrics or Sheet1, you'll get a "Sheet not found" error that looks like a credential problem but isn't.
Step 5: Configure the Workflow
Open the ⚙️ Configure monitoring settings node (Edit Fields node near the top of the daily digest path).
Set these values:
| Field | Value |
|---|---|
| discord_webhook_url | Your full Discord webhook URL |
| google_sheet_id | The Sheet ID from step 4 |
| alert_disk_threshold | Default: 90 |
| alert_memory_threshold | Default: 95 |
| alert_cpu_multiplier | Default: 1.5 (triggers if load average > 1.5x core count) |
The SSH host is configured in the SSH credential (Step 2), not in this config node.
The critical alert path hardcodes its own thresholds directly in the threshold Code node because n8n's dual-trigger architecture doesn't allow two Schedule Triggers to share a single Config node — each trigger starts its own execution, and the Config node only runs on one path. The values there match the defaults above. If you change the config node thresholds, update the critical alert Code node to match.
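The shape of that threshold Code node is straightforward. A minimal sketch, assuming field names like disk_percent and core_count (the workflow's actual names may differ) and the default thresholds from the table above:

```javascript
// Hardcoded critical thresholds — keep these in sync with the Config node defaults.
const DISK_THRESHOLD = 90;     // alert_disk_threshold
const MEMORY_THRESHOLD = 95;   // alert_memory_threshold
const CPU_MULTIPLIER = 1.5;    // alert_cpu_multiplier

function checkThresholds(m) {
  const issues = [];
  if (m.disk_percent >= DISK_THRESHOLD) {
    issues.push({ metric: "disk", value: m.disk_percent, threshold: DISK_THRESHOLD });
  }
  if (m.memory_percent >= MEMORY_THRESHOLD) {
    issues.push({ metric: "memory", value: m.memory_percent, threshold: MEMORY_THRESHOLD });
  }
  if (m.cpu_load > m.core_count * CPU_MULTIPLIER) {
    issues.push({ metric: "load", value: m.cpu_load, threshold: m.core_count * CPU_MULTIPLIER });
  }
  return { isCritical: issues.length > 0, issues };
}
```

The IF node downstream only has to look at isCritical.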
Step 6: Wire Credentials to Nodes
This is the part that trips people up when they first import a workflow template. Credentials don't auto-assign — you have to select them on each node that needs one.
Open each of these nodes and select your credential from the dropdown:
SSH nodes (select your SSH credential):
- Collect system metrics
- Collect Docker container metrics
- Quick system health check
Google Sheets nodes (select your Google Sheets credential):
- Read 7 days of metric history
- Append today's metrics
OpenAI model node (select your OpenAI credential):
- Inside the Generate daily health digest LLM Chain, there is a sub-node for the GPT-4o-mini model. Click it and set your credential there.
Finding the OpenAI model node catches people out. It's nested inside the LLM Chain node, not a standalone node on the canvas.
Step 7: Test Before Activating
Click Test Workflow on the daily digest path. This runs the workflow once from the Schedule Trigger through to Discord without activating the schedule.
Watch for these failure points:
SSH fails immediately — credential mismatch or host unreachable. Check the credential, confirm the host is accessible, try ssh -i /path/to/key user@host manually.
Docker commands return empty — the SSH user doesn't have Docker access. See Step 2.
Google Sheets returns "Sheet not found" — sheet name is not exactly metrics, or the service account email hasn't been shared on the spreadsheet.
Parse Code node produces empty output — this is almost always the runOnceForEachItem mode issue. The Code nodes in this workflow (Parse metrics + calculate health score, Build AI analysis prompt, Format Discord health dashboard) need to be set to runOnceForEachItem mode. In the default runOnceForAllItems mode, $json is not available — you'd need $input.all() instead. If the nodes came through correctly from the JSON import, this is already set. If you manually recreated any Code node, check the mode.
How to check: Click a Code node → look for "Run once for each item" toggle in the right panel. It should be enabled on these three nodes.
Discord webhook returns 404 — the webhook was deleted. Create a new one and update the config node.
First run produces "no historical data" — this is expected and fine. The Google Sheet is empty on the first run. The LLM handles this gracefully and produces a current-state analysis without trend comparisons. After 3-7 days of data, trend analysis becomes meaningful.
Test the critical alert path separately by temporarily setting the disk threshold in the threshold Code node to 1 — it will trigger on any server — then set it back after confirming the Discord alert fires correctly.
Step 8: Activate
Once the manual test runs clean, toggle the workflow active.
Activate, don't just save. Schedule Triggers only fire when the workflow is active. A saved but inactive workflow does nothing on schedule. I mention this because I've stared at a workflow that "should have run" for ten minutes before remembering this.
After activating, check the execution log (left sidebar → Executions) after the scheduled run time to confirm everything executed without errors.
Understanding the Health Score
The daily digest includes a 0-100 health score. It's deduction-based — starts at 100, subtracts points for bad conditions:
| Condition | Deduction |
|---|---|
| Disk > 90% | -30 |
| Disk > 80% | -15 |
| Memory > 95% | -25 |
| Memory > 85% | -10 |
| CPU % > 80% | -20 |
| CPU % > 50% | -10 |
| CPU load > 1.5x cores | -10 |
| CPU load > 1x cores | -5 |
| Inodes > 90% | -20 |
| Inodes > 80% | -10 |
| Swap > 80% | -10 |
| Swap > 50% | -5 |
| Container restarts > 5 (24h) | -15 |
| Container restarts > 0 (24h) | -5 |
| Each container not running | -10 |
| Zombie processes > 0 | -5 |
| Failed systemd services > 0 | -10 |
| Docker reclaimable > 50% | -5 |
The CPU % deductions use real CPU utilization measured via /proc/stat delta sampling (a 1-second sleep between two reads). This is actual core utilization, not load average — a key distinction. The load average deductions are separate and catch sustained queuing pressure. Swap, zombie processes, failed services, and Docker disk waste are new factors that catch problems most homelab monitoring setups miss entirely.
Discord embed colors: green (80+), yellow (50-79), red (below 50). A healthy single-server homelab typically scores 88-100. Dropping into the 70s during a heavy backup job is normal. Below 50 means multiple things are wrong at once.
The score is computed in the Parse metrics + calculate health score Code node, not by the LLM. Deterministic math for the number, LLM for the interpretation.
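The deduction logic reduces to a chain of tiered checks. An abbreviated sketch covering four of the factors — field names are mine, and the tiers are assumed exclusive, so a disk at 95% deducts 30, not 30 + 15:

```javascript
// Deduction-based health score: start at 100, subtract per the table above.
// Abbreviated to four factors — the real Code node covers every condition listed.
function healthScore(m) {
  let score = 100;
  if (m.disk > 90) score -= 30; else if (m.disk > 80) score -= 15;
  if (m.memory > 95) score -= 25; else if (m.memory > 85) score -= 10;
  if (m.cpu > 80) score -= 20; else if (m.cpu > 50) score -= 10;
  if (m.swap > 80) score -= 10; else if (m.swap > 50) score -= 5;
  return Math.max(0, score); // floor at 0 so a bad day can't go negative
}
```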
A Note on the n8n Validator Warnings
When you import the workflow, the n8n workflow validator may flag warnings on the SSH nodes. These are false positives.
The Docker stats command uses Go template syntax to format its output:
docker stats --no-stream --format "{{.Names}}\t{{.CPUPerc}}\t{{.MemPerc}}"
The {{.Names}} syntax looks like n8n expression syntax to the validator. It tries to evaluate .Names as a variable and fails. The SSH node sends this string directly to the remote shell — n8n's expression parser never touches it at runtime. The command works. Ignore the warning.
If you see a validation error about conditions.options.version on the IF node, that's a typeVersion issue. The IF node in this workflow is at v2.3 with version: 2 set in the conditions options. If you're on an older n8n and the import downgrades the node, that's the error. Upgrade n8n to v1.60+ to resolve it.
What to Customize
Change the digest schedule. Open the Run daily health check Schedule Trigger and change the cron expression. 7 AM is the default. If your server is in a different timezone than your n8n instance, account for the offset — n8n cron runs in the timezone configured in n8n's environment settings.
Add more servers. The workflow monitors one server per instance. Duplicate the workflow, change the SSH host in the Config node, and point it at a different Discord channel (or the same one — a hostname field is included in every digest so you can tell them apart).
Swap the notification target. The delivery node is a plain HTTP Request with a JSON payload. Change the URL and body format for Telegram, Slack, ntfy, or any other webhook target. The AI-generated digest is plain text — reformatting it for a different webhook takes about five minutes.
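For reference, the Discord body is just JSON with an embeds array. A sketch of the kind of payload the format Code node produces — the title wording and hex colors are my choices, though embeds, title, description, and color are real fields in Discord's webhook API:

```javascript
// Build a Discord webhook body with an embed colored by health score:
// green at 80+, yellow at 50-79, red below 50 (matches the digest's color bands).
function buildDiscordBody(hostname, score, digestText) {
  const color = score >= 80 ? 0x2ecc71 : score >= 50 ? 0xf1c40f : 0xe74c3c;
  return {
    embeds: [
      {
        title: `${hostname} — health ${score}/100`,
        description: digestText,
        color: color,
      },
    ],
  };
}
```

For ntfy you'd flatten this to a plain text body; for Telegram, a chat_id plus text pair.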
Replace GPT-4o-mini. The LLM Chain accepts any model n8n's AI nodes support. In the Generate daily health digest node, replace the OpenAI model sub-node with an Ollama node pointing at a local model. Running cost drops to zero. Quality varies by model — llama3.1:8b handles the prompt well, smaller models get flaky on the scoring rubric.
Add inode alerting to critical checks. The default critical alert path checks disk, memory, CPU, and container status. Inode exhaustion is the silent killer of Linux servers — you can have plenty of disk space but no inodes left and no files can be written. Add an inode check to the threshold Code node:
// inodePercent and issues are assumed to already exist in the node:
// inodePercent from the df -i parsing, issues from the other threshold checks
if (inodePercent >= 90) {
  issues.push({
    metric: 'inodes',
    value: inodePercent,
    threshold: 90,
    fix: 'find / -xdev -printf "%h\\n" | sort | uniq -c | sort -k 1 -n | tail -10'
  });
}
Extend history beyond 7 days. The Google Sheets read node fetches the last 7 rows. Change the range to read more rows if you want the LLM to analyze longer trends. Past about 30 rows the prompt gets long enough that analysis quality starts to drift — either summarize older data into weekly averages or raise the model's context budget.
Skip the Setup
The complete workflow JSON is available free at nxsi.io/store/homelab-health-dashboard. Import it, fill in credentials, and you're running.
The product kit includes the workflow JSON, this setup documentation, a troubleshooting guide, and a customization guide for swapping LLM providers, notification targets, and moving from Google Sheets to a Postgres database.
Get the Homelab Health Dashboard
Built and documented by Dyllan at nxsi.io. The SSH commands, Code node gotchas, IF node typeVersion fix, and Google Sheets range parameter trap all come from the real build process. Every warning in this guide tripped me up at least once.