Connect the Homelab Backup Stack's Prometheus metrics to Grafana to visualize backup health (scores, ages, sizes, verification status) alongside your existing infrastructure monitoring.
This guide assumes you're running both the backup stack and a Grafana/Prometheus setup (like the Homelab Monitoring Stack). If you're running a different Prometheus/Grafana installation, adapt the Docker Compose paths accordingly.
How the Data Flows
backup-metrics.sh (cron, every 5 min)
↓ writes
/var/lib/node_exporter/backup.prom (textfile on host)
↓ mounted into
Node Exporter (textfile collector)
↓ scraped by
Prometheus (every 15s)
↓ queried by
Grafana (dashboard panels)
The textfile collector is Node Exporter's mechanism for ingesting custom metrics. You write a .prom file in Prometheus exposition format, Node Exporter serves it on its /metrics endpoint, and Prometheus scrapes it like any other metric.
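To make the exposition format concrete, here is a minimal sketch of how a script could print samples in that format. The helper names `prom_gauge_header` and `prom_sample` are hypothetical and are not taken from backup-metrics.sh; whether the real script emits HELP/TYPE lines is an assumption.

```shell
# Hypothetical helpers, not part of backup-metrics.sh.
# Print the HELP/TYPE header for a gauge. Call once per metric name,
# because a repeated TYPE line for the same metric is a parse error.
prom_gauge_header() {
    printf '# HELP %s %s\n# TYPE %s gauge\n' "$1" "$2" "$1"
}

# Print one sample: NAME VALUE [LABELS], where LABELS is an optional
# comma-separated list such as service="db",profile="database".
prom_sample() {
    if [ -n "${3:-}" ]; then
        printf '%s{%s} %s\n' "$1" "$3" "$2"
    else
        printf '%s %s\n' "$1" "$2"
    fi
}

prom_gauge_header backup_health_score "Health score 0-100"
prom_sample backup_health_score 100 'service="nxsi-postgres",profile="database"'
prom_gauge_header backup_repo_snapshots "Total snapshot count in repository"
prom_sample backup_repo_snapshots 8
```

Each sample is one line: the metric name, an optional label block in braces, a space, and the value. Node Exporter only reads files whose names end in .prom.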
Step 1: Create the Textfile Directory
On your host machine:
sudo mkdir -p /var/lib/node_exporter
sudo chown "$(id -un):$(id -gn)" /var/lib/node_exporter
Step 2: Enable the Textfile Collector in Node Exporter
Edit your monitoring stack's docker-compose.yml and update the node-exporter service:
  node-exporter:
    image: prom/node-exporter:v1.9.0
    container_name: monitoring-node-exporter
    command:
      - "--path.procfs=/host/proc"
      - "--path.sysfs=/host/sys"
      - "--path.rootfs=/host"
      - "--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)"
      - "--collector.textfile.directory=/textfile" # ADD THIS LINE
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/host:ro,rslave
      - /var/lib/node_exporter:/textfile:ro # ADD THIS LINE
Restart Node Exporter:
docker compose up -d node-exporter
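After the restart, one way to confirm the collector is active is to look for `node_textfile_scrape_error` in Node Exporter's output, a metric the textfile collector exposes (0 means its files were read without errors). The sketch below is a hypothetical helper that reads /metrics output on stdin. The commented curl call assumes Node Exporter is reachable from the host on its default port 9100, which depends on your compose port mappings.

```shell
# Hypothetical helper: inspect Node Exporter /metrics output from stdin.
check_textfile_collector() {
    metrics=$(cat)
    # HELP/TYPE lines start with "#", so only the sample line matches here.
    err=$(printf '%s\n' "$metrics" |
        awk '$1 == "node_textfile_scrape_error" { print $2 }')
    case $err in
        0) ;;
        "") echo "textfile collector not enabled"; return 1 ;;
        *) echo "textfile collector failed to read a .prom file"; return 1 ;;
    esac
    if printf '%s\n' "$metrics" | grep -q '^backup_'; then
        echo "ok: backup metrics exposed"
    else
        echo "ok: collector enabled, no backup metrics yet"
    fi
}

# Assumed endpoint; adjust host and port to your setup:
# curl -s http://localhost:9100/metrics | check_textfile_collector
```

Until backup.prom exists, the second message is the expected result; run the check again once the cron export is in place.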
Step 3: Set Up the Cron Export
Add a cron job that runs backup-metrics.sh every 5 minutes and writes the output to the textfile directory:
crontab -e
Add this line (adjust the path to your backup stack installation):
*/5 * * * * /opt/homelab-backup-stack/scripts/backup-metrics.sh > /var/lib/node_exporter/backup.prom 2>/dev/null
Run it once manually to verify:
/opt/homelab-backup-stack/scripts/backup-metrics.sh > /var/lib/node_exporter/backup.prom
cat /var/lib/node_exporter/backup.prom
You should see metrics like:
backup_repo_size_bytes 501010509
backup_repo_snapshots 8
backup_last_success_timestamp{service="nxsi-postgres",profile="database"} 1771463558
backup_health_score{service="nxsi-postgres",profile="database"} 100
backup_verify_last_result{service="nxsi-postgres",profile="database"} 1
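The redirect used above truncates backup.prom before the script finishes, so Node Exporter could read a partial file, and a failed run would leave it empty. A common pattern for textfile exports is to write to a temporary file and rename it into place. The sketch below shows that pattern; `write_prom_atomically` is a hypothetical wrapper, not something shipped with the backup stack.

```shell
# Hypothetical wrapper: run a command and publish its stdout atomically.
# Usage: write_prom_atomically TARGET_FILE COMMAND [ARGS...]
write_prom_atomically() {
    target=$1
    shift
    # Create the temp file next to the target so mv is a same-filesystem
    # rename. Its name does not end in .prom, so Node Exporter ignores it.
    tmp=$(mktemp "${target}.XXXXXX") || return 1
    if "$@" > "$tmp"; then
        # mktemp creates files with mode 600; make the result world-readable
        # so the Node Exporter container user can read it.
        chmod 644 "$tmp"
        mv "$tmp" "$target"
    else
        # The command failed: keep the previous file and discard the output.
        rm -f "$tmp"
        return 1
    fi
}

# Example call using the paths from this guide:
# write_prom_atomically /var/lib/node_exporter/backup.prom \
#     /opt/homelab-backup-stack/scripts/backup-metrics.sh
```

To use it from cron, save the function and the call in a small script and point the crontab entry at that script instead of the redirect. Keeping the old file on failure means the backup age metrics stop updating, which the stale-backup thresholds will eventually surface.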
Step 4: Verify Prometheus Is Scraping
Wait a minute for Prometheus to scrape, then check:
curl -s 'http://localhost:9090/api/v1/query?query=backup_health_score' | python3 -m json.tool
If you see results with your service names, the pipeline is working.
You can also check the Prometheus UI at http://your-server:9090 -- type backup_ in the expression box and you should see autocomplete suggestions for all backup metrics.
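For queries that contain spaces or comparison operators, it can be easier to let curl do the URL encoding. The following is a small sketch; `prom_query` and the `PROM_URL` variable are hypothetical names, and the default address is the one used in the check above.

```shell
# Hypothetical helper: run an instant query against the Prometheus HTTP API.
# PROM_URL is an assumed variable; it falls back to the local default.
prom_query() {
    # -G sends the --data-urlencode value as a URL query string (GET).
    curl -s -G "${PROM_URL:-http://localhost:9090}/api/v1/query" \
        --data-urlencode "query=$1"
}

# Expressions with spaces or operators need no manual escaping:
# prom_query 'backup_age_seconds > 172800' | python3 -m json.tool
```

An empty `result` array in the response means no series matched, which for the example expression means no backup is older than 48 hours.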
Step 5: Create the Grafana Dashboard
Open Grafana (default: http://your-server:3000) and create a new dashboard. Add panels for each metric group below.
Panel 1: Backup Health Score (Gauge)
Shows each service's 0-100 health score as a colored gauge.
Query:
backup_health_score
Panel type: Gauge
Settings:
- Min: 0, Max: 100
- Thresholds: 0 = red, 50 = yellow, 80 = green
- Legend: {{service}}
- Title: "Backup Health Score"
Panel 2: Time Since Last Backup (Stat)
Shows how long ago each service was backed up. Stale backups stand out immediately.
Query:
backup_age_seconds
Panel type: Stat
Settings:
- Unit: seconds (s) -- Grafana auto-formats to "2h 15m" etc.
- Thresholds: 0 = green, 86400 = yellow (>24h), 172800 = red (>48h)
- Legend: {{service}}
- Title: "Time Since Last Backup"
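The threshold values are plain second counts: 86400 = 24 × 3600 and 172800 = 48 × 3600. The sketch below mirrors those bands in a hypothetical helper, `classify_backup_age`, and also separates out the -1 value that backup_age_seconds uses for a service that has never been backed up. Because Grafana treats the lowest threshold as the base color, a -1 would likely render green in this panel, so it may be worth adding a value mapping for it.

```shell
# Hypothetical helper mirroring the panel thresholds above.
# Input: backup age in whole seconds (-1 = never backed up).
classify_backup_age() {
    age=$1
    if [ "$age" -lt 0 ]; then
        echo never
    elif [ "$age" -ge 172800 ]; then   # 48 h or more
        echo red
    elif [ "$age" -ge 86400 ]; then    # 24 h or more
        echo yellow
    else
        echo green
    fi
}

classify_backup_age 8100     # 2h 15m
classify_backup_age 90000    # 25h
classify_backup_age 180000   # 50h
classify_backup_age -1       # never backed up
```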
Panel 3: Verification Status (Stat)
Shows PASS/FAIL/NEVER for each service's last restore verification.
Query:
backup_verify_last_result
Panel type: Stat
Settings:
- Value mappings: 1 = "PASS" (green), 0 = "FAIL" (red), -1 = "NEVER" (yellow)
- Legend: {{service}}
- Title: "Verification Status"
Panel 4: Backup Size per Service (Bar Chart)
Shows how much data each service's latest snapshot contains.
Query:
backup_last_size_bytes
Panel type: Bar chart
Settings:
- Unit: bytes (decbytes) -- Grafana auto-formats to MB/GB
- Legend: {{service}}
- Title: "Latest Backup Size"
Panel 5: Snapshot Count (Stat)
Total snapshots per service. Useful for spotting retention issues.
Query:
backup_snapshot_count
Panel type: Stat
Settings:
- Legend: {{service}}
- Title: "Snapshot Count"
Panel 6: Repository Total Size (Stat)
Single value showing total repository size across all snapshots after dedup.
Query:
backup_repo_size_bytes
Panel type: Stat
Settings:
- Unit: bytes (decbytes)
- Title: "Repository Size (after dedup)"
Panel 7: Backup Age Trend (Time Series)
Track backup freshness over time. Useful for catching cron failures -- the sawtooth pattern (age resets to 0 at each backup, then climbs) should be regular.
Query:
backup_age_seconds
Panel type: Time series
Settings:
- Unit: seconds
- Legend: {{service}}
- Title: "Backup Age Over Time"
Panel 8: Repository Growth (Time Series)
Track total repository size over time. Steady growth is normal; sudden jumps can indicate newly added services or deduplication failures.
Query:
backup_repo_size_bytes
Panel type: Time series
Settings:
- Unit: bytes (decbytes)
- Title: "Repository Size Trend"
Step 6: Set Up Alerts (Optional)
If you want Grafana or Prometheus to alert on backup health, add an alert rule.
Prometheus Alert Rule
Add to your monitoring stack's prometheus/alerts/backup-alerts.yml:
groups:
  - name: backup-alerts
    rules:
      - alert: BackupHealthLow
        expr: backup_health_score < 50
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Backup health low for {{ $labels.service }}"
          description: "{{ $labels.service }} backup health score is {{ $value }}/100"
      - alert: BackupStale
        expr: backup_age_seconds > 172800
        for: 30m
        labels:
          severity: critical
        annotations:
          summary: "Backup stale for {{ $labels.service }}"
          description: "{{ $labels.service }} hasn't been backed up in over 48 hours"
      - alert: BackupVerifyFailed
        expr: backup_verify_last_result == 0
        for: 1h
        labels:
          severity: critical
        annotations:
          summary: "Backup verification failed for {{ $labels.service }}"
          description: "{{ $labels.service }} backup failed restore verification"
Add the file to your Prometheus config's rule_files list and restart Prometheus:
docker compose restart prometheus
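A syntax error in a rule file can stop Prometheus from starting, so it can help to validate the file first with promtool, which is included in the official Prometheus image. The sketch below wraps that in a hypothetical helper, `check_backup_rules`. The service name `prometheus` comes from the restart command above; the in-container path is an assumption about how your compose file mounts the alerts directory.

```shell
# Hypothetical helper: validate a rule file inside the Prometheus container.
# The default path is an assumption; pass your own mount path as $1.
check_backup_rules() {
    docker compose exec prometheus \
        promtool check rules "${1:-/etc/prometheus/alerts/backup-alerts.yml}"
}

# Restart only when the rules parse cleanly:
# check_backup_rules && docker compose restart prometheus
```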
If your Prometheus is already connected to Alertmanager, these alerts are routed through it and reach the same Discord/Slack/email channels you configured for infrastructure alerts.
Dashboard Layout
A suggested layout for a single Grafana row:
┌─────────────────┬───────────────────┬────────────────────┐
│ Health Score    │ Time Since Last   │ Verify Status      │
│ (Gauge)         │ Backup (Stat)     │ (Stat)             │
├─────────────────┴───────────────────┼────────────────────┤
│ Latest Backup Size (Bar Chart)      │ Snapshots / Repo   │
│                                     │ Size (Stats)       │
├─────────────────────────────────────┴────────────────────┤
│ Backup Age Trend (Time Series)                           │
├──────────────────────────────────────────────────────────┤
│ Repository Growth (Time Series)                          │
└──────────────────────────────────────────────────────────┘
Eight panels, one row. Place this below your existing System Overview or Docker Containers dashboard rows, or create a dedicated "Backups" dashboard.
Available Metrics Reference
All metrics exported by backup-metrics.sh:
| Metric | Type | Labels | Description |
|---|---|---|---|
| backup_last_success_timestamp | gauge | service, profile | Unix timestamp of last successful backup |
| backup_last_size_bytes | gauge | service, profile | Size of last backup snapshot in bytes |
| backup_snapshot_count | gauge | service, profile | Total snapshots for this service |
| backup_health_score | gauge | service, profile | Health score 0-100 |
| backup_verify_last_result | gauge | service, profile | Last verify result: 1=pass, 0=fail, -1=never |
| backup_age_seconds | gauge | service, profile | Seconds since last backup (-1 if never) |
| backup_repo_size_bytes | gauge | none | Total repository size in bytes |
| backup_repo_snapshots | gauge | none | Total snapshot count in repository |
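As a quick sanity check against this table, the sketch below lists which of the eight metric names are absent from a .prom file. `missing_backup_metrics` is a hypothetical helper; per-service metrics may legitimately be missing if no services are configured yet.

```shell
# Hypothetical check: print each documented metric name that has no sample
# in the given .prom file. Returns 0 only when all eight are present.
missing_backup_metrics() {
    file=$1
    rc=0
    for name in backup_last_success_timestamp backup_last_size_bytes \
        backup_snapshot_count backup_health_score backup_verify_last_result \
        backup_age_seconds backup_repo_size_bytes backup_repo_snapshots; do
        # A sample line starts with the name, then a label block or a space.
        if ! grep -Eq "^${name}(\{| )" "$file"; then
            echo "$name"
            rc=1
        fi
    done
    return "$rc"
}

# missing_backup_metrics /var/lib/node_exporter/backup.prom
```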
This guide connects the Homelab Backup Stack with the Homelab Monitoring Stack. Both are available at nxsi.io.