Connect the Homelab Backup Stack's Prometheus metrics to Grafana so you can see backup health (scores, ages, sizes, verification status) alongside your existing infrastructure monitoring.

This guide assumes you're running both the backup stack and a Grafana/Prometheus setup (like the Homelab Monitoring Stack). If you're running a different Prometheus/Grafana installation, adapt the Docker Compose paths accordingly.

How the Data Flows

backup-metrics.sh (cron, every 5 min)
    ↓ writes
/var/lib/node_exporter/backup.prom (textfile on host)
    ↓ mounted into
Node Exporter (textfile collector)
    ↓ scraped by
Prometheus (every 15s)
    ↓ queried by
Grafana (dashboard panels)

The textfile collector is Node Exporter's mechanism for ingesting custom metrics. You write a .prom file in Prometheus exposition format into the directory the collector watches, Node Exporter serves its contents on its /metrics endpoint, and Prometheus scrapes them like any other metric. Only files ending in .prom are read.
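To make the exposition format concrete, here is a minimal Python sketch that renders one sample line the way a .prom file expects it. The helper name render_sample is hypothetical, and this is not how backup-metrics.sh is implemented; it only illustrates the line shape name{label="value",...} value.

```python
def render_sample(name, value, labels=None):
    """Render one sample in Prometheus text exposition format (hypothetical helper)."""
    if not labels:
        return f"{name} {value}"

    def esc(v):
        # Label values must escape backslashes, double quotes, and newlines.
        return str(v).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    pairs = ",".join(f'{key}="{esc(val)}"' for key, val in labels.items())
    return f"{name}{{{pairs}}} {value}"


print(render_sample("backup_repo_snapshots", 8))
print(render_sample("backup_health_score", 100,
                    {"service": "nxsi-postgres", "profile": "database"}))
```

Each printed line is a complete sample; a .prom file is just a sequence of such lines, optionally preceded by # HELP and # TYPE comments.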

Step 1: Create the Textfile Directory

On your host machine, create the directory and give your own user ownership so the cron job in Step 3 can write to it:

sudo mkdir -p /var/lib/node_exporter
sudo chown "$(id -un):$(id -gn)" /var/lib/node_exporter

Step 2: Enable the Textfile Collector in Node Exporter

Edit your monitoring stack's docker-compose.yml and update the node-exporter service:

node-exporter:
  image: prom/node-exporter:v1.9.0
  container_name: monitoring-node-exporter
  command:
    - "--path.procfs=/host/proc"
    - "--path.sysfs=/host/sys"
    - "--path.rootfs=/host"
    - "--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)"
    - "--collector.textfile.directory=/textfile"    # ADD THIS LINE
  volumes:
    - /proc:/host/proc:ro
    - /sys:/host/sys:ro
    - /:/host:ro,rslave
    - /var/lib/node_exporter:/textfile:ro           # ADD THIS LINE

Restart Node Exporter:

docker compose up -d node-exporter

Step 3: Set Up the Cron Export

Add a cron job that runs backup-metrics.sh every 5 minutes and writes the output to the textfile directory:

crontab -e

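Add this line (adjust the path to your backup stack installation). It writes to a temporary file and renames it into place only if the script succeeds, so Node Exporter never reads a half-written file and a run that exits with an error leaves the previous file in place:

*/5 * * * * /opt/homelab-backup-stack/scripts/backup-metrics.sh > /var/lib/node_exporter/backup.prom.tmp 2>/dev/null && mv /var/lib/node_exporter/backup.prom.tmp /var/lib/node_exporter/backup.prom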

Run it once manually to verify:

/opt/homelab-backup-stack/scripts/backup-metrics.sh > /var/lib/node_exporter/backup.prom
cat /var/lib/node_exporter/backup.prom

You should see metrics like:

backup_repo_size_bytes 501010509
backup_repo_snapshots 8
backup_last_success_timestamp{service="nxsi-postgres",profile="database"} 1771463558
backup_health_score{service="nxsi-postgres",profile="database"} 100
backup_verify_last_result{service="nxsi-postgres",profile="database"} 1
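If you would rather check the file programmatically than by eye, a small parser is enough. The sketch below assumes the simple line shape shown above (no trailing timestamps, no escaped quotes inside label values); parse_sample is a hypothetical helper, not part of the backup stack.

```python
import re

# Matches: metric_name{label="v",...} value   (the label block is optional)
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line):
    """Parse one exposition-format line into (name, labels, value), or None."""
    line = line.strip()
    if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
        return None
    match = SAMPLE_RE.match(line)
    if match is None:
        raise ValueError(f"not a valid sample line: {line!r}")
    name, label_block, value = match.groups()
    labels = dict(LABEL_RE.findall(label_block or ""))
    return name, labels, float(value)


text = """\
backup_repo_size_bytes 501010509
backup_health_score{service="nxsi-postgres",profile="database"} 100
"""
for raw in text.splitlines():
    print(parse_sample(raw))
```

A ValueError here usually means the script printed something other than metrics (an error message, for example) into the file.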

Step 4: Verify Prometheus Is Scraping

Wait a minute for Prometheus to scrape, then check:

curl -s 'http://localhost:9090/api/v1/query?query=backup_health_score' | python3 -m json.tool

If you see results with your service names, the pipeline is working.

You can also check the Prometheus UI at http://your-server:9090 -- type backup_ in the expression box and you should see autocomplete suggestions for all backup metrics.
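The same check can be scripted. The sketch below uses only the Python standard library to call the instant-query endpoint and reduce the response to a service-to-score mapping. The base URL is an assumption (adjust host and port to your setup), and both function names are hypothetical. The offline sample mirrors the response shape of the Prometheus HTTP API so the parsing can be tried without a running server.

```python
import json
import urllib.parse
import urllib.request


def scores_by_service(payload):
    """Map the service label to the sample value from an instant-query response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    result = {}
    for series in payload["data"]["result"]:
        service = series["metric"].get("service", "<none>")
        _timestamp, value = series["value"]  # the value arrives as a string
        result[service] = float(value)
    return result


def query_prometheus(expr, base_url="http://localhost:9090"):
    """Run an instant query; base_url is an assumption, adjust to your setup."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


# Offline sample in the shape the API returns for an instant vector.
sample = {"status": "success", "data": {"resultType": "vector", "result": [
    {"metric": {"__name__": "backup_health_score",
                "service": "nxsi-postgres", "profile": "database"},
     "value": [1771463558, "100"]}]}}
print(scores_by_service(sample))
```

Against a live server, scores_by_service(query_prometheus("backup_health_score")) should return one entry per service. An empty mapping means Prometheus has not scraped the backup metrics yet.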

Step 5: Create the Grafana Dashboard

Open Grafana (default: http://your-server:3000) and create a new dashboard. Add panels for each metric group below.

Panel 1: Backup Health Score (Gauge)

Shows each service's 0-100 health score as a colored gauge.

Query:

backup_health_score

Panel type: Gauge

Settings:

Min: 0, Max: 100
Thresholds: 0 = red, 50 = yellow, 80 = green
Legend: {{service}}
Title: "Backup Health Score"
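Each threshold number is the lower bound of its color band, so a score of 50 is already yellow and 80 is already green. A short Python sketch of that rule (threshold_color is a hypothetical helper for illustration, not Grafana code):

```python
def threshold_color(value, steps):
    """Return the color of the highest threshold step that value reaches.

    steps: list of (lower_bound, color) pairs, sorted ascending.
    """
    color = steps[0][1]  # the base step also covers anything below the next bound
    for lower_bound, step_color in steps:
        if value >= lower_bound:
            color = step_color
    return color


HEALTH_STEPS = [(0, "red"), (50, "yellow"), (80, "green")]

for score in (35, 50, 79, 80, 100):
    print(score, threshold_color(score, HEALTH_STEPS))
```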

Panel 2: Time Since Last Backup (Stat)

Shows how long ago each service was backed up. Stale backups stand out immediately.

Query:

backup_age_seconds

Panel type: Stat

Settings:

Unit: seconds (s) -- Grafana auto-formats to "2h 15m" etc.
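Thresholds: 0 = green, 86400 = yellow (>24h), 172800 = red (>48h)
Note: a service that has never been backed up reports -1, which falls under the base (green) step. Add a value mapping for -1 if you want it flagged.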
Legend: {{service}}
Title: "Time Since Last Backup"
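How backup-metrics.sh derives backup_age_seconds is not shown in this guide. The sketch below assumes it is simply the current time minus backup_last_success_timestamp, with -1 when no backup exists, and then classifies the result with the 24h and 48h thresholds from this panel. Both function names are hypothetical.

```python
import time


def backup_age_seconds(last_success_timestamp, now=None):
    """Seconds since the last successful backup, or -1 if there has never been one."""
    if last_success_timestamp is None or last_success_timestamp <= 0:
        return -1
    if now is None:
        now = time.time()
    return int(now - last_success_timestamp)


def age_status(age):
    """Classify an age with this panel's thresholds (24h yellow, 48h red)."""
    if age < 0:
        return "never"  # handle the -1 sentinel before applying thresholds
    if age >= 172800:
        return "red"
    if age >= 86400:
        return "yellow"
    return "green"


# A backup finished 2h 15m before "now".
age = backup_age_seconds(1771463558, now=1771463558 + 8100)
print(age, age_status(age))
```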

Panel 3: Verification Status (Stat)

Shows PASS/FAIL/NEVER for each service's last restore verification.

Query:

backup_verify_last_result

Panel type: Stat

Settings:

Value mappings: 1 = "PASS" (green), 0 = "FAIL" (red), -1 = "NEVER" (yellow)
Legend: {{service}}
Title: "Verification Status"
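The value mappings amount to a small lookup table. As a Python sketch (the UNKNOWN fallback is an assumption added for illustration; the metric itself only documents 1, 0, and -1):

```python
# Display text and color for each documented backup_verify_last_result value.
VERIFY_LABELS = {
    1: ("PASS", "green"),
    0: ("FAIL", "red"),
    -1: ("NEVER", "yellow"),
}


def verify_label(value):
    """Translate a backup_verify_last_result sample into (text, color)."""
    # Prometheus values are floats, so normalise 1.0 -> 1 before the lookup.
    return VERIFY_LABELS.get(int(value), ("UNKNOWN", "gray"))


for raw in (1.0, 0.0, -1.0):
    print(raw, verify_label(raw))
```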

Panel 4: Backup Size per Service (Bar Chart)

Shows how much data each service's latest snapshot contains.

Query:

backup_last_size_bytes

Panel type: Bar chart

Settings:

Unit: bytes (decbytes) -- Grafana auto-formats to MB/GB
Legend: {{service}}
Title: "Latest Backup Size"

Panel 5: Snapshot Count (Stat)

Total snapshots per service. Useful for spotting retention issues.

Query:

backup_snapshot_count

Panel type: Stat

Settings:

Legend: {{service}}
Title: "Snapshot Count"

Panel 6: Repository Total Size (Stat)

Single value showing total repository size across all snapshots after dedup.

Query:

backup_repo_size_bytes

Panel type: Stat

Settings:

Unit: bytes (decbytes)
Title: "Repository Size (after dedup)"

Panel 7: Backup Age Trend (Time Series)

Track backup freshness over time. Useful for catching cron failures -- the sawtooth pattern (age resets to 0 at each backup, then climbs) should be regular.

Query:

backup_age_seconds

Panel type: Time series

Settings:

Unit: seconds (s)
Legend: {{service}}
Title: "Backup Age Over Time"
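To see why a missed run stands out in this panel, the Python sketch below builds the sawtooth from a list of backup times and flags samples where the age exceeds the expected interval. The function names, the 12-hour sampling, and the 300-second slack are all assumptions chosen for illustration.

```python
HOUR = 3600


def age_series(backup_times, sample_times):
    """Age at each sample time: time since the most recent backup at or before it."""
    ages = []
    for t in sample_times:
        done = [b for b in backup_times if b <= t]
        ages.append(t - max(done) if done else -1)
    return ages


def missed_runs(ages, expected_interval, slack=300):
    """Indexes of samples whose age exceeds the expected interval plus some slack."""
    return [i for i, age in enumerate(ages) if age > expected_interval + slack]


# Daily backups at hour 0, 24, and 72; the run at hour 48 never happened.
backups = [0, 24 * HOUR, 72 * HOUR]
samples = list(range(0, 96 * HOUR, 12 * HOUR))
ages = age_series(backups, samples)

print([a // HOUR for a in ages])
print(missed_runs(ages, expected_interval=24 * HOUR))
```

The age climbs past 24 hours only in the gap left by the skipped run, which is the irregular tooth you would look for on the graph.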

Panel 8: Repository Growth (Time Series)

Track total repository size over time. Steady growth is normal. A sudden jump usually means a newly added service or data that is no longer deduplicating well.

Query:

backup_repo_size_bytes

Panel type: Time series

Settings:

Unit: bytes (decbytes)
Title: "Repository Size Trend"
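What counts as a sudden jump is a judgment call. As a rough sketch, the Python below flags any sample that grew by more than a chosen fraction over the previous one; the 25% default and the function name growth_jumps are assumptions, not values from the backup stack.

```python
def growth_jumps(sizes, max_ratio=0.25):
    """Indexes where size grew by more than max_ratio relative to the previous sample."""
    jumps = []
    for i in range(1, len(sizes)):
        prev, cur = sizes[i - 1], sizes[i]
        if prev > 0 and (cur - prev) / prev > max_ratio:
            jumps.append(i)
    return jumps


# Repository size in bytes at consecutive sample points (made-up values).
sizes = [500_000_000, 505_000_000, 510_000_000, 900_000_000, 905_000_000]
print(growth_jumps(sizes))
```

Here only the step from 510 MB to 900 MB is flagged; the roughly 1% steps around it are the steady growth you would expect.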

Step 6: Set Up Alerts (Optional)

If you want Grafana or Prometheus to alert on backup health, add an alert rule.

Prometheus Alert Rule

Add to your monitoring stack's prometheus/alerts/backup-alerts.yml:

groups:
  - name: backup-alerts
    rules:
      - alert: BackupHealthLow
        expr: backup_health_score < 50
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Backup health low for {{ $labels.service }}"
          description: "{{ $labels.service }} backup health score is {{ $value }}/100"

      - alert: BackupStale
        expr: backup_age_seconds > 172800
        for: 30m
        labels:
          severity: critical
        annotations:
          summary: "Backup stale for {{ $labels.service }}"
          description: "{{ $labels.service }} hasn't been backed up in over 48 hours"

      - alert: BackupVerifyFailed
        expr: backup_verify_last_result == 0
        for: 1h
        labels:
          severity: critical
        annotations:
          summary: "Backup verification failed for {{ $labels.service }}"
          description: "{{ $labels.service }} backup failed restore verification"

Add the file to the rule_files list in your Prometheus config, make sure it is visible inside the Prometheus container, and restart Prometheus:

docker compose restart prometheus

If Prometheus is already connected to Alertmanager, these alerts route through it and reach the same Discord/Slack/email channels you configured for infrastructure alerts.
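To sanity-check the three expressions before deploying them, you can mirror their conditions in a few lines of Python. This sketch ignores the for: duration (a real alert fires only after the expression has held for that long), and the sample values, including the health score of 0 for a never-backed-up service, are made up for illustration.

```python
def matching_alerts(sample):
    """Return the alert names whose expressions match one service's metrics."""
    matched = []
    if sample["backup_health_score"] < 50:
        matched.append("BackupHealthLow")
    if sample["backup_age_seconds"] > 172800:
        matched.append("BackupStale")
    if sample["backup_verify_last_result"] == 0:
        matched.append("BackupVerifyFailed")
    return matched


stale = {"backup_health_score": 40, "backup_age_seconds": 200000,
         "backup_verify_last_result": 1}
never = {"backup_health_score": 0, "backup_age_seconds": -1,
         "backup_verify_last_result": -1}

print(matching_alerts(stale))
print(matching_alerts(never))
```

Note what the second case shows: a service that has never been backed up reports -1 for both age and verification, so it matches neither BackupStale nor BackupVerifyFailed. If you want that case covered, one option is to widen the expression, for example backup_age_seconds > 172800 or backup_age_seconds < 0.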

Dashboard Layout

A suggested layout for a single Grafana row:

┌─────────────────┬───────────────────┬────────────────────┐
│  Health Score   │  Time Since Last  │  Verify Status     │
│  (Gauge)        │  Backup (Stat)    │  (Stat)            │
├─────────────────┴───────────────────┼────────────────────┤
│  Latest Backup Size (Bar Chart)     │  Snapshots / Repo  │
│                                     │  Size (Stats)      │
├─────────────────────────────────────┴────────────────────┤
│  Backup Age Trend (Time Series)                          │
├──────────────────────────────────────────────────────────┤
│  Repository Growth (Time Series)                         │
└──────────────────────────────────────────────────────────┘

Eight panels, one row. Place this below your existing System Overview or Docker Containers dashboard rows, or create a dedicated "Backups" dashboard.

Available Metrics Reference

All metrics exported by backup-metrics.sh:

Metric	Type	Labels	Description
`backup_last_success_timestamp`	gauge	service, profile	Unix timestamp of last successful backup
`backup_last_size_bytes`	gauge	service, profile	Size of last backup snapshot in bytes
`backup_snapshot_count`	gauge	service, profile	Total snapshots for this service
`backup_health_score`	gauge	service, profile	Health score 0-100
`backup_verify_last_result`	gauge	service, profile	Last verify result: 1=pass, 0=fail, -1=never
`backup_age_seconds`	gauge	service, profile	Seconds since last backup (-1 if never)
`backup_repo_size_bytes`	gauge	—	Total repository size in bytes
`backup_repo_snapshots`	gauge	—	Total snapshot count in repository

This guide connects the Homelab Backup Stack with the Homelab Monitoring Stack. Both are available at nxsi.io.
