I Built a Docker-Aware Backup Stack with Automated Restore Verification
I had 15 Docker containers running on my homelab server. Three of them had backups configured. I found out about the other twelve when I was troubleshooting a database migration and realized my Vaultwarden passwords had never been backed up. Not once.
That was a Tuesday.
By the following weekend I had a working backup system that reads Docker labels, stops containers when needed, dumps databases automatically, encrypts everything with restic, and -- the part no existing tool does -- verifies that backups can actually be restored by spinning up temporary containers and testing them.
This is the full build log. Every command, every wrong turn, every decision.
The State of Homelab Backup Tools
Before building anything, I spent two hours surveying what already exists. The landscape is bleak.
Duplicati works but its scheduler is unreliable, the UI is sluggish, and it doesn't understand Docker. You point it at a directory and hope the container's data is consistent at backup time. If your database is mid-write, you get a corrupt backup that you won't discover until you try to restore.
docker-volume-backup (offen) is the closest thing to what I wanted. It reads Docker labels and can stop containers. But it has no deduplication, no encryption (you have to layer GPG yourself), and no concept of profiles or restore verification. Issue #329 on their repo has 11+ upvotes asking for per-service retention policies. Still open.
BorgBackup is technically excellent but doesn't speak S3 or B2 natively, which kills the 3-2-1 backup rule for most homelabs. Restic covers the same dedup/encryption ground and has native cloud backend support.
Six gaps. No tool fills any of them:
- No restore verification
- No multi-host backup aggregation with health scoring
- Nobody combines Docker labels with restic
- No "are my backups healthy?" answer
- No auto-generated disaster recovery runbooks
- No per-service label-driven configuration (the offen #329 problem)
Choosing the Components
Restic was a quick decision. Content-defined chunking gives real deduplication (not just file-level), AES-256 encryption is always-on, and it speaks S3/B2/SFTP natively. The community is active and Backrest wraps it for a snapshot browser UI.
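To make "natively" concrete: the repository URL alone selects the backend, and encryption is on no matter which one you pick. Bucket and host names below are placeholders:

```bash
export RESTIC_PASSWORD_FILE=/run/secrets/restic-password    # repo key; encryption is not optional
restic -r s3:s3.amazonaws.com/homelab-backups init          # S3-compatible storage
restic -r b2:homelab-backups:restic init                    # Backblaze B2
restic -r sftp:backup@nas.local:/srv/restic init            # plain SFTP target
```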
For notifications I went with shoutrrr instead of Apprise. Apprise has broader service coverage, but shoutrrr is Go-native (tiny image, fast startup) and covers every notification service a homelab runner actually uses: Discord, Slack, ntfy, Gotify, Telegram, email. It runs on-demand via docker run --rm rather than as an idle sidecar -- no point keeping a container alive just for occasional notifications.
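The on-demand pattern is just a one-shot container per notification; something like this (the ntfy topic is a placeholder):

```bash
# Fire a notification and exit -- nothing stays resident between backups
docker run --rm containrrr/shoutrrr send \
  --url "ntfy://ntfy.sh/homelab-backups" \
  --message "backup.sh finished: 12/12 services backed up, 0 failures"
```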
The architecture decision that matters most: the restic container is separate from Backrest. Backrest wraps restic for its UI, but it doesn't understand Docker. My scripts need to docker exec into something with access to the Docker socket to read labels and stop/start containers. Backrest can't do that, so restic runs as its own long-lived container with the socket mounted read-only.
The Docker Compose Stack
Two services. Memory-limited, health-checked.
Restic gets 512MB, Backrest 256MB. Restic uses tail -f /dev/null as its entrypoint -- it's an exec target, not a standalone process. Scripts run inside it via docker exec. Shoutrrr runs on-demand via docker run --rm when notifications fire, which saves ~64MB of idle RAM compared to keeping it as a sidecar.
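Because it just idles, every restic operation is a plain docker exec into it. The container name backup-restic below is an assumption -- use whatever your compose file calls the service:

```bash
docker exec backup-restic restic snapshots --compact           # list snapshots
docker exec backup-restic restic check --read-data-subset=10%  # spot-check repo integrity
```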
The compose file took 20 minutes. The .env.example took longer -- documenting every variable with inline comments for eight config sections. (Storage backend options alone needed four commented blocks for S3, B2, SFTP, and secondary/dual storage.)
The restic container passed its health check immediately. Backrest needed about 15 seconds on first boot to create its internal database.
Docker Labels: The backup.nxsi.* Namespace
This is the core differentiator. Instead of a central YAML config listing which containers to back up, the configuration lives on the containers themselves.
labels:
  backup.nxsi.enable: "true"
  backup.nxsi.profile: "database"
  backup.nxsi.stop: "true"
  backup.nxsi.priority: "10"
  backup.nxsi.pre-hook: "pg_dump -Fc -U postgres mydb -f /backup/dump.sql"
I went with backup.nxsi.* instead of just backup.* to avoid collisions with other tools. If someone is migrating from docker-volume-backup, their docker-volume-backup.stop-during-backup labels coexist with ours.
The full label schema supports volumes, paths, stop behavior, pre/post hooks, per-service scheduling, per-service retention, profile shortcuts, and verification hints. That's a lot of labels, but in practice most services only need two: enable and profile.
backup.sh — The Orchestrator
The main script is about 470 lines. Not enormous, but dense.
The flow:
1. Source .env, run preflight checks
2. Query the Docker API for all containers with backup.nxsi.enable=true
3. Sort by priority label (ascending -- lower number runs first; see the sketch after this list)
4. Per container: read labels, resolve profile, run pre-hook via docker exec, stop if flagged, stage volume data into the restic container, run restic backup from the staging directory, restart, run post-hook, apply retention
5. Copy to secondary repo if configured
6. Send notification summary
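Steps 2 and 3 are a single docker ps away; roughly (the default priority of 50 is my assumption, not necessarily the shipped default):

```bash
# Discover enabled containers, fill in a default priority, sort ascending by priority
docker ps --filter "label=backup.nxsi.enable=true" \
  --format '{{.Label "backup.nxsi.priority"}} {{.Names}}' \
  | awk '{ if (NF == 1) print 50, $1; else print $1, $2 }' \
  | sort -n \
  | awk '{ print $2 }'
```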
Step 4 has an architectural detail that cost me half an afternoon. You can't just pass host volume paths to the restic container because /var/lib/docker/volumes/xyz/_data doesn't exist inside it -- those are host paths. The solution: pipe the data in. For each volume, spin up a throwaway busybox container with the volume mounted read-only, tar the contents, and pipe them into the restic container's staging directory via docker exec -i tar xf -. For database dumps, docker cp the dump out of the source container and pipe it in the same way. It's container-to-container data transfer without ever touching the host filesystem.
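Roughly, per volume (container and path names are assumptions):

```bash
VOLUME="vaultwarden_data"        # named volume to back up
STAGE="/staging/${VOLUME}"       # staging path inside the restic container

docker exec backup-restic mkdir -p "$STAGE"

# Stream the volume through a throwaway busybox container straight into
# the restic container -- no host paths involved at any point.
docker run --rm -v "${VOLUME}:/data:ro" busybox tar cf - -C /data . \
  | docker exec -i backup-restic tar xf - -C "$STAGE"

docker exec backup-restic restic backup "$STAGE" --tag "$VOLUME"
```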
This also means your backup scripts don't need privileged access to the Docker data directory. The Docker socket (mounted read-only) is enough.
The pre-hook system gave me the most trouble. I originally had hooks running on the host, but that requires the hook commands to exist on the host. Running inside the container via docker exec means a PostgreSQL hook uses the pg_dump binary inside the postgres container -- which is always the right version. No version mismatch, no missing binaries.
Hooks get a 60-second timeout. I briefly considered making this configurable per-service, but 60 seconds is generous for any homelab database dump and adding another label felt like diminishing returns.
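The whole hook step boils down to a few lines (variable names are mine; the 60-second cap is the one described above):

```bash
# Read the hook command from the container's label and run it *inside* that container
hook=$(docker inspect --format '{{ index .Config.Labels "backup.nxsi.pre-hook" }}' "$container")
if [ -n "$hook" ]; then
  timeout 60 docker exec "$container" sh -c "$hook" \
    || echo "WARN: pre-hook failed or timed out for $container"
fi
```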
Profiles: Five Patterns That Cover Almost Everything
Rather than making every user configure every label, profiles provide preset defaults.
The database profile auto-detects database type from the container image name. It doesn't just match exact names -- pgvector/pgvector:pg17 and timescaledb/timescaledb:latest both get recognized as PostgreSQL because the detection regex matches image name patterns (postgres|pgvector|timescale|postgis all map to the pg_dump hook, mysql|percona maps to mysqldump). I initially only checked for literal "postgres" in the image name, which missed pgvector entirely. Since pgvector and TimescaleDB are some of the most common postgres variants in homelabs, catching those matters.
It worked on the first try for postgres:17 and mariadb:11 once the detection was right. MongoDB needed mongodump's --archive output specifically (not --out directories), which I caught during testing.
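The detection amounts to a pattern match on the image name; approximately this (the patterns and fallback here are illustrative, not the exact shipped regex):

```bash
image=$(docker inspect --format '{{ .Config.Image }}' "$container")
case "$image" in
  *postgres*|*pgvector*|*timescale*|*postgis*) dump_tool="pg_dump -Fc"         ;;
  *mysql*|*mariadb*|*percona*)                 dump_tool="mysqldump"           ;;
  *mongo*)                                     dump_tool="mongodump --archive" ;;
  *)                                           dump_tool=""                    ;;  # not a known database: plain file backup
esac
```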
The critical profile (Vaultwarden, Gitea, auth services) sets stop: true and maximum retention. Stopping the container guarantees consistency -- no writes mid-backup. The trade-off is a few seconds of downtime. For Vaultwarden that's fine; for a public-facing service it might not be.
config-only is for Pi-hole, Traefik, and similar services where the config is small and valuable but you don't need weekly snapshots of logs. 90-day flat retention, skip logs and cache.
large-media skips actual media files (mkv, mp4, etc.) and only backs up Plex/Jellyfin metadata. Nobody wants to deduplicate 4TB of movies through restic.
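As a rough illustration of what those profiles translate to on the restic side (the specific flag values are assumptions, not the shipped defaults):

```bash
# critical: longest retention, container stopped during backup
restic forget --tag vaultwarden --keep-daily 14 --keep-weekly 8 --keep-monthly 12 --prune

# config-only: flat 90-day retention, skip logs and cache
restic backup /staging/pihole --tag pihole --exclude '*.log' --exclude '**/cache/**'
restic forget --tag pihole --keep-within 90d --prune

# large-media: metadata only, never the media files themselves
restic backup /staging/jellyfin --tag jellyfin --exclude '*.mkv' --exclude '*.mp4'
```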
Automated Restore Verification
This is the thing nobody else does. Every backup tool can tell you it ran successfully. None of them can tell you the data is actually restorable.
verify.sh works like this:
- For each service, get the latest snapshot
- Restore to a temp directory
- Based on the verification type:
- Database: Spin up a temporary PostgreSQL/MySQL container, import the dump, run
SELECT 1, count tables, count rows in the largest table - App: Spin up a temporary container with the original image, wait 30 seconds, hit the health check URL
- Files: Count restored files, check for zero-byte files, run
restic checkon a 10% data sample
- Database: Spin up a temporary PostgreSQL/MySQL container, import the dump, run
- Report PASS/FAIL per service
- Clean up all temp containers (via
trap-- cleanup runs even if the script crashes)
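The database branch, stripped to its essentials, looks something like this (image tag, names, and the dump path are assumptions; the real script polls readiness rather than sleeping):

```bash
restic restore latest --tag postgres --target "$RESTORE_DIR"

# Temp container is always removed, even if the script dies mid-verify
trap 'docker rm -f verify-pg >/dev/null 2>&1' EXIT
docker run -d --name verify-pg -e POSTGRES_PASSWORD=verify \
  -v "$RESTORE_DIR:/restore:ro" postgres:17
sleep 15   # crude; poll pg_isready in practice

docker exec verify-pg pg_restore -U postgres -d postgres /restore/dump.sql \
  && docker exec verify-pg psql -U postgres -tc "SELECT 1;" >/dev/null \
  && echo "PASS" || echo "FAIL"
```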
The database verification caught a real issue during testing. I had set up a pre-hook that ran pg_dump without the --format=custom flag, which produced a plaintext SQL dump. The dump worked, but when I tried to import a 200MB plaintext dump into the temp container, it took 90 seconds and timed out. Switching to custom format (binary) dropped import time to 3 seconds.
I should have known better.
The app verification is simpler but still useful. For a service like Gitea, it spins up a temp container with the same image, mounts the restored volume, and checks if the health endpoint responds. If it does, the data plus the image produce a working service. That's a meaningful signal.
Health Scoring
After running backups for a few days, I wanted a single answer: "are my backups healthy?"
backup-health.sh gives each service a 0-100 score:
- Recency (40 points): backed up within schedule
- Verification (30 points): last verify passed
- Consistency (20 points): no missed backups in 7 days
- Storage (10 points): restic check passed recently
80+ is healthy. Below 50, take action. The scoring weights are opinionated -- I put verification at 30% because an unverified backup is worth less than a verified one regardless of how recent it is.
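The arithmetic itself is trivial; the four flags below stand in for the real checks against the backup logs and restic:

```bash
score=0
[ "$backed_up_on_schedule" = yes ] && score=$((score + 40))   # recency
[ "$last_verify_passed"    = yes ] && score=$((score + 30))   # verification
[ "$no_missed_in_7_days"   = yes ] && score=$((score + 20))   # consistency
[ "$restic_check_recent"   = yes ] && score=$((score + 10))   # storage
echo "$service: $score/100"   # 80+ healthy, below 50 act
```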
A nice side effect: the Prometheus metrics export feeds these scores into my existing monitoring stack. I can set a Grafana alert on backup_health_score < 50 and know immediately if something goes wrong. I wrote up the full Grafana dashboard setup separately -- eight panels covering health scores, backup ages, sizes, verification status, and repo growth.
DR Runbook Generation
The last script I built was dr-generate.sh. It reads each container's image, volumes, network, ports, and environment variables (redacted), combines that with the latest snapshot info, and generates a step-by-step recovery markdown file.
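Nearly all of that metadata falls out of a single docker inspect. A format string along these lines (illustrative, not the shipped one) covers image, networks, ports, mounts, and env names with the values dropped:

```bash
docker inspect "$container" --format '
image:    {{ .Config.Image }}
networks: {{ range $name, $conf := .NetworkSettings.Networks }}{{ $name }} {{ end }}
ports:    {{ range $port, $bind := .HostConfig.PortBindings }}{{ $port }} {{ end }}
mounts:   {{ range .Mounts }}{{ .Source }} -> {{ .Destination }}  {{ end }}
env:      {{ range .Config.Env }}{{ index (split . "=") 0 }} {{ end }}'
```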
The generated runbook is specific enough to follow from scratch: pull this image, create these volumes, restore this snapshot, copy data here, import this dump, start the container, verify it works. Environment variable names are listed but values are redacted -- you need your compose file for the actual values.
I tested one by following the generated runbook for my PostgreSQL container on a fresh Docker install. It worked end-to-end. The generated commands were copy-pasteable.
The Regex That Lied
One bug that took longer to find than it should have: after each backup, the script extracts the snapshot ID from restic's output. The regex was grep -oP 'snapshot \K[a-f0-9]+'. Seemed fine. But the first line of restic's output is "no parent snapshot found, will read all files." The f in "found" matches [a-f0-9]+. With head -1 grabbing the first match, every snapshot ID came back as f.
The fix: grep -oP 'snapshot \K[a-f0-9]+ (?=saved)' -- require the word "saved" after the hex string. Only the actual "snapshot abc12345 saved" line matches.
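Easy to reproduce against the two lines of output in question:

```bash
out=$'no parent snapshot found, will read all files\nsnapshot ab12cd34 saved'

echo "$out" | grep -oP 'snapshot \K[a-f0-9]+' | head -1     # prints: f
echo "$out" | grep -oP 'snapshot \K[a-f0-9]+ (?=saved)'     # prints: ab12cd34
```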
This is the kind of bug that works perfectly in isolation (the regex matches the right line when you test it alone) but fails in production because you forgot the other line that also contains the word "snapshot."
What I'd Do Differently
The profiles are JSON, parsed with Python3 one-liners inside bash. It works, but it's fragile. If I were starting over, I'd write the orchestrator in Python and call restic as a subprocess. Bash is fine for the individual utility scripts, but the main orchestrator has enough JSON parsing, Docker API calls, and control flow that Python would be cleaner.
The verification system doesn't test application-level data integrity. It checks "can I start a container with this data" but not "does this Gitea instance have all my repos." That's a v2 problem -- it would require per-application verification plugins, and the current file/DB/health-check approach catches the vast majority of backup failures.
The setup wizard is purely terminal-based. A small web UI would make initial configuration more approachable, but it's also a maintenance burden I don't want for a v1 product.
The Numbers
- 27 files in the product kit
- 11 automation scripts, all passing bash -n syntax validation
- 5 backup profiles covering databases, critical services, configs, media, and general volumes
- 58KB ZIP after compression
- 15 minutes from unzip to first verified backup (on my test run)
- Health scoring, Prometheus metrics, and DR runbooks working on day one
The stack is shipping as a free download on nxsi.io. Just enter your email, MIT licensed, no subscriptions.
The full product includes Docker Compose, .env template, setup wizard, 11 scripts, 5 profiles, and 4 documentation guides. Available at nxsi.io.