I Built a Docker-Aware Backup Stack with Automated Restore Verification
I had 15 Docker containers running on my homelab server. Three of them had backups configured. I found out about the other twelve when I was troubleshooting a database migration and realized my Vaultwarden passwords had never been backed up. Not once.
That was a Tuesday.
By the following weekend I had a working backup system that reads Docker labels, stops containers when needed, dumps databases automatically, encrypts everything with restic, and -- the part no existing tool does -- verifies that backups can actually be restored by spinning up temporary containers and testing them.
This is the full build log. Every command, every wrong turn, every decision.
The State of Homelab Backup Tools
Before building anything, I spent two hours surveying what already exists. The landscape is bleak.
Duplicati works but its scheduler is unreliable, the UI is sluggish, and it doesn't understand Docker. You point it at a directory and hope the container's data is consistent at backup time. If your database is mid-write, you get a corrupt backup that you won't discover until you try to restore.
docker-volume-backup (offen) is the closest thing to what I wanted. It reads Docker labels and can stop containers. But it has no deduplication, no encryption (you have to layer GPG yourself), and no concept of profiles or restore verification. Issue #329 on their repo has 11+ upvotes asking for per-service retention policies. Still open.
BorgBackup is technically excellent but doesn't speak S3 or B2 natively, which kills the 3-2-1 backup rule for most homelabs. Restic covers the same dedup/encryption ground and has native cloud backend support.
Six gaps. No tool fills any of them:
- No restore verification
- No multi-host backup aggregation with health scoring
- Nobody combines Docker labels with restic
- No "are my backups healthy?" answer
- No auto-generated disaster recovery runbooks
- No per-service label-driven configuration (the offen #329 problem)
Choosing the Components
Restic was a quick decision. Content-defined chunking gives real deduplication (not just file-level), AES-256 encryption is always-on, and it speaks S3/B2/SFTP natively. The community is active and Backrest wraps it for a snapshot browser UI.
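To make "natively" concrete: the repository URL alone selects the backend, and encryption is on no matter which one you pick. Bucket and host names below are placeholders:

```bash
export RESTIC_PASSWORD_FILE=/run/secrets/restic-password    # repo key; encryption is not optional
restic -r s3:s3.amazonaws.com/homelab-backups init          # S3-compatible storage
restic -r b2:homelab-backups:restic init                    # Backblaze B2
restic -r sftp:backup@nas.local:/srv/restic init            # plain SFTP target
```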
For notifications I went with shoutrrr instead of Apprise. Apprise has broader service coverage, but shoutrrr is Go-native (tiny image, fast startup) and covers every notification service a homelab runner actually uses: Discord, Slack, ntfy, Gotify, Telegram, email. It runs on-demand via docker run --rm rather than as an idle sidecar -- no point keeping a container alive just for occasional notifications.
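The on-demand pattern is just a one-shot container per notification; something like this (the ntfy topic is a placeholder):

```bash
# Fire a notification and exit -- nothing stays resident between backups
docker run --rm containrrr/shoutrrr send \
  --url "ntfy://ntfy.sh/homelab-backups" \
  --message "backup.sh finished: 12/12 services backed up, 0 failures"
```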
The architecture decision that matters most: the restic container is separate from Backrest. Backrest wraps restic for its UI, but it doesn't understand Docker. My scripts need to docker exec into something with access to the Docker socket to read labels and stop/start containers. Backrest can't do that, so restic runs as its own long-lived container with the socket mounted read-only.
The Docker Compose Stack
Two services. Memory-limited, health-checked.
Restic gets 512MB, Backrest 256MB. Restic uses tail -f /dev/null as its entrypoint -- it's an exec target, not a standalone process. Scripts run inside it via docker exec. Shoutrrr runs on-demand via docker run --rm when notifications fire, which saves ~64MB of idle RAM compared to keeping it as a sidecar.
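Because it just idles, every restic operation is a plain docker exec into it. The container name backup-restic below is an assumption -- use whatever your compose file calls the service:

```bash
docker exec backup-restic restic snapshots --compact           # list snapshots
docker exec backup-restic restic check --read-data-subset=10%  # spot-check repo integrity
```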
The compose file took 20 minutes. The .env.example took longer -- documenting every variable with inline comments for eight config sections. (Storage backend options alone needed four commented blocks for S3, B2, SFTP, and secondary/dual storage.)
The restic container passed its health check immediately. Backrest needed about 15 seconds on first boot to create its internal database.
Docker Labels: The backup.nxsi.* Namespace
This is the core differentiator. Instead of a central YAML config listing which containers to back up, the configuration lives on the containers themselves.
labels:
  backup.nxsi.enable: "true"
  backup.nxsi.profile: "database"
  backup.nxsi.stop: "true"
  backup.nxsi.priority: "10"
  backup.nxsi.pre-hook: "pg_dump -Fc -U postgres mydb -f /backup/dump.sql"
I went with backup.nxsi.* instead of just backup.* to avoid collisions with other tools. If someone is migrating from docker-volume-backup, their docker-volume-backup.stop-during-backup labels coexist with ours.
The full label schema supports volumes, paths, stop behavior, pre/post hooks, per-service scheduling, per-service retention, profile shortcuts, and verification hints. That's a lot of labels, but in practice most services only need two: enable and profile.
backup.sh — The Orchestrator
The main script is about 470 lines. Not enormous, but dense.
The flow:
1. Source .env, run preflight checks
2. Query the Docker API for all containers with backup.nxsi.enable=true
3. Sort by priority label (ascending -- lower number runs first; see the sketch after this list)
4. Per container: read labels, resolve profile, run pre-hook via docker exec, stop if flagged, stage volume data into the restic container, run restic backup from the staging directory, restart, run post-hook, apply retention
5. Copy to secondary repo if configured
6. Send notification summary
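Steps 2 and 3 are a single docker ps away; roughly (the default priority of 50 is my assumption, not necessarily the shipped default):

```bash
# Discover enabled containers, fill in a default priority, sort ascending by priority
docker ps --filter "label=backup.nxsi.enable=true" \
  --format '{{.Label "backup.nxsi.priority"}} {{.Names}}' \
  | awk '{ if (NF == 1) print 50, $1; else print $1, $2 }' \
  | sort -n \
  | awk '{ print $2 }'
```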
Step 4 has an architectural detail that cost me half an afternoon. You can't just pass host volume paths to the restic container because /var/lib/docker/volumes/xyz/_data doesn't exist inside it -- those are host paths. The solution: pipe the data in. For each volume, spin up a throwaway busybox container with the volume mounted read-only, tar the contents, and pipe them into the restic container's staging directory via docker exec -i tar xf -. For database dumps, docker cp the dump out of the source container and pipe it in the same way. It's container-to-container data transfer without ever touching the host filesystem.
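Roughly, per volume (container and path names are assumptions):

```bash
VOLUME="vaultwarden_data"        # named volume to back up
STAGE="/staging/${VOLUME}"       # staging path inside the restic container

docker exec backup-restic mkdir -p "$STAGE"

# Stream the volume through a throwaway busybox container straight into
# the restic container -- no host paths involved at any point.
docker run --rm -v "${VOLUME}:/data:ro" busybox tar cf - -C /data . \
  | docker exec -i backup-restic tar xf - -C "$STAGE"

docker exec backup-restic restic backup "$STAGE" --tag "$VOLUME"
```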
This also means your backup scripts don't need privileged access to the Docker data directory. The Docker socket (mounted read-only) is enough.
The pre-hook system gave me the most trouble. I originally had hooks running on the host, but that requires the hook commands to exist on the host. Running inside the container via docker exec means a PostgreSQL hook uses the pg_dump binary inside the postgres container -- which is always the right version. No version mismatch, no missing binaries.
Hooks get a 60-second timeout. I briefly considered making this configurable per-service, but 60 seconds is generous for any homelab database dump and adding another label felt like diminishing returns.
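The whole hook step boils down to a few lines (variable names are mine; the 60-second cap is the one described above):

```bash
# Read the hook command from the container's label and run it *inside* that container
hook=$(docker inspect --format '{{ index .Config.Labels "backup.nxsi.pre-hook" }}' "$container")
if [ -n "$hook" ]; then
  timeout 60 docker exec "$container" sh -c "$hook" \
    || echo "WARN: pre-hook failed or timed out for $container"
fi
```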
Profiles: Five Patterns That Cover Almost Everything
Rather than making every user configure every label, profiles provide preset defaults.
The database profile auto-detects database type from the container image name. It doesn't just match exact names -- pgvector/pgvector:pg17 and timescaledb/timescaledb:latest both get recognized as PostgreSQL because the detection regex matches image name patterns (postgres|pgvector|timescale|postgis all map to the pg_dump hook, mysql|percona maps to mysqldump). I initially only checked for literal "postgres" in the image name, which missed pgvector entirely. Since pgvector and TimescaleDB are some of the most common postgres variants in homelabs, catching those matters.
It worked on the first try for postgres:17 and mariadb:11 once the detection was right. MongoDB needed mongodump's --archive output specifically (not --out directories), which I caught during testing.
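The detection amounts to a pattern match on the image name; approximately this (the patterns and fallback here are illustrative, not the exact shipped regex):

```bash
image=$(docker inspect --format '{{ .Config.Image }}' "$container")
case "$image" in
  *postgres*|*pgvector*|*timescale*|*postgis*) dump_tool="pg_dump -Fc"         ;;
  *mysql*|*mariadb*|*percona*)                 dump_tool="mysqldump"           ;;
  *mongo*)                                     dump_tool="mongodump --archive" ;;
  *)                                           dump_tool=""                    ;;  # not a known database: plain file backup
esac
```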
The critical profile (Vaultwarden, Gitea, auth services) sets stop: true and maximum retention. Stopping the container guarantees consistency -- no writes mid-backup. The trade-off is a few seconds of downtime. For Vaultwarden that's fine; for a public-facing service it might not be.
config-only is for Pi-hole, Traefik, and similar services where the config is small and valuable but you don't need weekly snapshots of logs. 90-day flat retention, skip logs and cache.
large-media skips actual media files (mkv, mp4, etc.) and only backs up Plex/Jellyfin metadata. Nobody wants to deduplicate 4TB of movies through restic.
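As a rough illustration of what those profiles translate to on the restic side (the specific flag values are assumptions, not the shipped defaults):

```bash
# critical: longest retention, container stopped during backup
restic forget --tag vaultwarden --keep-daily 14 --keep-weekly 8 --keep-monthly 12 --prune

# config-only: flat 90-day retention, skip logs and cache
restic backup /staging/pihole --tag pihole --exclude '*.log' --exclude '**/cache/**'
restic forget --tag pihole --keep-within 90d --prune

# large-media: metadata only, never the media files themselves
restic backup /staging/jellyfin --tag jellyfin --exclude '*.mkv' --exclude '*.mp4'
```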
Automated Restore Verification
This is the thing nobody else does. Every backup tool can tell you it ran successfully. None of them can tell you the data is actually restorable.
verify.sh works like this:
- For each service, get the latest snapshot
- Restore to a temp directory
- Based on the verification type:
- Database: Spin up a temporary PostgreSQL/MySQL container, import the dump, run
SELECT 1, count tables, count rows in the largest table - App: Spin up a temporary container with the original image, wait 30 seconds, hit the health check URL
- Files: Count restored files, check for zero-byte files, run
restic checkon a 10% data sample
- Database: Spin up a temporary PostgreSQL/MySQL container, import the dump, run
- Report PASS/FAIL per service
- Clean up all temp containers (via
trap-- cleanup runs even if the script crashes)
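The database branch, stripped to its essentials, looks something like this (image tag, names, and the dump path are assumptions; the real script polls readiness rather than sleeping):

```bash
restic restore latest --tag postgres --target "$RESTORE_DIR"

# Temp container is always removed, even if the script dies mid-verify
trap 'docker rm -f verify-pg >/dev/null 2>&1' EXIT
docker run -d --name verify-pg -e POSTGRES_PASSWORD=verify \
  -v "$RESTORE_DIR:/restore:ro" postgres:17
sleep 15   # crude; poll pg_isready in practice

docker exec verify-pg pg_restore -U postgres -d postgres /restore/dump.sql \
  && docker exec verify-pg psql -U postgres -tc "SELECT 1;" >/dev/null \
  && echo "PASS" || echo "FAIL"
```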
The database verification caught a real issue during testing. I had set up a pre-hook that ran pg_dump without the --format=custom flag, which produced a plaintext SQL dump. The dump worked, but when I tried to import a 200MB plaintext dump into the temp container, it took 90 seconds and timed out. Switching to custom format (binary) dropped import time to 3 seconds.
I should have known better.
The app verification is simpler but still useful. For a service like Gitea, it spins up a temp container with the same image, mounts the restored volume, and checks if the health endpoint responds. If it does, the data plus the image produce a working service. That's a meaningful signal.
Health Scoring
After running backups for a few days, I wanted a single answer: "are my backups healthy?"
backup-health.sh gives each service a 0-100 score:
- Recency (40 points): backed up within schedule
- Verification (30 points): last verify passed
- Consistency (20 points): no missed backups in 7 days
- Storage (10 points): restic check passed recently
80+ is healthy. Below 50, take action. The scoring weights are opinionated -- I put verification at 30% because an unverified backup is worth less than a verified one regardless of how recent it is.
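The arithmetic itself is trivial; the four flags below stand in for the real checks against the backup logs and restic:

```bash
score=0
[ "$backed_up_on_schedule" = yes ] && score=$((score + 40))   # recency
[ "$last_verify_passed"    = yes ] && score=$((score + 30))   # verification
[ "$no_missed_in_7_days"   = yes ] && score=$((score + 20))   # consistency
[ "$restic_check_recent"   = yes ] && score=$((score + 10))   # storage
echo "$service: $score/100"   # 80+ healthy, below 50 act
```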
A nice side effect: the Prometheus metrics export feeds these scores into my existing monitoring stack. I can set a Grafana alert on backup_health_score < 50 and know immediately if something goes wrong. I wrote up the full Grafana dashboard setup separately -- eight panels covering health scores, backup ages, sizes, verification status, and repo growth.
DR Runbook Generation
The last script I built was dr-generate.sh. It reads each container's image, volumes, network, ports, and environment variables (redacted), combines that with the latest snapshot info, and generates a step-by-step recovery markdown file.
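Nearly all of that metadata falls out of a single docker inspect. A format string along these lines (illustrative, not the shipped one) covers image, networks, ports, mounts, and env names with the values dropped:

```bash
docker inspect "$container" --format '
image:    {{ .Config.Image }}
networks: {{ range $name, $conf := .NetworkSettings.Networks }}{{ $name }} {{ end }}
ports:    {{ range $port, $bind := .HostConfig.PortBindings }}{{ $port }} {{ end }}
mounts:   {{ range .Mounts }}{{ .Source }} -> {{ .Destination }}  {{ end }}
env:      {{ range .Config.Env }}{{ index (split . "=") 0 }} {{ end }}'
```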
The generated runbook is specific enough to follow from scratch: pull this image, create these volumes, restore this snapshot, copy data here, import this dump, start the container, verify it works. Environment variable names are listed but values are redacted -- you need your compose file for the actual values.
I tested one by following the generated runbook for my PostgreSQL container on a fresh Docker install. It worked end-to-end. The generated commands were copy-pasteable.
The Regex That Lied
One bug that took longer to find than it should have: after each backup, the script extracts the snapshot ID from restic's output. The regex was grep -oP 'snapshot \K[a-f0-9]+'. Seemed fine. But the first line of restic's output is "no parent snapshot found, will read all files." The f in "found" matches [a-f0-9]+. With head -1 grabbing the first match, every snapshot ID came back as f.
The fix: grep -oP 'snapshot \K[a-f0-9]+ (?=saved)' -- require the word "saved" after the hex string. Only the actual "snapshot abc12345 saved" line matches.
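Easy to reproduce against the two lines of output in question:

```bash
out=$'no parent snapshot found, will read all files\nsnapshot ab12cd34 saved'

echo "$out" | grep -oP 'snapshot \K[a-f0-9]+' | head -1     # prints: f
echo "$out" | grep -oP 'snapshot \K[a-f0-9]+ (?=saved)'     # prints: ab12cd34
```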
This is the kind of bug that works perfectly in isolation (the regex matches the right line when you test it alone) but fails in production because you forgot the other line that also contains the word "snapshot."
What I'd Do Differently
The profiles are JSON, parsed with Python3 one-liners inside bash. It works, but it's fragile. If I were starting over, I'd write the orchestrator in Python and call restic as a subprocess. Bash is fine for the individual utility scripts, but the main orchestrator has enough JSON parsing, Docker API calls, and control flow that Python would be cleaner.
The verification system doesn't test application-level data integrity. It checks "can I start a container with this data" but not "does this Gitea instance have all my repos." That's a v2 problem -- it would require per-application verification plugins, and the current file/DB/health-check approach catches the vast majority of backup failures.
The setup wizard is purely terminal-based. A small web UI would make initial configuration more approachable, but it's also a maintenance burden I don't want for a v1 product.
The Numbers
- 27 files in the product kit
- 11 automation scripts, all passing bash -n syntax validation
- 5 backup profiles covering databases, critical services, configs, media, and general volumes
- 58KB ZIP after compression
- 15 minutes from unzip to first verified backup (on my test run)
- Health scoring, Prometheus metrics, and DR runbooks working on day one
The stack is shipping as a free download on nxsi.io. Just enter your email, MIT licensed, no subscriptions.
The full product includes Docker Compose, .env template, setup wizard, 11 scripts, 5 profiles, and 4 documentation guides. Available at nxsi.io.