Every Backup Tool Promises Reliability. Here's How to Actually Verify Yours Can Restore.
There's a saying in operations: "You don't have backups. You have restores. Or you don't."
Most homelab backup tools give you a green checkmark and a timestamp. Backup completed. 12.4 MB added. Everything is fine. But that checkmark means exactly one thing: restic (or Borg, or Duplicati, or rsync) finished writing data. It does not mean your PostgreSQL dump can be imported. It does not mean your Vaultwarden database isn't corrupt. It does not mean anything is actually restorable.
Enterprise tools like Veeam and AWS Backup have automated restore testing. They spin up temporary VMs, import backups, run health checks, and report results. This capability costs tens of thousands of dollars in enterprise licensing.
Homelab tools have nothing.
The Verification Gap
I surveyed every backup tool commonly used in homelab setups. Duplicati, BorgBackup, restic (by itself), docker-volume-backup, Kopia, UrBackup. Not a single one has automated restore verification.
Some have integrity checks. Restic's restic check verifies that the repository structure is consistent and data packs are readable. That's valuable -- it catches bit rot and corruption. But it doesn't answer the question: "If I restore this PostgreSQL snapshot right now, will the dump import?"
The difference matters. A backup can pass integrity checks and still fail restoration because:
- The database dump was taken mid-transaction and is internally inconsistent
- A pre-hook script silently failed, so the dump is from three days ago (stale)
- File permissions in the backup don't match what the service expects
- The application has migrated its data format since the backup was taken
These aren't hypotheticals. I've hit every one of them.
What Automated Verification Looks Like
The approach: periodically restore the latest snapshot into temporary containers and test whether the data is actually functional.
Three verification strategies, matched to service type:
Database Verification
This is the highest-value test. Databases are the most common backup target and the most common restore failure.
- Restore the dump file from the latest snapshot
- Start a temporary database container (same image as production)
- Import the dump
- Run a basic query: SELECT 1
- Count tables
- Count rows in the largest table
- Report and destroy the temp container
If the import succeeds and queries return data, the backup is functional. The entire process takes 30-60 seconds per database.
# Simplified example for PostgreSQL
TEMP_NAME="verify-db-$(date +%s)"
docker run -d --name "$TEMP_NAME" \
  -e POSTGRES_PASSWORD=verify_temp \
  postgres:17

# Wait until the server actually accepts connections (more reliable than a fixed sleep)
until docker exec "$TEMP_NAME" pg_isready -U postgres >/dev/null 2>&1; do
  sleep 1
done

# Import the dump; ON_ERROR_STOP makes psql exit non-zero on the first error
docker cp dump.sql "$TEMP_NAME:/tmp/dump.sql"
docker exec "$TEMP_NAME" psql -U postgres -v ON_ERROR_STOP=1 -f /tmp/dump.sql

# Verify: count the tables that made it in
docker exec "$TEMP_NAME" psql -U postgres -c \
  "SELECT count(*) FROM information_schema.tables WHERE table_schema='public'"

# Cleanup
docker rm -f "$TEMP_NAME"
The script has a trap cleanup EXIT that tracks all temp container names and force-removes them on exit -- even if the script crashes or gets killed. No orphaned verification containers.
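A minimal sketch of that pattern (the variable and function names here are illustrative, not lifted from the actual script):

# Track every temp container created so one handler can clean them all up
TEMP_CONTAINERS=()

cleanup() {
  for c in "${TEMP_CONTAINERS[@]}"; do
    docker rm -f "$c" >/dev/null 2>&1 || true
  done
}
# Runs whenever the script exits -- normally, on error, or when interrupted
trap cleanup EXIT

TEMP_NAME="verify-db-$(date +%s)"
TEMP_CONTAINERS+=("$TEMP_NAME")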
Application Verification
For non-database services (Vaultwarden, Gitea, nginx), the test is different: can the application start with this data?
- Restore the volume from the latest snapshot
- Start a temporary container with the same image
- Wait 30 seconds for startup
- Hit the health check endpoint (if configured)
- Fall back to checking if PID 1 is alive
- Report and destroy
This is a coarser test than database verification. It doesn't check whether all your Gitea repos are present or all your Vaultwarden entries exist. What it does check: the data plus the image produce a running service. That catches the majority of backup failures -- corrupt volumes, missing files, incompatible data formats.
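A sketch of that flow, assuming the snapshot's volume has already been restored to $RESTORE_DIR -- the image name and health path below are placeholders, not the script's actual configuration:

TEMP_NAME="verify-app-$(date +%s)"
docker run -d --name "$TEMP_NAME" \
  -v "$RESTORE_DIR:/data" \
  vaultwarden/server:latest

# Give the application time to initialize
sleep 30

# Health check against the container's bridge IP -- no port publishing needed
IP=$(docker inspect --format '{{.NetworkSettings.IPAddress}}' "$TEMP_NAME")
if curl -fsS --max-time 10 "http://$IP/alive" >/dev/null; then
  echo "PASS: service answered its health endpoint"
else
  echo "FAIL: no healthy response after startup"
fi

docker rm -f "$TEMP_NAME"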
File Verification
For services without a clear health check (config-only, static files), the verification is simpler:
- Restore the snapshot
- Count files, compare to snapshot metadata
- Check for zero-byte files (a common corruption signal)
- Run restic check --read-data-subset=1/10 to verify 10% of data packs
Less conclusive than database or app verification, but still better than nothing. A backup with zero files or all-empty files is clearly broken.
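A sketch of those checks, assuming the restore lands in a throwaway directory:

RESTORE_DIR="/tmp/verify-files-$(date +%s)"
restic restore latest --target "$RESTORE_DIR"

# File count (compare against snapshot metadata) and zero-byte count
find "$RESTORE_DIR" -type f | wc -l
find "$RESTORE_DIR" -type f -size 0 | wc -l

# Spot-check 10% of the repository's data packs
restic check --read-data-subset=1/10

rm -rf "$RESTORE_DIR"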
The Scheduling Problem
Running verification after every backup is wasteful. A daily backup takes 3-8 minutes. Full verification of 15 services takes 10-15 minutes. More than doubling the daily backup window for checks that are largely redundant day to day isn't practical.
The schedule that works: weekly verification on Sunday at 4 AM. That's enough to catch problems within 7 days while keeping compute costs low.
For critical services (passwords, auth, primary database), you could run verification more frequently. The --service flag lets you verify individual services outside the full weekly run.
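In cron terms that looks something like this -- the install path and the vaultwarden service name are placeholders:

# Full verification of every service, Sundays at 04:00
0 4 * * 0  /opt/backup/verify.sh --latest

# Critical services more often, e.g. the password manager nightly at 04:30
30 4 * * *  /opt/backup/verify.sh --latest --service vaultwarden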
Health Scoring
Verification results feed into a broader health score. Without verification, you're scoring backups on whether they ran. With verification, you're scoring them on whether they can restore.
The scoring system I use:
| Factor | Weight | Reasoning |
|---|---|---|
| Recency | 40% | Recent backup is table stakes |
| Verification | 30% | Verified backup is worth far more than unverified |
| Consistency | 20% | Missing scheduled backups indicates systemic issues |
| Storage integrity | 10% | Repository-level corruption checks |
Verification at 30% is deliberate. An unverified backup that ran yesterday is less trustworthy than a verified backup from three days ago. (This is a strong opinion. Not everyone will agree, and that's fine.)
A service that has never been verified starts at 10 out of 30 for the verification component. Not zero, because an unverified backup still has some value. But enough of a penalty to surface "Run verify.sh" as a recommendation.
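As a worked example (the numbers are illustrative and the per-component rules are simplified compared to the real scoring):

recency=40        # backed up within the last 24 hours -> full 40 points
verification=10   # never verified -> floor of 10 out of 30
consistency=20    # no missed scheduled backups -> full 20
integrity=10      # last repository check passed -> full 10
echo "health score: $((recency + verification + consistency + integrity))/100"   # 80/100

An 80 looks healthy at a glance, which is exactly why the missing verification points come with a "Run verify.sh" recommendation attached.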
Implementation Lessons
Temp container cleanup is critical. If verification crashes mid-run and leaves orphaned containers, you've got phantom database containers consuming memory. The trap cleanup EXIT pattern handles this, but you also need to handle the case where a previous verification was killed (by OOM, by user) and left containers behind. A prefix convention (verify-db-*, verify-app-*) makes cleanup straightforward.
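A sweep at the start of each run covers that case. Something like this, relying only on the naming convention:

# Remove containers left behind by a previous, killed verification run
docker ps -aq --filter "name=verify-" | xargs -r docker rm -f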
Database dump format matters. Plaintext SQL dumps work for verification but are slow to import. PostgreSQL's custom format (-Fc) imports 10-30x faster because it's binary and parallelizable. I learned this the hard way when a 200MB plaintext dump timed out during verification while the same data in custom format imported in 3 seconds.
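If you control the dump step, that looks roughly like this (the database name and paths are placeholders); the custom-format file then goes through pg_restore instead of psql during verification:

# Backup side: custom format is compressed, binary, and restorable in parallel
pg_dump -Fc -U postgres -d appdb -f /backups/appdb.dump

# Verification side: pg_restore replaces the psql import
docker cp /backups/appdb.dump "$TEMP_NAME:/tmp/appdb.dump"
docker exec "$TEMP_NAME" pg_restore -U postgres -d postgres --create -j 4 /tmp/appdb.dump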
Health checks need fallbacks. Not every container exposes an HTTP health endpoint. If backup.nxsi.verify-url isn't set, the script falls back to checking if PID 1 is alive inside the container. It's a weaker signal but still catches containers that crash on startup.
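A sketch of that fallback branch, assuming $VERIFY_URL holds whatever the backup.nxsi.verify-url label resolved to (empty if unset); the PID 1 check is approximated here via container state, since a container exits the moment its PID 1 dies:

if [ -n "$VERIFY_URL" ]; then
  # Label set: hit the HTTP health endpoint
  curl -fsS --max-time 10 "$VERIFY_URL" >/dev/null
else
  # No label: weaker signal -- is the container (and therefore PID 1) still running?
  [ "$(docker inspect --format '{{.State.Running}}' "$TEMP_NAME")" = "true" ]
fi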
Don't verify against production. All verification happens in isolated temp containers. Never import a backup dump into your running production database "to see if it works." That's how you lose production data. Temp containers with random names, destroyed on completion.
What This Doesn't Catch
Automated verification isn't a complete restore test. Things it misses:
- Application-level data integrity. The database imports and queries work, but are all 50,000 records present? You'd need application-specific assertions for that.
- Cross-service dependencies. If Service A depends on Service B's data, verifying them independently doesn't test the dependency.
- Restore to different hardware. The temp container runs on the same Docker host. Restoring to a different server might surface network config or volume path issues.
For homelab use, automated verification covers the 90% case. The remaining 10% requires periodic manual restore drills -- pick a service, follow the DR runbook, rebuild it from backup on a clean system. I do this quarterly. (I should probably do it monthly.)
Enterprise Comparison
| Feature | Veeam | AWS Backup | Our Approach |
|---|---|---|---|
| Automated restore test | Yes (SureBackup) | Yes (with Lambda) | Yes |
| Database-aware verification | Yes | Yes | Yes (PostgreSQL/pgvector/TimescaleDB, MySQL/Percona, MariaDB, MongoDB) |
| Application health check | Yes | Limited | Yes (HTTP endpoint) |
| Cost | $1,500+/yr | Per-restore pricing | $0 (self-hosted) |
| Homelab-friendly | No | No | Yes |
The enterprise tools are more polished and support more databases. But they're also priced for enterprises. For a homelab running 5-20 services, temporary Docker containers and health checks provide comparable confidence at zero ongoing cost.
The Takeaway
A backup system without restore verification is a hope-based system. You hope the dump is importable. You hope the volume isn't corrupt. You hope the application can start with this data.
Hope is not a strategy. Temporary containers are cheap. Verification takes seconds. Running verify.sh --latest once a week turns "I think my backups work" into "I know they do."
The Homelab Backup Automation Stack includes verify.sh with database, application, and file verification -- plus health scoring, Prometheus metrics, and DR runbook generation. Available at nxsi.io.