Most Docker Compose files I see in homelab repos are fragile. They work on the author's machine, break on yours, and give you zero information about why. The patterns below come from a 9-service monitoring stack I built and tested on a box with 8 GB of RAM. Every snippet is pulled from that real compose file — nothing hypothetical.
Environment Variables With Defaults
The single most impactful pattern for making a compose file portable: ${VAR:-default} syntax. Users who care about customization edit .env. Everyone else gets working defaults without touching YAML.
ports:
  - "${GRAFANA_PORT:-3000}:3000"
environment:
  - GF_SECURITY_ADMIN_USER=${GRAFANA_ADMIN_USER:-admin}
  - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD:-admin}
  - GF_AUTH_ANONYMOUS_ENABLED=${GRAFANA_ANONYMOUS_ENABLED:-false}
command:
  - "--storage.tsdb.retention.time=${PROMETHEUS_RETENTION:-15d}"
  - "--storage.tsdb.retention.size=${PROMETHEUS_RETENTION_SIZE:-10GB}"
The monitoring stack has 9 services with roughly 20 configurable values. Not one requires editing docker-compose.yml. Copy .env.example to .env, change what matters, run docker compose up -d. That .env.example becomes the documentation — inline comments explain every variable:
# How long to keep metrics (default: 15 days)
PROMETHEUS_RETENTION=15d
# Max disk space for metrics (default: 10GB)
PROMETHEUS_RETENTION_SIZE=10GB
Ship an .env.example, never a populated .env. Users who get a pre-filled .env will inevitably run with your test credentials.
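A quick way to sanity-check the result: docker compose config is a standard Compose subcommand that renders the final file with every ${VAR:-default} already interpolated, so you can see exactly which values a fresh clone would run with:
docker compose config
With no .env present, the output shows the fallbacks; with one, it shows your overrides.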
Health Checks on Every Service
Without health checks, Docker has no idea whether your container is actually functioning or just technically alive with a running PID. The difference matters when other services depend on it.
Grafana's health check:
healthcheck:
  test: ["CMD", "wget", "--spider", "-q", "http://localhost:3000/api/health"]
  interval: 30s
  timeout: 5s
  retries: 3
Prometheus uses its dedicated health endpoint:
healthcheck:
  test: ["CMD", "wget", "--spider", "-q", "http://localhost:9090/-/healthy"]
  interval: 30s
  timeout: 5s
  retries: 3
Uptime Kuma is the awkward one — no wget or curl in the image, just Node.js:
healthcheck:
  test: ["CMD-SHELL", "node -e \"const http = require('http'); http.get('http://localhost:3001/api/status-page/heartbeat', (r) => { process.exit(r.statusCode === 200 ? 0 : 1) }).on('error', () => process.exit(1))\""]
  interval: 30s
  timeout: 5s
  retries: 3
Ugly. But it works, and that's the point — you use whatever HTTP client the image gives you. Alpine-based images have wget. Debian-based often have curl. Node images have node. Check what's available before reaching for apt-get install in a health check command (I've seen people do this; don't).
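If you're not sure what a given image ships, a quick probe from the host answers it (substitute the image you actually run; this assumes the image has a shell at all, which distroless images don't):
docker run --rm --entrypoint sh <image> -c 'for c in wget curl node; do command -v $c; done'
Whatever prints a path is what you build the health check around.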
The interval: 30s with retries: 3 means Docker marks a container unhealthy after three consecutive failed checks, roughly 90 seconds of sustained failure. Tight enough to catch real problems, loose enough to survive a slow Loki ingester warmup.
Memory Limits
If you're running on a 4-8 GB homelab box, one misbehaving container can OOM-kill everything. Memory limits turn a full-system crash into one container restarting.
deploy:
  resources:
    limits:
      memory: 256M
The actual limits from the monitoring stack:
| Service | Memory Limit | Why |
|---|---|---|
| Prometheus | 512M | Largest consumer — TSDB, WAL, query execution |
| Grafana | 256M | Dashboard rendering, plugin loading |
| Loki | 256M | Log ingestion and TSDB |
| Uptime Kuma | 256M | Node.js runtime + SQLite |
| cAdvisor | 128M | Kernel metrics collection |
| Promtail | 128M | Log tailing and shipping |
| Node Exporter | 64M | Minimal — reads /proc and /sys |
| Alertmanager | 64M | Lightweight alert routing |
| PVE Exporter | 64M | Python + API polling |
Total budget: ~1.7 GB for 9 services. That leaves breathing room on an 8 GB machine running other things.
I arrived at these numbers by running the stack under load and watching actual consumption. Prometheus peaked around 380 MB with 15 days of retention and 7 scrape targets. The 512M limit gives it headroom without letting a runaway query eat all your RAM.
(An aside: I initially set Promtail to 64M and it kept getting killed during first boot when it tried to ingest all existing Docker logs at once. 128M fixed it. Your move, first-boot burst ingestion.)
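If you want to repeat the sizing exercise on your own stack, docker stats shows live per-container memory usage against the configured limit (the template fields are Docker's built-in ones):
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}"
Run it while the stack is under realistic load, then set each limit with some headroom above the observed peak.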
Named Volumes vs. Bind Mounts
Two different volume patterns serve two different purposes. The compose file uses both, deliberately.
Named volumes for persistent data — things the service writes and manages:
volumes:
  - grafana-data:/var/lib/grafana
  - prometheus-data:/prometheus
  - loki-data:/loki
  - alertmanager-data:/alertmanager
  - uptime-kuma-data:/app/data
  - promtail-positions:/tmp
Bind mounts for config files you author and the service reads:
volumes:
  - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
  - ./prometheus/alerts:/etc/prometheus/alerts:ro
  - ./grafana/provisioning:/etc/grafana/provisioning:ro
  - ./grafana/dashboards:/var/lib/grafana/dashboards:ro
Named volumes survive docker compose down and get managed by Docker's storage driver. Bind mounts point at files you version control. Mixing these up — putting Prometheus data in a bind mount, or configs in a named volume — creates pain that shows up weeks later when you try to back up or redeploy.
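One piece the snippets above don't show: named volumes also have to be declared at the top level of the compose file before services can mount them. A minimal sketch, using the same names as the mounts above:
volumes:
  grafana-data:
  prometheus-data:
  loki-data:
  alertmanager-data:
  uptime-kuma-data:
  promtail-positions:
Leaving the values empty tells Docker to create them with the default local driver.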
Read-Only Config Mounts
Every bind-mounted config file has :ro appended:
volumes:
  - ./loki/loki.yml:/etc/loki/loki.yml:ro
  - ./promtail/promtail.yml:/etc/promtail/promtail.yml:ro
  - ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
Two reasons. First, defense in depth — a compromised container can't modify its own config to do something you didn't intend. Second, it makes intent explicit. When I see :ro I know that file flows one direction: host into container, never back.
Node Exporter takes this further with read-only mounts of the host filesystem:
volumes:
  - /proc:/host/proc:ro
  - /sys:/host/sys:ro
  - /:/host:ro,rslave
That :ro,rslave on the root mount matters — rslave propagates mount events from the host into the container (so new mounts appear), but :ro prevents the container from writing anything back. Node Exporter needs to see your filesystems to report disk usage. It does not need write access.
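Those mounts only matter if Node Exporter is told to read from them. The usual pairing (the flags are Node Exporter's own; the values assume the mount paths above):
command:
  - "--path.procfs=/host/proc"
  - "--path.sysfs=/host/sys"
  - "--path.rootfs=/host"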
Dedicated Networks
networks:
  monitoring:
    name: monitoring
    driver: bridge
Every service joins this network:
networks:
  - monitoring
The name: monitoring is deliberate. Without it, Docker prefixes the project directory name — you'd get homelab-monitoring_monitoring or whatever your folder is called. Explicit names make docker network inspect monitoring predictable.
Inside this network, services resolve each other by name. Grafana's datasource config references http://prometheus:9090 and http://loki:3100. No IPs. If Prometheus restarts and gets a different internal IP, everything still works.
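That DNS-by-service-name behavior is what the provisioned datasources lean on. A trimmed sketch of what a file under grafana/provisioning/datasources/ can look like (fields reduced to the essentials; your layout may differ):
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100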
A common mistake: using the default bridge network for everything. When your monitoring stack, your media stack, and your home automation stack all share a network, every container can talk to every other container. Dedicated networks provide isolation. Your Prometheus doesn't need to reach your Plex server.
Restart Policies
Every service uses the same policy:
restart: unless-stopped
Not restart: always. The difference: unless-stopped respects manual docker stop commands. If you stop Grafana to debug something, it stays stopped. always would restart it immediately, fighting you.
Not on-failure either. on-failure only restarts on non-zero exit codes. If a service gets OOM-killed, the exit code is 137 (SIGKILL) — on-failure handles that. But some services exit cleanly with code 0 during transient issues and need to come back. unless-stopped covers both cases.
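When a container does die and you want to know why, docker inspect exposes both the exit code and whether the kernel OOM-killed it (the container name here is one from this stack):
docker inspect --format '{{.State.ExitCode}} OOMKilled={{.State.OOMKilled}}' monitoring-loki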
One line. Boring. Prevents you from waking up to a dead monitoring stack because Loki crashed at 2 AM and nobody restarted it.
Container Naming
container_name: monitoring-grafana
container_name: monitoring-prometheus
container_name: monitoring-loki
container_name: monitoring-promtail
container_name: monitoring-node-exporter
container_name: monitoring-cadvisor
container_name: monitoring-alertmanager
container_name: monitoring-pve-exporter
container_name: monitoring-uptime-kuma
The monitoring- prefix does two things. docker ps becomes scannable — you can instantly tell which stack a container belongs to. And it prevents name collisions: container names are global on a Docker host, so if both your monitoring stack and a dev stack declare container_name: grafana, Docker refuses to start the second one. Prefixed names eliminate the ambiguity.
docker ps --filter name=monitoring- gives you just this stack. Fast.
Dependency Ordering With Health Conditions
Promtail ships logs to Loki. If Promtail starts before Loki is ready, log batches fail and get retried — noisy, wasteful, and alarming if you're watching the logs.
promtail:
  depends_on:
    loki:
      condition: service_healthy
This is the only dependency declaration in the stack because it's the only one that genuinely matters. Prometheus scrapes targets on an interval — if a target isn't up yet, the scrape fails silently and succeeds on the next cycle. Grafana queries Prometheus on demand — if Prometheus isn't ready, the dashboard shows "No data" and refreshes automatically. These services tolerate startup ordering gracefully.
Promtail is different. It discovers existing Docker log files on boot and immediately starts shipping thousands of lines. Sending that burst to a Loki instance that's still initializing its TSDB produces a wall of 503 errors. The condition: service_healthy gates Promtail startup until Loki's /ready endpoint returns 200.
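One prerequisite worth spelling out: condition: service_healthy only does something if Loki defines a healthcheck of its own. A sketch against that /ready endpoint, assuming the image ships wget (swap in whatever HTTP client it actually provides, per the health-check section above):
healthcheck:
  test: ["CMD-SHELL", "wget --spider -q http://localhost:3100/ready || exit 1"]
  interval: 30s
  timeout: 5s
  retries: 3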
Don't add depends_on between services that handle missing peers gracefully on their own. You'll just slow down your startup for no reason.
All Nine Patterns
- ${VAR:-default} everywhere — zero YAML editing for users
- Health checks on every service — using whatever HTTP client the image provides
- Memory limits — total stack budget around 1.7 GB for 9 services
- Named volumes for data, bind mounts for config — clean separation of concerns
- :ro on all config mounts — config flows host to container, never back
- Dedicated named network — service DNS, stack isolation, predictable naming
- unless-stopped restart policy — respects manual stops, covers all failure modes
- Stack-prefixed container names — monitoring-* for instant docker ps filtering
- depends_on with condition: service_healthy — only where startup order actually matters
None of these are exotic. Most are one or two lines. I have been guilty of skipping half of them in personal projects and regretting it six months later when I try to redeploy on a different box. The difference between a compose file that works on your machine and one that works on anyone's machine is usually just these small, boring decisions compounding.
The Homelab Monitoring Stack kit uses every pattern above across a production-tested 9-service Docker Compose file — Grafana, Prometheus, Loki, Alertmanager, Node Exporter, cAdvisor, Promtail, PVE Exporter, and Uptime Kuma. Pre-built dashboards, 23 alert rules, and documentation included. Free download, yours to modify.