Prometheus pulls (scrapes) metrics from your services on a schedule, stores them as time series data, and evaluates alert rules against them. It doesn't push. Your services expose a /metrics endpoint, Prometheus hits it every N seconds, done.
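What a scrape actually reads is the Prometheus text exposition format: plain text, one sample per line. The payload below is illustrative (the metric and label names are made up, not from any specific exporter):

```shell
# Write a sample /metrics payload (Prometheus text exposition format) to a file.
# The metric name http_requests_total and its labels are illustrative only.
cat > /tmp/sample_metrics.txt <<'EOF'
# HELP http_requests_total Total HTTP requests served.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
http_requests_total{method="post",code="500"} 3
EOF

# Each non-comment line is: metric_name{labels} value.
# Prometheus re-reads the entire payload on every scrape.
grep -v '^#' /tmp/sample_metrics.txt
```

Counters like this only ever go up; Prometheus turns them into rates at query time with `rate()`.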
Quick Setup (Docker Compose)
```yaml
# docker-compose.yml
services:
  prometheus:
    image: prom/prometheus:v3.2.1
    container_name: prometheus
    ports:
      - "9090:9090"
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--storage.tsdb.retention.time=15d"
    volumes:
      - prometheus-data:/prometheus
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./prometheus/alerts:/etc/prometheus/alerts:ro
    restart: unless-stopped

volumes:
  prometheus-data:
```
Create the config file:
```yaml
# prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
```
```shell
mkdir -p prometheus/alerts
docker compose up -d prometheus
```
Open http://your-server:9090 to verify. Type `up` in the expression box and hit Execute -- you should see `up{job="prometheus"} 1`.
Adding Scrape Targets
Every service you want to monitor needs a scrape job. Common ones:
```yaml
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node-exporter"
    static_configs:
      - targets: ["node-exporter:9100"]

  - job_name: "cadvisor"
    static_configs:
      - targets: ["cadvisor:8080"]
```
Node Exporter gives you host-level metrics (CPU, memory, disk, network). cAdvisor gives you per-container metrics. Between the two, you can monitor basically everything on a Docker host.
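A sketch of the compose service definitions for both exporters, to sit alongside the `prometheus` service above. The image tags and host mounts shown are typical defaults, not pinned recommendations -- check the current releases and adjust the mounts for your host:

```yaml
# Sketch: exporter services for docker-compose.yml (tags/mounts are assumptions).
  node-exporter:
    image: prom/node-exporter:v1.9.0
    container_name: node-exporter
    ports:
      - "9100:9100"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - "--path.procfs=/host/proc"
      - "--path.sysfs=/host/sys"
      - "--path.rootfs=/rootfs"
    restart: unless-stopped

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.49.1
    container_name: cadvisor
    ports:
      - "8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker:/var/lib/docker:ro
    restart: unless-stopped
```

The host mounts are what let a containerized exporter see host-level stats instead of its own container's.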
After editing prometheus.yml, reload the config:
```shell
docker compose restart prometheus
```
Or send a SIGHUP if you don't want downtime:
```shell
docker exec prometheus kill -HUP 1
```
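It's worth validating the file before reloading -- `promtool` ships inside the prom/prometheus image, so you can run it in the container:

```shell
# Validate the config first: on a bad file, Prometheus keeps its old config
# after SIGHUP, but will refuse to start at all after a restart.
docker exec prometheus promtool check config /etc/prometheus/prometheus.yml
```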
Alert Rules
Put alert rule files in prometheus/alerts/ and reference them in the config:
```yaml
# Add to prometheus.yml
rule_files:
  - "/etc/prometheus/alerts/*.yml"
```
Example alert rule file:
```yaml
# prometheus/alerts/host-alerts.yml
groups:
  - name: host
    rules:
      - alert: HighCpuUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "CPU usage above 85% for 5 minutes"

      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 15
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Root filesystem has less than 15% free space"
```
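Once the rules are loaded, you can check their state from the expression browser using the built-in `ALERTS` meta-series, which Prometheus maintains for every active alert:

```promql
# One series per active alert; alertstate is "pending" (still inside the
# `for:` window) or "firing".
ALERTS{alertstate="firing"}
```

Dropping the label matcher shows pending alerts too, which is handy for confirming a rule evaluates at all before waiting out its `for:` duration.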
Alerts fire in Prometheus but notifications go through Alertmanager -- a separate service that handles deduplication, grouping, silencing, and routing to Discord/Slack/email. That's covered in the Homelab Monitoring Stack Tutorial.
Configuration Notes
- Retention: Default is 15 days. For homelab use, that's plenty. Bump it with `--storage.tsdb.retention.time=30d` if you want a month. Each day of retention costs roughly 1-2 MB per scrape target.
- Scrape interval: 15s is the standard default. Going below 10s generates a lot of data without much benefit for infrastructure monitoring. Go above 60s and you'll miss short spikes.
- Relabeling: Prometheus has a powerful relabeling system for manipulating labels before storage. You probably don't need it until you do, and then you really need it. The relabel_config docs are worth bookmarking.
- Federation: If you run Prometheus on multiple machines, one instance can scrape another's `/federate` endpoint. Useful for multi-host setups without a central time series database.
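A sketch of a federation scrape job, assuming a second Prometheus reachable at `other-host:9090` (a placeholder hostname):

```yaml
# Sketch: pull selected series from another Prometheus instance.
scrape_configs:
  - job_name: "federate"
    honor_labels: true          # keep the original job/instance labels
    metrics_path: "/federate"
    params:
      "match[]":
        - '{job=~".+"}'         # which series to pull; narrow this in practice
    static_configs:
      - targets: ["other-host:9090"]
```

The `match[]` selector is required -- `/federate` only returns series that match it, so a broad regex like the one above pulls everything, which defeats the point on larger setups.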
Troubleshooting
Target shows "DOWN" in Status > Targets -- The scrape target isn't reachable. Check that the container is running, the port is correct, and both containers share a Docker network. To debug from inside the Prometheus container, use `wget -qO- http://target:port/metrics` (the image is busybox-based, so there's no curl).
"out of order sample" errors -- Two Prometheus instances are scraping the same target, or the system clock jumped. Don't run duplicate scrapers.
Storage growing faster than expected -- High-cardinality labels (like unique request IDs or user IDs in metric labels) create massive time series counts. Use prometheus_tsdb_head_series to check your active series count. Anything above 100k on a homelab setup is suspicious.
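To find which metrics are responsible, a common cardinality query (this uses standard PromQL, nothing specific to this setup) ranks metric names by active series count:

```promql
# Top 10 metric names by number of active series -- the usual cardinality
# culprits float to the top.
topk(10, count by (__name__)({__name__=~".+"}))
```

If one metric dominates with tens of thousands of series, look at its labels; dropping the offending label via relabeling (or fixing the instrumentation) is the usual cure.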