# Prometheus High Memory Usage — Fix
## The Problem
*Updated March 2026: Verified with latest Docker images and configurations.*

Prometheus uses an increasing amount of RAM over time, eventually consuming several gigabytes and causing OOM kills or system instability. Common symptoms:

- Container restarting due to OOM (Out of Memory) kills
- Server swapping heavily with Prometheus as the top consumer
- `docker stats` showing Prometheus using 2-8+ GB RAM
- Errors in logs: `storage: no space left on device` or `out of memory`
## The Cause
Prometheus stores recent data in memory before writing it to disk. Four factors drive memory usage:
- High cardinality — too many unique time series (label combinations)
- Long retention — keeping data for months with default settings
- Large scrape targets — endpoints returning thousands of metrics per scrape
- Head block size — the in-memory block grows proportionally with active series count
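The head-block point can be turned into a rough back-of-the-envelope estimate. The bytes-per-series figure below is an illustrative assumption for this sketch (real usage depends on label sizes, chunk state, and series churn), not a documented constant:

```python
# Rough head-memory estimate from active series count.
# 4 KiB/series is an assumed average for illustration only —
# real Prometheus usage varies widely with labels and churn.
BYTES_PER_SERIES = 4 * 1024

def estimate_head_memory_gib(active_series: int) -> float:
    """Return an order-of-magnitude RAM estimate in GiB."""
    return active_series * BYTES_PER_SERIES / 1024**3

# 1 million active series lands around 3.8 GiB under this assumption
print(f"{estimate_head_memory_gib(1_000_000):.1f} GiB")
```

The takeaway is the shape, not the exact number: head memory scales linearly with active series, which is why cutting cardinality (Method 2) is the highest-leverage fix.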
## The Fix
### Method 1: Reduce Retention Period
Prometheus defaults to 15 days of retention. For homelabs, this is often more than needed:
```yaml
services:
  prometheus:
    image: prom/prometheus:v3.10.0
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=7d'
      - '--storage.tsdb.retention.size=5GB'
    restart: unless-stopped
```
| Flag | Effect |
|---|---|
| `--storage.tsdb.retention.time=7d` | Delete data older than 7 days |
| `--storage.tsdb.retention.size=5GB` | Delete oldest data when storage exceeds 5 GB |
Both can be set simultaneously — whichever triggers first wins. Restart Prometheus after changing:
```bash
docker compose restart prometheus
```
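To sanity-check which limit will trigger first, you can estimate disk usage from your ingest rate. The ~1.5 bytes per compressed sample is a commonly cited rough average; the series count and scrape interval below are example numbers, not measurements:

```python
# Estimate TSDB disk usage: samples/sec * bytes/sample * retention window.
# 1.5 bytes per compressed sample is a rough, commonly cited average.

def disk_needed_gb(series: int, scrape_interval_s: int,
                   retention_days: int, bytes_per_sample: float = 1.5) -> float:
    samples_per_sec = series / scrape_interval_s
    return samples_per_sec * bytes_per_sample * retention_days * 86_400 / 1e9

# Example: 100k active series scraped every 30s, kept for 7 days
print(f"{disk_needed_gb(100_000, 30, 7):.1f} GB")  # → 3.0 GB
```

In this example the estimate stays under the 5 GB size cap, so the 7-day time limit would trigger first.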
### Method 2: Reduce Cardinality
High cardinality is the most common cause. Check your cardinality:
```promql
# In Prometheus UI — count total active time series
prometheus_tsdb_head_series
```
If this number is over 100,000, you likely have a cardinality problem.
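Why does cardinality explode? Every label multiplies the series count, since each distinct label combination is its own time series. A quick sketch with made-up label counts:

```python
# Cardinality multiplies across labels: series per metric is the
# product of each label's distinct-value count. Counts are examples.
from math import prod

label_values = {
    "container": 20,  # 20 containers
    "cpu": 8,         # 8 cores
    "mode": 8,        # idle, user, system, iowait, ...
}
series_per_metric = prod(label_values.values())
print(series_per_metric)  # 20 * 8 * 8 = 1280 series for ONE metric
```

Multiply that by a few hundred metric names from an exporter like cAdvisor and six figures of active series arrives quickly.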
Find the culprits:
# Top 10 metrics by cardinality
topk(10, count by (__name__)({__name__=~".+"}))
Common high-cardinality offenders:
| Metric | Typical Cardinality | Fix |
|---|---|---|
| `container_*` (cAdvisor) | 500+ per container | Drop unused metrics |
| `node_cpu_seconds_total` | Per-core × per-mode | Normal, but limit cores scraped |
| Custom app metrics with high-cardinality labels | Varies | Remove `instance_id`, `request_id` labels |
Drop unused metrics with `metric_relabel_configs`:
```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
    metric_relabel_configs:
      # Drop metrics you don't need
      - source_labels: [__name__]
        regex: 'container_tasks_state|container_memory_failures_total|container_blkio.*'
        action: drop
```
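Before deploying a drop rule, it helps to preview what it matches. Prometheus anchors relabel regexes (full-string match), which Python's `re.fullmatch` approximates well enough for a quick check; Prometheus uses RE2, so exotic patterns may behave differently:

```python
# Preview which metric names an anchored drop regex matches.
import re

drop_re = re.compile(
    r"container_tasks_state|container_memory_failures_total|container_blkio.*"
)

metrics = [
    "container_tasks_state",            # dropped: exact alternative
    "container_blkio_device_usage_total",  # dropped: matches container_blkio.*
    "container_memory_usage_bytes",     # kept: no alternative matches fully
]
for m in metrics:
    action = "drop" if drop_re.fullmatch(m) else "keep"
    print(f"{action}  {m}")
```

Note that `container_memory_usage_bytes` survives even though it shares a prefix with `container_memory_failures_total`, because the match must cover the whole name.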
### Method 3: Increase the Scrape Interval
More frequent scraping = more data in memory:
```yaml
# prometheus.yml
global:
  scrape_interval: 30s      # Default is 15s — double it to halve data rate
  evaluation_interval: 30s

scrape_configs:
  - job_name: 'node'
    scrape_interval: 60s    # Less critical targets can scrape less often
    static_configs:
      - targets: ['node-exporter:9100']
```
For homelab monitoring, 30-60 second intervals are perfectly adequate. You don’t need 15-second granularity for tracking server health.
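The halving claim is simple arithmetic, shown here with an assumed 50,000-series target set:

```python
# Ingest rate scales inversely with the scrape interval.
# The series count is an example figure.

def samples_per_sec(series: int, interval_s: int) -> float:
    return series / interval_s

before = samples_per_sec(50_000, 15)  # default 15s interval
after = samples_per_sec(50_000, 30)   # doubled interval: half the ingest
print(f"{before:.0f} -> {after:.0f} samples/sec")
```

This reduces samples ingested and stored, though note that head memory is driven mainly by the *number* of active series, so interval changes help disk and CPU more than RAM.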
### Method 4: Set Memory Limits
Prevent Prometheus from consuming all available RAM:
```yaml
services:
  prometheus:
    image: prom/prometheus:v3.10.0
    deploy:
      resources:
        limits:
          memory: 2G
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=7d'
```
If Prometheus hits the limit, the container is OOM-killed and restarted rather than consuming all system memory. This protects other services on the host.
### Method 5: Compact the WAL
If disk usage is high from a large Write-Ahead Log:
```bash
# Check WAL size
docker exec prometheus du -sh /prometheus/wal

# Force a compaction (Prometheus must be stopped)
docker compose stop prometheus
docker compose start prometheus
```
Prometheus compacts the WAL on startup. A clean restart often reclaims significant disk space.
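If you bind-mount the data directory on the host, the same size check can be done without exec-ing into the container. A minimal sketch (the path is an example):

```python
# Python equivalent of `du -sh` for a bind-mounted WAL directory.
from pathlib import Path

def dir_size_mib(path: str) -> float:
    """Sum the sizes of all regular files under `path`, in MiB."""
    return sum(
        f.stat().st_size for f in Path(path).rglob("*") if f.is_file()
    ) / 2**20

# Example path — adjust to wherever you mount /prometheus:
# print(f"{dir_size_mib('/var/lib/prometheus/wal'):.1f} MiB")
```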
## Prevention
- Set `retention.time` and `retention.size` explicitly — don’t rely on defaults
- Monitor Prometheus’s own metrics (`prometheus_tsdb_head_series`, `process_resident_memory_bytes`)
- Use `metric_relabel_configs` to drop unused metrics from noisy exporters
- Increase scrape intervals for non-critical targets
- Set Docker memory limits to prevent system-wide impact
- Consider Thanos or VictoriaMetrics for long-term storage instead of increasing Prometheus retention