Prometheus High Memory Usage — Fix

The Problem

Updated March 2026: Verified with latest Docker images and configurations.

Prometheus uses an increasing amount of RAM over time, eventually consuming several gigabytes and causing OOM kills or system instability. Common symptoms:

  • Container restarting due to OOM (Out of Memory) kills
  • Server swapping heavily with Prometheus as the top consumer
  • docker stats showing Prometheus using 2-8+ GB RAM
  • Error in logs: storage: no space left on device or out of memory

The Cause

Prometheus stores recent data in memory before writing it to disk. Four factors drive memory usage:

  1. High cardinality — too many unique time series (label combinations)
  2. Long retention — keeping data for months with default settings
  3. Large scrape targets — endpoints returning thousands of metrics per scrape
  4. Head block size — the in-memory block grows proportionally with active series count
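
The head-block point above can be turned into a back-of-the-envelope estimate. The ~3 KB-per-active-series figure below is an assumed ballpark for head memory, not a Prometheus guarantee:

```shell
# Rough head-memory estimate from the active series count.
# ~3 KB per series is an assumed rule of thumb, not an exact figure.
SERIES=200000
echo "$SERIES" | awk '{ printf "~%.1f GB head memory for %d series\n", $1 * 3 / 1024 / 1024, $1 }'
```

Plug in your own prometheus_tsdb_head_series value to see whether your usage is in the expected range.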

The Fix

Method 1: Reduce Retention Period

Prometheus defaults to 15 days of retention. For homelabs, this is often more than needed:

services:
  prometheus:
    image: prom/prometheus:v3.10.0
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=7d'
      - '--storage.tsdb.retention.size=5GB'
    restart: unless-stopped
Flag                                 Effect
--storage.tsdb.retention.time=7d     Delete data older than 7 days
--storage.tsdb.retention.size=5GB    Delete oldest data when storage exceeds 5 GB

Both can be set simultaneously — whichever triggers first wins. Restart Prometheus after changing:

docker compose restart prometheus
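
After restarting, you can confirm the new flags took effect through Prometheus's runtime flags endpoint (this assumes Prometheus is reachable at localhost:9090):

```shell
# Show the retention flags Prometheus is actually running with
curl -s http://localhost:9090/api/v1/status/flags \
  | grep -oE '"storage.tsdb.retention.(time|size)":"[^"]*"'
```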

Method 2: Reduce Cardinality

High cardinality is the most common cause. Check your cardinality:

# In Prometheus UI — count total active time series
prometheus_tsdb_head_series

If this number is over 100,000, you likely have a cardinality problem.

Find the culprits:

# Top 10 metrics by cardinality
topk(10, count by (__name__)({__name__=~".+"}))
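
The same top-offenders view is exposed by the TSDB stats endpoint, which is cheaper than querying every series (assumes Prometheus at localhost:9090 and jq installed):

```shell
# Top metrics by series count, from the TSDB head statistics API
curl -s http://localhost:9090/api/v1/status/tsdb | jq '.data.seriesCountByMetricName'
```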

Common high-cardinality offenders:

Metric                                            Typical Cardinality    Fix
container_* (cAdvisor)                            500+ per container     Drop unused metrics
node_cpu_seconds_total                            Per-core × per-mode    Normal, but limit cores scraped
Custom app metrics with high-cardinality labels   Varies                 Remove instance_id, request_id labels

Drop unused metrics with metric_relabel_configs:

# prometheus.yml
scrape_configs:
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
    metric_relabel_configs:
      # Drop metrics you don't need
      - source_labels: [__name__]
        regex: 'container_tasks_state|container_memory_failures_total|container_blkio.*'
        action: drop

Method 3: Limit Scrape Interval

More frequent scraping = more data in memory:

# prometheus.yml
global:
  scrape_interval: 30s      # Common starter configs use 15s; doubling to 30s halves the data rate
  evaluation_interval: 30s

scrape_configs:
  - job_name: 'node'
    scrape_interval: 60s     # Less critical targets can scrape less often
    static_configs:
      - targets: ['node-exporter:9100']

For homelab monitoring, 30-60 second intervals are perfectly adequate. You don’t need 15-second granularity for tracking server health.
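
The effect is simple arithmetic: ingestion rate is roughly active series divided by scrape interval. A quick sketch, assuming 50,000 active series (an example figure):

```shell
# Approximate ingestion rate at different scrape intervals
# (50,000 active series is an assumed example figure)
for interval in 15 30 60; do
  awk -v s=50000 -v i="$interval" 'BEGIN { printf "%ss interval: ~%d samples/s\n", i, int(s / i) }'
done
```

Going from 15s to 30s roughly halves both the ingestion rate and head-block growth.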

Method 4: Set Memory Limits

Prevent Prometheus from consuming all available RAM:

services:
  prometheus:
    image: prom/prometheus:v3.10.0
    deploy:
      resources:
        limits:
          memory: 2G
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=7d'

If Prometheus exceeds the limit, the container is OOM-killed and brought back by its restart policy instead of consuming all system memory. This protects other services.
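
To see how close the container is running to its limit (assuming the container is named prometheus):

```shell
# Current memory usage vs. the configured limit
docker stats --no-stream --format '{{.Name}}: {{.MemUsage}}' prometheus
```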

Method 5: Compact the WAL

If disk usage is high from a large Write-Ahead Log:

# Check WAL size
docker exec prometheus du -sh /prometheus/wal

# Restart Prometheus so the WAL is replayed and checkpointed
docker compose stop prometheus
docker compose start prometheus

On startup Prometheus replays the WAL into memory and truncates it at the next head compaction, so a clean restart often reclaims significant disk space shortly afterwards.

Prevention

  • Set retention.time and retention.size explicitly — don’t rely on defaults
  • Monitor Prometheus’s own metrics (prometheus_tsdb_head_series, process_resident_memory_bytes)
  • Use metric_relabel_configs to drop unused metrics from noisy exporters
  • Increase scrape intervals for non-critical targets
  • Set Docker memory limits to prevent system-wide impact
  • Consider Thanos or VictoriaMetrics for long-term storage instead of increasing Prometheus retention
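
The self-monitoring bullet can be codified as alerting rules. A sketch, assuming a rule file wired in via rule_files; both thresholds are illustrative, not recommendations:

```yaml
# prometheus-self.yml (illustrative thresholds; tune for your setup)
groups:
  - name: prometheus-self
    rules:
      - alert: PrometheusHighSeriesCount
        expr: prometheus_tsdb_head_series > 500000
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Prometheus head has {{ $value }} active series"
      - alert: PrometheusHighMemory
        expr: process_resident_memory_bytes{job="prometheus"} > 1.5e9
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Prometheus resident memory above 1.5 GB"
```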
