Grafana + Prometheus Stack: Complete Docker Setup

What Is the Grafana + Prometheus Stack?

Grafana and Prometheus together form the most popular open-source monitoring stack in the self-hosting world. Prometheus scrapes and stores time-series metrics. Grafana visualizes them with dashboards. Add node_exporter for host metrics, cAdvisor for Docker container metrics, and Alertmanager for notifications — and you have enterprise-grade monitoring for free.

Updated March 2026: Verified with latest Docker images and configurations.

This guide deploys the complete stack with a single docker compose up -d.

Prerequisites

  • A Linux server (Ubuntu 22.04+ recommended)
  • Docker and Docker Compose installed
  • 2 GB of free RAM (minimum for all 5 services)
  • 10 GB of free disk space (metrics storage)
  • Basic understanding of Docker networking

Architecture Overview

Component      Role                                  Port
Prometheus     Metrics collection and storage        9090
Grafana        Dashboard visualization               3000
node_exporter  Host system metrics (CPU, RAM, disk)  9100
cAdvisor       Docker container metrics              8080
Alertmanager   Alert routing and notifications       9093

Data flows in one direction: exporters → Prometheus → Grafana. Prometheus scrapes metrics from exporters at regular intervals and stores them. Grafana queries Prometheus to render dashboards. Alertmanager receives firing alerts from Prometheus and sends notifications.
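
You can observe this flow directly by hitting Prometheus' instant-query HTTP API once the stack is running. A minimal sketch, assuming the default localhost:9090 port mapping from the compose file below:

```shell
# Query Prometheus' instant-query API. `up` is 1 for each reachable
# scrape target and 0 for each down target.
query_prometheus() {
  curl -sG "http://localhost:9090/api/v1/query" \
    --data-urlencode "query=$1"
}
# Example (run after `docker compose up -d`):
# query_prometheus 'up'
```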

Docker Compose Configuration

Create a project directory with the following structure:

mkdir -p monitoring/{prometheus,grafana/provisioning/datasources,alertmanager}
cd monitoring

Create docker-compose.yml:

services:
  prometheus:
    image: prom/prometheus:v3.10.0
    container_name: prometheus
    restart: unless-stopped
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./prometheus/rules.yml:/etc/prometheus/rules.yml:ro
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'
      - '--web.enable-lifecycle'
    networks:
      - monitoring
    depends_on:
      - node-exporter
      - cadvisor

  grafana:
    image: grafana/grafana:12.4.1
    container_name: grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
    environment:
      GF_SECURITY_ADMIN_USER: admin                    # Default admin username
      GF_SECURITY_ADMIN_PASSWORD: changeme             # CHANGE THIS
      GF_SERVER_ROOT_URL: http://localhost:3000         # Set to your domain if using reverse proxy
      GF_INSTALL_PLUGINS: grafana-clock-panel           # Optional plugins
    networks:
      - monitoring
    depends_on:
      - prometheus

  node-exporter:
    image: prom/node-exporter:v1.10.2
    container_name: node-exporter
    restart: unless-stopped
    ports:
      - "9100:9100"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    networks:
      - monitoring

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.51.0
    container_name: cadvisor
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    privileged: true
    devices:
      - /dev/kmsg:/dev/kmsg
    networks:
      - monitoring

  alertmanager:
    image: prom/alertmanager:v0.31.1
    container_name: alertmanager
    restart: unless-stopped
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
      - alertmanager-data:/alertmanager
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
      - '--storage.path=/alertmanager'
    networks:
      - monitoring

networks:
  monitoring:
    driver: bridge

volumes:
  prometheus-data:
  grafana-data:
  alertmanager-data:

Prometheus Configuration

Create prometheus/prometheus.yml:

global:
  scrape_interval: 15s        # How often to scrape targets
  evaluation_interval: 15s     # How often to evaluate alert rules
  scrape_timeout: 10s          # Timeout per scrape

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

rule_files:
  - rules.yml

scrape_configs:
  # Monitor Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus:9090']

  # Host system metrics via node_exporter
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  # Docker container metrics via cAdvisor
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']

  # Grafana metrics
  - job_name: 'grafana'
    static_configs:
      - targets: ['grafana:3000']
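
Each static_configs entry can also attach labels to every series scraped from its targets, which becomes useful once you monitor more than one host. A sketch (the env label and its value are illustrative, not part of the stack above):

```yaml
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']
        labels:
          env: 'homelab'   # illustrative label, attached to every scraped series
```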

Create prometheus/rules.yml for alerting rules:

groups:
  - name: system-alerts
    rules:
      # Alert when CPU usage exceeds 80% for 5 minutes
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is above 80% for more than 5 minutes."

      # Alert when available memory drops below 15%
      - alert: LowMemory
        expr: (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100) < 15
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low memory on {{ $labels.instance }}"
          description: "Available memory is below 15%."

      # Alert when disk usage exceeds 85%
      - alert: HighDiskUsage
        expr: (1 - node_filesystem_avail_bytes{fstype!="tmpfs"} / node_filesystem_size_bytes{fstype!="tmpfs"}) * 100 > 85
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Disk almost full on {{ $labels.instance }}"
          description: "Disk usage exceeds 85% on {{ $labels.mountpoint }}."

      # Alert when a container hasn't reported to cAdvisor for over a minute
      # (absent() can't be used per-container: it fires only when no series
      # match at all, and it carries no $labels.name for the annotation)
      - alert: ContainerDown
        expr: time() - container_last_seen{name!=""} > 60
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Container {{ $labels.name }} is down"

  - name: service-alerts
    rules:
      # Alert when Prometheus target is down
      - alert: TargetDown
        expr: up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Target {{ $labels.instance }} is down"
          description: "{{ $labels.job }} target {{ $labels.instance }} has been unreachable for 2 minutes."
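
These expressions can be unit-tested offline with promtool, which ships inside the Prometheus image. A sketch of a test file — say prometheus/rules_test.yml, which you would also need to mount into the container — exercising the LowMemory rule with synthetic series values:

```yaml
rule_files:
  - rules.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # 1 GB available of 10 GB total = 10%, below the 15% threshold
      - series: 'node_memory_MemAvailable_bytes{instance="node-exporter:9100"}'
        values: '1e9x10'
      - series: 'node_memory_MemTotal_bytes{instance="node-exporter:9100"}'
        values: '10e9x10'
    alert_rule_test:
      - eval_time: 10m
        alertname: LowMemory
        exp_alerts:
          - exp_labels:
              severity: warning
              instance: node-exporter:9100
            exp_annotations:
              summary: "Low memory on node-exporter:9100"
              description: "Available memory is below 15%."
```

Run it with docker compose exec prometheus promtool test rules /etc/prometheus/rules_test.yml.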

Alertmanager Configuration

Create alertmanager/alertmanager.yml:

global:
  resolve_timeout: 5m
  # Uncomment and configure for email alerts:
  # smtp_smarthost: 'smtp.gmail.com:587'
  # smtp_from: '[email protected]'
  # smtp_auth_username: '[email protected]'
  # smtp_auth_password: 'app-specific-password'
  # smtp_require_tls: true

templates: []

route:
  receiver: 'default'
  group_by: ['alertname', 'instance']
  group_wait: 30s         # Wait before sending first notification
  group_interval: 5m      # Interval between grouped notifications
  repeat_interval: 4h     # Resend if alert still firing

receivers:
  - name: 'default'
    # Email notifications (uncomment global SMTP settings above):
    # email_configs:
    #   - to: '[email protected]'
    #     send_resolved: true

    # Slack notifications:
    # slack_configs:
    #   - api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
    #     channel: '#alerts'
    #     send_resolved: true
    #     title: '{{ .GroupLabels.alertname }}'
    #     text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'

    # Discord notifications (via webhook):
    # discord_configs:
    #   - webhook_url: 'https://discord.com/api/webhooks/YOUR/WEBHOOK'
    #     send_resolved: true

Grafana Datasource Provisioning

Create grafana/provisioning/datasources/prometheus.yml:

apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: false

This automatically configures Prometheus as the default datasource when Grafana starts — no manual setup needed.

Start the Stack

docker compose up -d

Verify all services are running:

docker compose ps

Service        URL                              Purpose
Prometheus     http://your-server:9090          Query metrics, check targets
Grafana        http://your-server:3000          Dashboards
node_exporter  http://your-server:9100/metrics  Raw host metrics
cAdvisor       http://your-server:8080          Container metrics UI
Alertmanager   http://your-server:9093          Alert status and silences
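
Beyond docker compose ps, each service exposes a health endpoint you can probe. A sketch, assuming the default ports on localhost — Prometheus and Alertmanager answer on /-/healthy, Grafana on /api/health, cAdvisor on /healthz:

```shell
# Probe each service's health endpoint and report OK/FAIL per URL.
check_stack() {
  for url in \
    "http://localhost:9090/-/healthy" \
    "http://localhost:9093/-/healthy" \
    "http://localhost:3000/api/health" \
    "http://localhost:9100/metrics" \
    "http://localhost:8080/healthz"
  do
    if curl -fsS --max-time 5 "$url" >/dev/null; then
      echo "OK   $url"
    else
      echo "FAIL $url"
    fi
  done
}
# check_stack
```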

Initial Grafana Setup

  1. Open Grafana at http://your-server:3000
  2. Log in with admin / changeme (change the password on first login)
  3. The Prometheus datasource is already configured via provisioning
  4. Import community dashboards:
    • Go to Dashboards → Import
    • Enter dashboard ID 1860 for “Node Exporter Full” (host metrics)
    • Enter dashboard ID 193 for “Docker Monitoring” (container metrics)
    • Select “Prometheus” as the datasource

These two dashboards give you immediate visibility into host system health and Docker container resource usage.
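
Dashboards can also be provisioned from disk instead of imported by hand, using the same provisioning directory as the datasource. A sketch of grafana/provisioning/dashboards/default.yml — the /var/lib/grafana/dashboards path is an assumption, so mount a local folder of dashboard JSON files there in docker-compose.yml:

```yaml
apiVersion: 1

providers:
  - name: 'default'
    type: file
    options:
      path: /var/lib/grafana/dashboards   # mount your dashboard JSON files here
```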

Adding More Scrape Targets

To monitor additional services that expose Prometheus metrics, add entries to prometheus/prometheus.yml:

scrape_configs:
  # ... existing configs ...

  # Example: Monitor your Nextcloud instance
  - job_name: 'nextcloud'
    static_configs:
      - targets: ['nextcloud:9090']
    metrics_path: '/ocs/v2.php/apps/serverinfo/api/v1/info'
    params:
      format: ['prometheus']

  # Example: Monitor another host via node_exporter
  - job_name: 'remote-server'
    static_configs:
      - targets: ['192.168.1.100:9100']

After editing, reload Prometheus without restarting:

curl -X POST http://localhost:9090/-/reload
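
It is worth validating the config before triggering a reload; promtool ships inside the Prometheus image. A sketch, assuming the compose service name and mount paths from docker-compose.yml:

```shell
# Validate the Prometheus config and rules, then hot-reload only if both pass.
check_and_reload() {
  docker compose exec prometheus promtool check config /etc/prometheus/prometheus.yml &&
  docker compose exec prometheus promtool check rules /etc/prometheus/rules.yml &&
  curl -X POST http://localhost:9090/-/reload
}
# check_and_reload
```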

Data Retention and Storage

Setting             Default      Recommended
Retention time      15 days      30-90 days
Retention size      Unlimited    Set based on disk capacity
Disk usage per day  ~50-200 MB   Depends on scrape targets and interval
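
A rough sizing rule: samples per day ≈ active series × (86400 / scrape interval), at roughly 1-2 bytes per compressed sample. A back-of-envelope sketch — the series count here is an assumption; check your real value with the prometheus_tsdb_head_series metric:

```shell
# Rough disk estimate for 30 days of retention.
series=3000          # assumed active series; see prometheus_tsdb_head_series
interval=15          # scrape interval in seconds
bytes_per_sample=2   # pessimistic compressed cost per sample
days=30
echo $(( series * (86400 / interval) * bytes_per_sample * days / 1024 / 1024 )) # prints 988 (MB)
```

At these assumptions, 30 days costs just under 1 GB, which is why the 10 GB disk in the prerequisites leaves comfortable headroom.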

Adjust retention in the Prometheus command section:

command:
  - '--storage.tsdb.retention.time=90d'        # Keep 90 days
  - '--storage.tsdb.retention.size=10GB'        # Or cap at 10 GB

Common Mistakes

  1. Not exposing host filesystems to node_exporter. Without /proc, /sys, and / mounted read-only, node_exporter reports container metrics instead of host metrics.

  2. Using localhost in scrape configs. Inside Docker, services reference each other by container name (prometheus, grafana), not localhost.

  3. Forgetting to reload Prometheus after config changes. Use curl -X POST http://localhost:9090/-/reload or restart the container.

  4. Setting scrape intervals too low. A 5-second interval generates massive data volumes. Start with 15 seconds and lower only if needed.

  5. Not setting retention limits. Without retention.time or retention.size, Prometheus will fill your disk.

Resource Requirements

Service        RAM (idle)  RAM (load)  CPU
Prometheus     200 MB      500 MB+     Low-Medium
Grafana        100 MB      300 MB      Low
node_exporter  10 MB       20 MB       Very Low
cAdvisor       50 MB       100 MB      Low
Alertmanager   30 MB       50 MB       Very Low
Total          ~400 MB     ~1 GB       Low

Next Steps

  • Add Loki for log aggregation alongside metrics
  • Set up Grafana alerts for unified alerting
  • Monitor remote servers by running node_exporter on each host
  • Create custom dashboards for your specific self-hosted applications
