Install Paperless-ngx on Ubuntu Server

Why Ubuntu for Paperless-ngx

Ubuntu Server is a common choice for self-hosted deployments, and Paperless-ngx runs well on it. Docker Compose manages the full stack (Paperless, PostgreSQL, and Redis), while Ubuntu’s mature package ecosystem covers the pieces around the containers: scanner integration, network shares, and automated backups.

Prerequisites

  • Ubuntu 22.04 or 24.04 LTS server
  • Docker and Docker Compose installed (see Install Docker below)
  • 4 GB RAM minimum (OCR is memory-intensive)
  • 20 GB free disk space (more for large document libraries)
  • A scanner that can output to a network folder (optional, for auto-import)
  • Root or sudo access

Install Docker

If Docker is not already installed:

sudo apt update
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
sudo usermod -aG docker $USER

Log out and back in for the group change to take effect.
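
Once you are logged back in, a quick check can confirm that both the Docker CLI and the Compose plugin respond. This is a minimal sketch; the helper name docker_ready is made up for this example and is not part of Docker.

```shell
# Succeeds only if the Docker CLI and the Compose plugin both respond.
# docker_ready is an illustrative helper name, not a Docker command.
docker_ready() {
  docker --version >/dev/null 2>&1 &&
    docker compose version >/dev/null 2>&1
}

# Usage (run after logging back in):
#   docker_ready && echo "Docker and Compose are available"
```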

Docker Compose Configuration

Create the project directory:

mkdir -p ~/paperless && cd ~/paperless

Create docker-compose.yml:

services:
  paperless:
    image: ghcr.io/paperless-ngx/paperless-ngx:2.20.9
    container_name: paperless
    restart: unless-stopped
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_started
    ports:
      - "8000:8000"
    environment:
      PAPERLESS_DBHOST: db
      PAPERLESS_DBPORT: 5432
      PAPERLESS_DBNAME: paperless
      PAPERLESS_DBUSER: paperless
      PAPERLESS_DBPASS: paperless_db_password  # Change this
      PAPERLESS_REDIS: redis://redis:6379
      PAPERLESS_SECRET_KEY: change-this-to-a-long-random-string  # Generate with: openssl rand -hex 32
      PAPERLESS_URL: http://localhost:8000  # Change to your domain/IP
      PAPERLESS_ADMIN_USER: admin  # Initial admin username
      PAPERLESS_ADMIN_PASSWORD: admin  # Change this -- initial admin password
      PAPERLESS_OCR_LANGUAGE: eng  # See OCR Language section below
      PAPERLESS_TIME_ZONE: America/New_York  # Your timezone
      PAPERLESS_CONSUMER_POLLING: 30  # Check consume folder every 30 seconds
      PAPERLESS_CONSUMER_RECURSIVE: "true"  # Process subdirectories in consume folder
      PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS: "true"  # Create tags from subfolder names
      PAPERLESS_TASK_WORKERS: 2  # Number of OCR workers (adjust based on CPU cores)
      PAPERLESS_THREADS_PER_WORKER: 2  # Threads per OCR worker
      USERMAP_UID: 1000
      USERMAP_GID: 1000
    volumes:
      - paperless_data:/usr/src/paperless/data
      - paperless_media:/usr/src/paperless/media
      - ./consume:/usr/src/paperless/consume  # Drop PDFs here for auto-import
      - ./export:/usr/src/paperless/export    # For document exports/backups
    networks:
      - paperless-net

  db:
    image: postgres:16-alpine
    container_name: paperless-db
    restart: unless-stopped
    environment:
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: paperless_db_password  # Must match PAPERLESS_DBPASS
      POSTGRES_DB: paperless
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "paperless"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - paperless-net

  redis:
    image: redis:7-alpine
    container_name: paperless-redis
    restart: unless-stopped
    volumes:
      - redis_data:/data
    networks:
      - paperless-net

volumes:
  paperless_data:
  paperless_media:
  postgres_data:
  redis_data:

networks:
  paperless-net:

Create the consume and export directories:

mkdir -p consume export
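
Before starting the stack, it may be worth replacing the placeholder secrets in docker-compose.yml. The sketch below generates random values and checks whether the two long placeholder strings from this guide are still present. The helper names gen_secret and has_placeholders are hypothetical, and the check does not cover the default admin password.

```shell
# Print a random hex string of N bytes (default 32).
# Uses openssl when present, otherwise falls back to /dev/urandom.
gen_secret() {
  bytes="${1:-32}"
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex "$bytes"
  else
    head -c "$bytes" /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
  fi
}

# Succeeds if the given file still contains one of the two long
# placeholder strings used in this guide's compose file.
has_placeholders() {
  grep -qE 'paperless_db_password|change-this-to-a-long-random-string' "$1"
}

# Usage:
#   gen_secret        # value for PAPERLESS_SECRET_KEY
#   gen_secret 16     # shorter value, e.g. for the database password
#   has_placeholders docker-compose.yml && echo "Replace the placeholders first"
```

Remember that the database password appears twice in the compose file (PAPERLESS_DBPASS and POSTGRES_PASSWORD) and both must match.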

Start the stack:

docker compose up -d

First startup takes 1-2 minutes as the database initializes and Paperless runs migrations. Monitor progress:

docker compose logs -f paperless

Wait for Listening on 0.0.0.0:8000 before accessing the web UI.
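
Instead of watching the logs, you could poll the web UI until it answers. The following is a small sketch of a generic retry helper (the name retry is an assumption of this example), used here with curl against the port published in the compose file.

```shell
# Run a command until it succeeds, up to $1 attempts, sleeping $2 seconds
# between attempts. Returns non-zero if every attempt fails.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}

# Usage: poll for up to about 2 minutes (24 attempts, 5 seconds apart).
#   retry 24 5 curl -fsS -o /dev/null http://localhost:8000 && echo "Paperless is up"
```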

OCR Language Configuration

Paperless-ngx uses Tesseract for OCR. PAPERLESS_OCR_LANGUAGE selects the languages used for recognition; join several codes with +:

# One additional language
PAPERLESS_OCR_LANGUAGE: eng+deu  # English + German

# Multiple languages
PAPERLESS_OCR_LANGUAGE: eng+deu+fra+spa  # English, German, French, Spanish

Common language codes: eng (English), deu (German), fra (French), spa (Spanish), ita (Italian), por (Portuguese), nld (Dutch), jpn (Japanese), chi_sim (Simplified Chinese), kor (Korean).

The Docker image bundles language packs for English, German, French, Italian, and Spanish only. For any other language, also set PAPERLESS_OCR_LANGUAGES (plural) to a space-separated list of packs to install when the container starts, for example PAPERLESS_OCR_LANGUAGES: jpn kor. Pack names use a hyphen where the Tesseract code uses an underscore (chi-sim for chi_sim). Each added language slightly increases OCR processing time.
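
To see which language packs a running container actually has, you can ask Tesseract directly. This sketch assumes the tesseract binary is on the PATH inside the image and that the container is named paperless as in the compose file; has_ocr_lang and the PAPERLESS_CONTAINER variable are hypothetical names for this example.

```shell
# Succeeds if Tesseract inside the container lists the given language code.
# The container name defaults to "paperless" (container_name in the compose file).
has_ocr_lang() {
  docker exec "${PAPERLESS_CONTAINER:-paperless}" tesseract --list-langs 2>&1 |
    grep -qx "$1"
}

# Usage:
#   has_ocr_lang deu || echo "German language pack is not installed"
```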

First-Time Setup

Open http://your-server-ip:8000 in your browser. Log in with the admin credentials set in PAPERLESS_ADMIN_USER and PAPERLESS_ADMIN_PASSWORD.

Suggested first steps:

  1. Upload a test document — drag and drop a PDF onto the dashboard. Paperless processes it with OCR and adds it to the library.
  2. Create tags — organize documents by type (invoice, receipt, contract, etc.)
  3. Create correspondents — track who sent/received documents
  4. Create document types — categorize documents
  5. Set up matching rules — auto-assign tags and correspondents based on document content

Consume Folder Setup

The consume folder (./consume mapped to /usr/src/paperless/consume inside the container) is the primary automation mechanism. Any PDF, PNG, JPG, or TIFF dropped into this folder is automatically imported, OCR-processed, and added to your library.

Scanner Integration

Configure your scanner to save directly to the consume folder:

Network scanner (Samba/SMB share):

sudo apt install -y samba

# Add to /etc/samba/smb.conf:
sudo tee -a /etc/samba/smb.conf > /dev/null <<EOF

[paperless-consume]
   path = /home/$USER/paperless/consume
   browseable = yes
   writable = yes
   valid users = $USER
   create mask = 0644
   directory mask = 0755
EOF

# Set Samba password
sudo smbpasswd -a $USER

# Restart Samba
sudo systemctl restart smbd

Point your scanner to \\server-ip\paperless-consume.

Subdirectories as tags: With PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS: "true", you can create folders like consume/invoices/ and consume/receipts/ — documents dropped in each get tagged automatically.
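
The tag folders can be created in one go. A minimal sketch, where make_tag_dirs is a hypothetical helper and the tag names are only examples:

```shell
# Create one subfolder per tag name under the consume directory.
make_tag_dirs() {
  base="$1"
  shift
  for tag in "$@"; do
    mkdir -p "$base/$tag"
  done
}

# Usage:
#   make_tag_dirs ~/paperless/consume invoices receipts contracts
```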

inotify Watches

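Paperless can watch the consume folder through inotify instead of polling. The compose file above sets PAPERLESS_CONSUMER_POLLING: 30, which turns inotify off, so this step only matters if you set polling to 0. In that case, the default inotify watch limit may be too low. Increase it: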

echo "fs.inotify.max_user_watches=524288" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

This prevents “inotify watch limit reached” errors when Paperless monitors many files.
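
To check whether the change is needed or has taken effect, you can compare the current limit with the target value. A small sketch; inotify_limit_ok is a hypothetical helper, and its optional second argument exists only so the function can be pointed at a different file.

```shell
# Succeeds if the current inotify watch limit is at least $1.
# $2 optionally overrides the file to read (default: the kernel's procfs entry).
inotify_limit_ok() {
  wanted="$1"
  limit_file="${2:-/proc/sys/fs/inotify/max_user_watches}"
  [ "$(cat "$limit_file")" -ge "$wanted" ]
}

# Usage:
#   inotify_limit_ok 524288 || echo "inotify watch limit is below 524288"
```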

UFW Firewall Rules

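# Paperless web UI, when accessed directly on port 8000.
# Note: ports published by Docker are handled by Docker's own iptables rules,
# so by default port 8000 is reachable regardless of this UFW rule.
sudo ufw allow 8000/tcp comment 'Paperless-ngx'

# If using Samba for scanner integration
sudo ufw allow samba comment 'Samba for Paperless consume'

# If serving Paperless through a reverse proxy with HTTPS
sudo ufw allow 443/tcp comment 'HTTPS'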

sudo ufw status

Backup Strategy

Paperless stores data in three locations that all need backing up:

  1. PostgreSQL database: document metadata, tags, correspondents, matching rules
  2. Media volume: original documents, archived (OCR-processed) versions, and thumbnails
  3. Data volume: search index and the auto-matching classification model

Using Paperless Built-in Export

Paperless has a built-in document exporter that creates a portable backup:

docker exec paperless document_exporter /usr/src/paperless/export

This writes all documents and metadata to the ./export directory. Schedule it with cron:

# Daily export at 1 AM
0 1 * * * docker exec paperless document_exporter /usr/src/paperless/export

Database Backup

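#!/bin/bash
# backup-paperless.sh: dump the database and archive the media and data volumes
set -euo pipefail  # Stop on the first error or unset variable

BACKUP_DIR="/opt/backups/paperless/$(date +%Y-%m-%d)"
mkdir -p "$BACKUP_DIR"

# Database dump (plain SQL)
docker exec paperless-db pg_dump -U paperless paperless > "$BACKUP_DIR/paperless-db.sql"

# Media and data volumes. Compose prefixes volume names with the project
# name, which defaults to the directory name (paperless).
for vol in media data; do
  docker run --rm \
    -v "paperless_paperless_${vol}:/${vol}:ro" \
    -v "$BACKUP_DIR":/backup \
    alpine tar czf "/backup/paperless-${vol}.tar.gz" -C / "$vol"
done

echo "Backup complete: $BACKUP_DIR"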
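
Because the script writes one dated directory per run, old backups accumulate. One possible way to keep only the newest few is sketched below. prune_backups is a hypothetical helper, and it assumes every entry under the backup root is a YYYY-MM-DD directory, so that name order matches date order.

```shell
# Keep the newest $2 dated directories under $1 and delete the rest.
# Assumes directory names are ISO dates (YYYY-MM-DD) without spaces.
prune_backups() {
  root="$1"
  keep="$2"
  count=0
  # Reverse lexical order puts the newest ISO date first.
  for dir in $(ls -1 "$root" | sort -r); do
    count=$((count + 1))
    if [ "$count" -gt "$keep" ]; then
      rm -rf "${root:?}/$dir"
    fi
  done
}

# Usage: keep the 14 most recent daily backups.
#   prune_backups /opt/backups/paperless 14
```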

See Backup Strategy for the full 3-2-1 approach.

HTTPS via Reverse Proxy

For remote access, put Paperless behind a reverse proxy with SSL. Update the environment variable:

PAPERLESS_URL: https://docs.yourdomain.com

Then configure your reverse proxy to forward to localhost:8000. See Reverse Proxy Setup.

Important: Paperless uses WebSocket connections for real-time task updates. Your reverse proxy must support WebSocket proxying. Nginx Proxy Manager handles this by default. For raw Nginx, add:

proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";

Troubleshooting

OCR Produces Garbled Text

Symptom: Text is recognized, but the result contains wrong or garbled characters.

Fix: Set the correct OCR language. If your documents are in German, PAPERLESS_OCR_LANGUAGE: eng will produce garbage. Set it to deu or eng+deu for mixed-language documents.

Consume Folder Not Processing Files

Symptom: Files sit in the consume folder and are not imported.

Fix: Check permissions. The container runs as UID/GID set by USERMAP_UID/USERMAP_GID. The consume folder must be writable by that user:

sudo chown -R 1000:1000 ~/paperless/consume

Also confirm that PAPERLESS_CONSUMER_POLLING is set, and check the container logs for errors:

docker compose logs paperless | grep -i consume
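
To confirm the ownership before or after running chown, you can compare the folder's numeric owner with USERMAP_UID. A minimal sketch; owned_by_uid is a hypothetical helper name.

```shell
# Succeeds if the directory $1 is owned by numeric UID $2.
owned_by_uid() {
  owner=$(ls -ldn "$1" | awk '{print $3}')
  [ "$owner" = "$2" ]
}

# Usage:
#   owned_by_uid ~/paperless/consume 1000 || echo "Owner differs from USERMAP_UID"
```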

Database Connection Errors on Startup

Symptom: Paperless logs show could not connect to server: Connection refused.

Fix: PostgreSQL may not be ready yet. The depends_on condition (service_healthy) should handle this, but if the error persists, give the database more time by raising retries and adding a start_period in its healthcheck:

db:
  healthcheck:
    test: ["CMD", "pg_isready", "-U", "paperless"]
    interval: 10s
    timeout: 5s
    retries: 10
    start_period: 30s

Verify the database is healthy:

docker compose ps
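
For scripting, the health status can also be read directly with docker inspect. This sketch assumes the container name paperless-db from the compose file; db_is_healthy is a hypothetical helper.

```shell
# Succeeds if the container's healthcheck currently reports "healthy".
# The container name defaults to "paperless-db" (container_name in the compose file).
db_is_healthy() {
  status=$(docker inspect --format '{{.State.Health.Status}}' "${1:-paperless-db}" 2>/dev/null)
  [ "$status" = "healthy" ]
}

# Usage:
#   db_is_healthy || docker compose logs db
```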

High Memory Usage During OCR

Symptom: System becomes unresponsive during document processing. OOM killer terminates processes.

Fix: Reduce the number of OCR workers:

PAPERLESS_TASK_WORKERS: 1
PAPERLESS_THREADS_PER_WORKER: 1

Each worker uses 300-500 MB during OCR processing. On a 4 GB system, 2 workers is the safe maximum.
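
The arithmetic above can be checked against the memory actually available on the host. The sketch below is only a rough guide: it uses the upper figure of 500 MB per worker quoted here as a default budget, and both helper names are hypothetical.

```shell
# Print MemAvailable in MB. $1 optionally overrides the meminfo path.
mem_available_mb() {
  awk '/^MemAvailable:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Succeeds if $1 workers fit into $2 MB of available memory,
# budgeting $3 MB per worker (default 500).
workers_fit() {
  workers="$1"
  avail_mb="$2"
  per_worker_mb="${3:-500}"
  [ $((workers * per_worker_mb)) -le "$avail_mb" ]
}

# Usage:
#   workers_fit 2 "$(mem_available_mb)" || echo "Consider PAPERLESS_TASK_WORKERS: 1"
```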

Search Not Finding Documents

Symptom: Documents exist but full-text search returns no results.

Fix: The search index may need rebuilding:

docker exec paperless document_index reindex

This can take a while for large libraries. The command reports its progress in the terminal.

Resource Requirements

  • RAM: ~500 MB idle; roughly 1-2 GB in total during active OCR processing (300-500 MB per worker)
  • CPU: Low idle. OCR processing is CPU-intensive — each page takes 2-10 seconds on modern x86 hardware
  • Disk: 500 MB for the application, plus document storage (plan 1-5 MB per page depending on originals)
