Why Ubuntu for Paperless-ngx

Ubuntu Server is a common choice for self-hosted deployments, and Paperless-ngx runs well on it. The full stack (Paperless, PostgreSQL, and Redis) runs entirely under Docker Compose. Ubuntu's mature package ecosystem makes it easy to set up scanner integration, network shares, and automated backups around the container stack.

Prerequisites

Ubuntu 22.04 or 24.04 LTS server
Docker and Docker Compose installed (see Install Docker below)
4 GB RAM minimum (OCR is memory-intensive)
20 GB free disk space (more for large document libraries)
A scanner that can output to a network folder (optional, for auto-import)
Root or sudo access

Install Docker

If Docker is not already installed:

sudo apt update
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
sudo usermod -aG docker $USER

Log out and back in for the group change to take effect.
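Before moving on, it can help to confirm that the tools are actually on the PATH. The helper below is a minimal sketch; check_tools is a hypothetical name introduced here, not part of Docker.

```shell
# Report which of the named tools are missing from PATH.
# Prints one "missing: <tool>" line per absent tool; returns 0 only if all exist.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage, after logging back in:
#   check_tools docker && docker compose version && docker run --rm hello-world
```

If the hello-world container runs without sudo, the group change has taken effect.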

Docker Compose Configuration

Create the project directory:

mkdir -p ~/paperless && cd ~/paperless

Create docker-compose.yml:

services:
  paperless:
    image: ghcr.io/paperless-ngx/paperless-ngx:2.20.9
    container_name: paperless
    restart: unless-stopped
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_started
    ports:
      - "8000:8000"
    environment:
      PAPERLESS_DBHOST: db
      PAPERLESS_DBPORT: 5432
      PAPERLESS_DBNAME: paperless
      PAPERLESS_DBUSER: paperless
      PAPERLESS_DBPASS: paperless_db_password  # Change this
      PAPERLESS_REDIS: redis://redis:6379
      PAPERLESS_SECRET_KEY: change-this-to-a-long-random-string  # Generate with: openssl rand -hex 32
      PAPERLESS_URL: http://localhost:8000  # Change to your domain/IP
      PAPERLESS_ADMIN_USER: admin  # Initial admin username
      PAPERLESS_ADMIN_PASSWORD: admin  # Change this -- initial admin password
      PAPERLESS_OCR_LANGUAGE: eng  # See OCR Language section below
      PAPERLESS_TIME_ZONE: America/New_York  # Your timezone
      PAPERLESS_CONSUMER_POLLING: 30  # Check consume folder every 30 seconds
      PAPERLESS_CONSUMER_RECURSIVE: "true"  # Process subdirectories in consume folder
      PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS: "true"  # Create tags from subfolder names
      PAPERLESS_TASK_WORKERS: 2  # Number of OCR workers (adjust based on CPU cores)
      PAPERLESS_THREADS_PER_WORKER: 2  # Threads per OCR worker
      USERMAP_UID: 1000
      USERMAP_GID: 1000
    volumes:
      - paperless_data:/usr/src/paperless/data
      - paperless_media:/usr/src/paperless/media
      - ./consume:/usr/src/paperless/consume  # Drop PDFs here for auto-import
      - ./export:/usr/src/paperless/export    # For document exports/backups
    networks:
      - paperless-net

  db:
    image: postgres:16-alpine
    container_name: paperless-db
    restart: unless-stopped
    environment:
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: paperless_db_password  # Must match PAPERLESS_DBPASS
      POSTGRES_DB: paperless
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "paperless"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - paperless-net

  redis:
    image: redis:7-alpine
    container_name: paperless-redis
    restart: unless-stopped
    volumes:
      - redis_data:/data
    networks:
      - paperless-net

volumes:
  paperless_data:
  paperless_media:
  postgres_data:
  redis_data:

networks:
  paperless-net:
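The placeholder secret key and database password in the file above should be replaced before the first start. One way to produce suitable values is sketched below; gen_secret is a hypothetical helper name, and the /dev/urandom fallback is an assumption for hosts without openssl.

```shell
# Print N random bytes as a hex string (default 32 bytes = 64 hex characters).
gen_secret() {
  local n="${1:-32}"
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex "$n"
  else
    # Fallback: read raw bytes from the kernel and hex-encode them
    head -c "$n" /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
  fi
}

# Print values to paste into docker-compose.yml
echo "PAPERLESS_SECRET_KEY: $(gen_secret 32)"
echo "PAPERLESS_DBPASS and POSTGRES_PASSWORD: $(gen_secret 16)"
```

Hex output avoids characters that would need escaping in YAML. The same database password must go into both PAPERLESS_DBPASS and POSTGRES_PASSWORD.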

Create the consume and export directories:

mkdir -p consume export

Start the stack:

docker compose up -d

First startup takes 1-2 minutes as the database initializes and Paperless runs migrations. Monitor progress:

docker compose logs -f paperless

Wait until the log reports that the web server is listening on port 8000 before accessing the web UI.
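Instead of watching the log by hand, a script can poll the web port until it answers. This is a rough sketch, assuming curl is installed on the host; wait_for_http and the 180-second default are choices made for this example.

```shell
# Poll a URL until it answers without an HTTP error, or give up after a timeout.
# $1 = URL, $2 = timeout in seconds (default 180)
wait_for_http() {
  local url="$1" deadline
  deadline=$(( $(date +%s) + ${2:-180} ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    if curl -fsS -o /dev/null --max-time 5 "$url" 2>/dev/null; then
      return 0
    fi
    sleep 2
  done
  return 1
}

# Usage:
#   wait_for_http http://localhost:8000 && echo "Paperless is up"
```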

OCR Language Configuration

Paperless-ngx uses Tesseract for OCR. The default eng (English) language is included. For additional languages:

# Single additional language
PAPERLESS_OCR_LANGUAGE: eng+deu  # English + German

# Multiple languages
PAPERLESS_OCR_LANGUAGE: eng+deu+fra+spa  # English, German, French, Spanish

Common language codes: eng (English), deu (German), fra (French), spa (Spanish), ita (Italian), por (Portuguese), nld (Dutch), jpn (Japanese), chi_sim (Simplified Chinese), kor (Korean).

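The official image ships with language data for English, German, French, Italian, and Spanish. For any other language, also set PAPERLESS_OCR_LANGUAGES (plural) to a space-separated list of packs to install when the container starts, for example PAPERLESS_OCR_LANGUAGES: jpn kor. That variable uses package-style names, so Simplified Chinese is chi-sim there and chi_sim in PAPERLESS_OCR_LANGUAGE. Adding more languages slightly increases OCR processing time.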
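To confirm that every code in PAPERLESS_OCR_LANGUAGE is actually available in the running container, the requested codes can be compared against what Tesseract reports. The sketch below assumes the tesseract binary is on the PATH inside the container and that its --list-langs output starts with a single header line; missing_langs is a hypothetical helper.

```shell
# Print each code from an OCR language spec (e.g. "eng+deu+jpn") that is
# absent from a whitespace-separated list of installed language codes.
# $1 = value of PAPERLESS_OCR_LANGUAGE, $2 = installed codes
missing_langs() {
  local lang have found
  for lang in $(printf '%s' "$1" | tr '+' ' '); do
    found=0
    for have in $2; do
      if [ "$lang" = "$have" ]; then found=1; fi
    done
    if [ "$found" -eq 0 ]; then echo "$lang"; fi
  done
}

# Usage:
#   installed=$(docker exec paperless tesseract --list-langs | tail -n +2)
#   missing_langs "eng+deu+jpn" "$installed"
```

Empty output means every requested language is present.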

First-Time Setup

Open http://your-server-ip:8000 in your browser. Log in with the admin credentials set in PAPERLESS_ADMIN_USER and PAPERLESS_ADMIN_PASSWORD.

The first things to configure:

Upload a test document — drag and drop a PDF onto the dashboard. Paperless processes it with OCR and adds it to the library.
Create tags — organize documents by type (invoice, receipt, contract, etc.)
Create correspondents — track who sent/received documents
Create document types — categorize documents
Set up matching rules — auto-assign tags and correspondents based on document content

Consume Folder Setup

The consume folder (./consume mapped to /usr/src/paperless/consume inside the container) is the primary automation mechanism. Any PDF, PNG, JPG, or TIFF dropped into this folder is automatically imported, OCR-processed, and added to your library.

Scanner Integration

Configure your scanner to save directly to the consume folder:

Network scanner (Samba/SMB share):

sudo apt install -y samba

# Add to /etc/samba/smb.conf:
sudo tee -a /etc/samba/smb.conf > /dev/null <<EOF

[paperless-consume]
   path = /home/$USER/paperless/consume
   browseable = yes
   writable = yes
   valid users = $USER
   create mask = 0644
   directory mask = 0755
EOF

# Set Samba password
sudo smbpasswd -a $USER

# Restart Samba
sudo systemctl restart smbd

Point your scanner to \\server-ip\paperless-consume.

Subdirectories as tags: With PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS: "true", you can create folders like consume/invoices/ and consume/receipts/ — documents dropped in each get tagged automatically.
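The tag folders can be created in one step. A small sketch follows; make_tag_dirs and the tag names are illustrative only.

```shell
# Create one consume subfolder per tag name.
# $1 = consume directory, remaining arguments = tag names
make_tag_dirs() {
  local base="$1" tag
  shift
  for tag in "$@"; do
    mkdir -p "$base/$tag"
  done
}

# Usage (run from ~/paperless):
#   make_tag_dirs ./consume invoices receipts contracts
```

The new folders need the same ownership as the consume folder itself so that both the scanner share and the container can write to them.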

inotify Watches

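Paperless watches the consume folder with inotify only when PAPERLESS_CONSUMER_POLLING is 0 (the default). A non-zero polling interval, such as the 30 seconds set in the compose file above, disables inotify, so this section matters only if you turn polling off. With many consume subdirectories, or other services on the host holding watches, the default inotify watch limit may be too low. Increase it: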

echo "fs.inotify.max_user_watches=524288" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

This prevents “inotify watch limit reached” errors when Paperless monitors many files.
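To see whether the limit needs raising at all, the current kernel value can be compared with the target. A minimal sketch, with watch_limit_ok as a hypothetical helper name:

```shell
# Succeed if the inotify watch limit is at least the wanted value.
# $1 = wanted limit, $2 = current limit (defaults to the live kernel value)
watch_limit_ok() {
  local wanted="$1"
  local current="${2:-$(cat /proc/sys/fs/inotify/max_user_watches)}"
  [ "$current" -ge "$wanted" ]
}

# Usage:
#   watch_limit_ok 524288 || echo "fs.inotify.max_user_watches is below 524288"
```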

UFW Firewall Rules

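# Paperless web UI. Docker publishes container ports through its own iptables
# rules, so port 8000 may already be reachable regardless of this rule.
sudo ufw allow 8000/tcp comment 'Paperless-ngx'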

# If using Samba for scanner integration
sudo ufw allow samba comment 'Samba for Paperless consume'

# If behind a reverse proxy
sudo ufw allow 443/tcp comment 'HTTPS'

sudo ufw status

Backup Strategy

Paperless stores data in three locations that all need backing up:

PostgreSQL database: document metadata, tags, correspondents, matching rules
Media volume: original documents, archived (OCR-processed) versions, and thumbnails
Data volume: search index and the trained classification model

Using Paperless Built-in Export

Paperless has a built-in document exporter that creates a portable backup:

docker exec paperless document_exporter /usr/src/paperless/export

This writes all documents and metadata to the ./export directory. Schedule it with cron:

# Daily export at 1 AM
0 1 * * * docker exec paperless document_exporter /usr/src/paperless/export

Database Backup

#!/bin/bash
# backup-paperless.sh
set -euo pipefail  # Stop on the first failed command

BACKUP_DIR="/opt/backups/paperless/$(date +%Y-%m-%d)"
mkdir -p "$BACKUP_DIR"

# Database dump
docker exec paperless-db pg_dump -U paperless paperless > "$BACKUP_DIR/paperless-db.sql"

# Media files. The volume name is <compose project>_paperless_media; the
# project name defaults to the directory name, here "paperless".
docker run --rm \
  -v paperless_paperless_media:/media:ro \
  -v "$BACKUP_DIR":/backup \
  alpine tar czf /backup/paperless-media.tar.gz -C / media

echo "Backup complete: $BACKUP_DIR"

See Backup Strategy for the full 3-2-1 approach.
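Because the backup script writes one dated directory per day, old directories accumulate unless something removes them. A possible retention step is sketched below; prune_backups and the 14-day window in the usage line are assumptions for illustration, and touch -d and find -mtime are used as provided by GNU tools on Ubuntu.

```shell
# Delete backup directories directly under a root that are older than N days.
# $1 = backup root (e.g. /opt/backups/paperless), $2 = days to keep
prune_backups() {
  [ -d "$1" ] || return 1
  find "$1" -mindepth 1 -maxdepth 1 -type d -mtime +"$2" -exec rm -rf {} +
}

# Usage, for example at the end of backup-paperless.sh:
#   prune_backups /opt/backups/paperless 14
```

Only first-level directories are considered, so the root itself and any loose files in it are left alone.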

HTTPS via Reverse Proxy

For remote access, put Paperless behind a reverse proxy with SSL. Update the environment variable:

PAPERLESS_URL: https://docs.yourdomain.com

Then configure your reverse proxy to forward to localhost:8000. See Reverse Proxy Setup.

Important: Paperless uses WebSocket connections for real-time task updates. Your reverse proxy must support WebSocket proxying. Nginx Proxy Manager handles this by default. For raw Nginx, add:

proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";

Troubleshooting

OCR Produces Garbled Text

Symptom: Document content is recognized but text is wrong or mixed characters.

Fix: Set the correct OCR language. If your documents are in German, PAPERLESS_OCR_LANGUAGE: eng will produce garbage. Set it to deu or eng+deu for mixed-language documents.

Consume Folder Not Processing Files

Symptom: Files sit in the consume folder and are not imported.

Fix: Check permissions. The container runs as UID/GID set by USERMAP_UID/USERMAP_GID. The consume folder must be writable by that user:

sudo chown -R 1000:1000 ~/paperless/consume

Also check PAPERLESS_CONSUMER_POLLING is set and the container logs for errors:

docker compose logs paperless | grep -i consume
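Ownership can be checked before changing it. The sketch below uses stat from GNU coreutils; owner_matches is a hypothetical helper, and its defaults mirror the USERMAP_UID and USERMAP_GID values from the compose file.

```shell
# Succeed if a directory is owned by the expected numeric UID and GID.
# $1 = directory, $2 = UID (default 1000), $3 = GID (default 1000)
owner_matches() {
  local actual
  actual=$(stat -c '%u:%g' "$1") || return 1
  [ "$actual" = "${2:-1000}:${3:-1000}" ]
}

# Usage:
#   owner_matches ~/paperless/consume || echo "consume folder has the wrong owner"
```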

Database Connection Errors on Startup

Symptom: Paperless logs show could not connect to server: Connection refused.

Fix: PostgreSQL may not be ready when Paperless starts. If the error persists, increase retries in the db healthcheck or give it a start_period. Also confirm that the paperless service still waits for a healthy database:

paperless:
  depends_on:
    db:
      condition: service_healthy

Verify the database is healthy:

docker compose ps

High Memory Usage During OCR

Symptom: System becomes unresponsive during document processing. OOM killer terminates processes.

Fix: Reduce the number of OCR workers:

PAPERLESS_TASK_WORKERS: 1
PAPERLESS_THREADS_PER_WORKER: 1

Each worker uses 300-500 MB during OCR processing. On a 4 GB system, 2 workers is the safe maximum.
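A worker count can be derived from the installed RAM rather than guessed. The sketch below is one rough heuristic, not a Paperless recommendation: suggest_workers is a hypothetical helper, and the 1536 MB reserve (OS, PostgreSQL, Redis, idle Paperless) and 1024 MB per-worker budget are deliberately generous assumptions chosen so that a 4 GB host yields 2.

```shell
# Suggest a PAPERLESS_TASK_WORKERS value from total RAM. Result is at least 1.
# $1 = total RAM in MB, $2 = MB reserved for everything else (default 1536),
# $3 = MB budgeted per OCR worker (default 1024)
suggest_workers() {
  local total="$1" reserve="${2:-1536}" per_worker="${3:-1024}" n
  n=$(( (total - reserve) / per_worker ))
  if [ "$n" -lt 1 ]; then n=1; fi
  echo "$n"
}

# Usage:
#   suggest_workers "$(free -m | awk '/^Mem:/ {print $2}')"
```

The result should also stay at or below the number of CPU cores, which nproc reports.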

Search Not Finding Documents

Symptom: Documents exist but full-text search returns no results.

Fix: The search index may need rebuilding:

docker exec paperless document_index reindex

This can take a while for large libraries. Check progress in the Paperless web UI under Tasks.

Resource Requirements

RAM: ~500 MB idle, 1-2 GB during active OCR processing with the two workers configured above
CPU: Low idle. OCR processing is CPU-intensive — each page takes 2-10 seconds on modern x86 hardware
Disk: 500 MB for the application, plus document storage (plan 1-5 MB per page depending on originals)
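The per-page figure turns into a disk estimate with simple arithmetic. A minimal sketch, with estimate_storage_gb as a hypothetical helper that takes whole megabytes and rounds up:

```shell
# Estimate document storage in GB, rounded up.
# $1 = number of pages, $2 = average size per page in whole MB
estimate_storage_gb() {
  echo $(( ($1 * $2 + 1023) / 1024 ))
}

# A 10,000-page library at the low and high end of the 1-5 MB range:
estimate_storage_gb 10000 1   # prints 10
estimate_storage_gb 10000 5   # prints 49
```

The estimate covers originals only, so leave extra room for archived versions, thumbnails, and exports.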
