How to Self-Host Paperless-ngx with Docker
What Is Paperless-ngx?
Paperless-ngx is a self-hosted document management system that turns your physical and digital documents into a searchable online archive. Scan a document, drop the PDF into a folder, and Paperless-ngx automatically OCRs it, applies tags, assigns correspondents, and makes every word searchable. It replaces filing cabinets, Google Drive document dumps, and paid services like Adobe Acrobat’s document management.
Updated March 2026: Verified with latest Docker images and configurations.
Paperless-ngx is the community-maintained successor to paperless and paperless-ng, and is by far the most active fork.
Prerequisites
- A Linux server (Ubuntu 22.04+ recommended)
- Docker and Docker Compose installed (guide)
- 2 GB of free RAM (4 GB recommended for OCR processing)
- 10 GB of free disk space (plus storage for documents)
- A domain name (optional, for remote access)
Docker Compose Configuration
Create a docker-compose.yml file:
services:
  paperless:
    image: ghcr.io/paperless-ngx/paperless-ngx:2.20.11
    container_name: paperless
    restart: unless-stopped
    ports:
      - "8000:8000"
    environment:
      PAPERLESS_REDIS: "redis://paperless_redis:6379"
      PAPERLESS_DBHOST: "paperless_db"
      PAPERLESS_DBNAME: "paperless"
      PAPERLESS_DBUSER: "paperless"
      PAPERLESS_DBPASS: "change_this_strong_password" # Must match PostgreSQL password
      PAPERLESS_SECRET_KEY: "change_this_to_a_long_random_string" # CHANGE THIS: generate with openssl rand -hex 32
      PAPERLESS_URL: "http://localhost:8000" # Set to your public URL
      PAPERLESS_TIME_ZONE: "America/New_York" # Your timezone
      PAPERLESS_OCR_LANGUAGE: "eng" # OCR language (eng, deu, fra, etc.)
      PAPERLESS_ADMIN_USER: "admin" # Superuser username
      PAPERLESS_ADMIN_PASSWORD: "change_this_password" # Superuser password: CHANGE THIS
      PAPERLESS_ADMIN_MAIL: "admin@example.com" # Placeholder address, use your own
      USERMAP_UID: "1000"
      USERMAP_GID: "1000"
    volumes:
      - paperless_data:/usr/src/paperless/data
      - paperless_media:/usr/src/paperless/media
      - paperless_export:/usr/src/paperless/export
      - ./consume:/usr/src/paperless/consume # Drop PDFs here for auto-import
    depends_on:
      paperless_db:
        condition: service_healthy
      paperless_redis:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 60s

  paperless_db:
    image: postgres:16-alpine
    container_name: paperless_db
    restart: unless-stopped
    environment:
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: change_this_strong_password # Must match PAPERLESS_DBPASS
      POSTGRES_DB: paperless
    volumes:
      - paperless_pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U paperless"]
      interval: 10s
      timeout: 5s
      retries: 5

  paperless_redis:
    image: redis:7-alpine
    container_name: paperless_redis
    restart: unless-stopped
    volumes:
      - paperless_redis:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5

volumes:
  paperless_data:
  paperless_media:
  paperless_export:
  paperless_pgdata:
  paperless_redis:
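The placeholder values for PAPERLESS_SECRET_KEY and the database password need real secrets before the first start. The following is a minimal sketch, assuming a POSIX shell. gen_secret is a hypothetical helper name, and the fallback branch only exists for hosts without openssl:

```shell
# Print a 64-character hex string built from 32 random bytes.
gen_secret() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 32
  else
    # Fallback: read 32 bytes from the kernel RNG and hex-encode them
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
  fi
}

# Example (not executed here):
#   gen_secret   # paste the output into PAPERLESS_SECRET_KEY
#   gen_secret   # run again for PAPERLESS_DBPASS / POSTGRES_PASSWORD
```

Use a separate value for each secret, and remember that PAPERLESS_DBPASS and POSTGRES_PASSWORD must be identical.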
Create the consume directory and start:
mkdir -p consume
docker compose up -d
Initial Setup
- Wait 30-60 seconds for the database migrations and OCR model download to complete
- Open http://your-server-ip:8000 in your browser
- Log in with the PAPERLESS_ADMIN_USER and PAPERLESS_ADMIN_PASSWORD from your Compose file
- After first login, remove PAPERLESS_ADMIN_USER, PAPERLESS_ADMIN_PASSWORD, and PAPERLESS_ADMIN_MAIL from your Compose file. They only apply on first run.
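Instead of guessing when the 30-60 seconds are over, you can poll the web UI until it answers. This is a rough sketch, assuming curl is installed on the host. wait_for_url is a hypothetical helper, and the attempt count and delay are arbitrary defaults:

```shell
# Poll a URL until it answers without an HTTP error status.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
  url="$1"
  attempts="${2:-30}"
  delay="${3:-5}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP 4xx/5xx as failure, -s: silent, -o: discard the body
    if curl -fs -o /dev/null "$url"; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "gave up after $attempts attempts" >&2
  return 1
}

# Example (not executed here):
#   wait_for_url http://your-server-ip:8000
```

If the function gives up, docker compose logs paperless usually shows whether migrations are still running or the container failed to start.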
Adding Documents
Three ways to add documents:
- Consume folder: Drop PDFs or images into the ./consume directory. Paperless-ngx picks them up automatically.
- Web UI: Click the upload button in the top-right corner.
- API: POST to /api/documents/post_document/
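The API route can be exercised with curl. This is a hedged sketch that sends the file as multipart form data in the document field, using HTTP basic auth. PNGX_URL, PNGX_USER, and PNGX_PASS are hypothetical shell variables invented for this example, not Paperless-ngx settings:

```shell
# Upload one file to the post_document endpoint.
# Usage: PNGX_USER=admin PNGX_PASS=secret upload_document scan.pdf
upload_document() {
  file="$1"
  base_url="${PNGX_URL:-http://localhost:8000}"
  if [ ! -f "$file" ]; then
    echo "no such file: $file" >&2
    return 1
  fi
  # -F sends multipart/form-data; the file goes in the "document" field
  curl -fsS -u "${PNGX_USER}:${PNGX_PASS}" \
    -F "document=@${file}" \
    "${base_url}/api/documents/post_document/"
}
```

The upload is queued rather than processed immediately, so the document shows up in the web UI only after the consumer has finished OCR.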
Configuration
OCR Languages
Choose the languages OCR should recognize by setting PAPERLESS_OCR_LANGUAGE, joining multiple codes with +:
PAPERLESS_OCR_LANGUAGE: "eng+deu+fra" # English, German, French
The values are Tesseract language codes. The image ships with only a small default set of language packs; to install additional ones at container start, add:
PAPERLESS_OCR_LANGUAGES: "chi-sim kor jpn" # Download Chinese, Korean, Japanese models
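The two variables use slightly different spellings: PAPERLESS_OCR_LANGUAGE takes Tesseract codes joined with + (for example chi_sim, with an underscore), while PAPERLESS_OCR_LANGUAGES takes space-separated package names (chi-sim, with a hyphen). A small sketch of the conversion, where to_install_list is a hypothetical helper:

```shell
# Convert a PAPERLESS_OCR_LANGUAGE value such as "eng+chi_sim" into the
# space-separated, hyphenated form expected by PAPERLESS_OCR_LANGUAGES.
to_install_list() {
  # '+' becomes a space, '_' becomes '-'
  printf '%s\n' "$1" | tr '+_' ' -'
}

# Example (not executed here):
#   to_install_list "chi_sim+kor+jpn"
```

Languages that already ship with the image do not need to appear in the install list.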
Automatic Matching
Paperless-ngx can automatically assign tags, correspondents, and document types using:
- Matching algorithms: Exact, fuzzy, regex, or auto
- Machine learning: Automatic classification based on your existing categorization patterns
Configure the matching algorithm and pattern in the edit dialog of each tag, correspondent, or document type in the web UI.
Email Consumption
Paperless-ngx can fetch documents from email accounts:
- Go to Settings → Mail Accounts
- Add your IMAP email account
- Create a Mail Rule to process attachments from specific senders or subjects
SMTP Notifications
PAPERLESS_EMAIL_HOST: "smtp.example.com"
PAPERLESS_EMAIL_PORT: "587"
PAPERLESS_EMAIL_HOST_USER: "user@example.com" # Placeholder address, use your own
PAPERLESS_EMAIL_HOST_PASSWORD: "your-email-password"
PAPERLESS_EMAIL_USE_TLS: "true"
Advanced Configuration (Optional)
Tika and Gotenberg (Office Document Support)
To process Word, Excel, PowerPoint, and other office formats, add Tika and Gotenberg:
  tika:
    image: apache/tika:3.1.0.0
    container_name: paperless_tika
    restart: unless-stopped

  gotenberg:
    image: gotenberg/gotenberg:8.27.0
    container_name: paperless_gotenberg
    restart: unless-stopped
    command:
      - "gotenberg"
      - "--chromium-disable-javascript=true"
      - "--chromium-allow-list=file:///tmp/.*"
Add these environment variables to the paperless service:
PAPERLESS_TIKA_ENABLED: "true"
PAPERLESS_TIKA_GOTENBERG_ENDPOINT: "http://gotenberg:3000"
PAPERLESS_TIKA_ENDPOINT: "http://tika:9998"
Barcode Separation
Paperless-ngx can split multi-page scans into separate documents using barcode separator pages:
PAPERLESS_CONSUMER_ENABLE_BARCODES: "true"
PAPERLESS_CONSUMER_BARCODE_SCANNER: "ZXING"
Print the barcode separator page from the docs, insert between documents when scanning, and Paperless splits them automatically.
Reverse Proxy
Set PAPERLESS_URL to your public-facing domain:
PAPERLESS_URL: "https://docs.example.com"
Nginx Proxy Manager config:
- Scheme: http
- Forward Hostname: paperless
- Forward Port: 8000
- Enable WebSocket Support: Yes
See Reverse Proxy Setup for full configuration.
Backup
Built-in Export
Paperless-ngx has a built-in document exporter:
docker compose exec paperless document_exporter ../export --zip
This creates a complete backup including documents, metadata, and database in the export volume.
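Because paperless_export is a named volume in the Compose file above, the export does not appear in your project directory on the host. The sketch below runs the exporter and then copies the result out with docker compose cp. run, export_and_copy, and DRY_RUN are hypothetical names for this example, and the trailing /. on the source path (copy the directory contents) is assumed to behave as it does with docker cp:

```shell
# Print the command instead of running it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Run the exporter, then copy the export volume contents to the host.
# Usage: export_and_copy [DESTINATION_DIR]
export_and_copy() {
  dest="${1:-./paperless-export}"
  mkdir -p "$dest" || return 1
  # -T disables TTY allocation so the command also works from cron
  run docker compose exec -T paperless document_exporter ../export --zip || return 1
  run docker compose cp paperless:/usr/src/paperless/export/. "$dest"
}

# Example (not executed here):
#   DRY_RUN=1 export_and_copy ./paperless-export   # show the commands only
```

Run it from the directory that contains docker-compose.yml, then move the copied files to storage outside the server.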
Database Backup
docker compose exec -T paperless_db pg_dump -U paperless paperless > paperless-backup-$(date +%Y%m%d).sql
Back up the paperless_media volume separately — it contains your original documents and OCR’d versions.
See Backup Strategy for a complete backup approach.
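Dated dumps accumulate quickly if the command runs on a schedule. This is a minimal pruning sketch, assuming a find implementation that supports -maxdepth and -delete (GNU and BSD find both do). prune_backups and the 30-day default are hypothetical choices:

```shell
# Delete SQL dumps older than a given number of days.
# Usage: prune_backups DIRECTORY [DAYS]
prune_backups() {
  dir="$1"
  days="${2:-30}"
  # -mtime +N matches files last modified more than N days ago
  find "$dir" -maxdepth 1 -type f -name 'paperless-backup-*.sql' \
    -mtime +"$days" -delete
}

# Example (not executed here):
#   prune_backups /srv/backups/paperless 30
```

Only files matching the paperless-backup-*.sql pattern directly inside the given directory are touched.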
Troubleshooting
OCR Fails on Scanned Documents
Symptom: Documents imported but text not searchable, OCR status shows errors.
Fix: Check that the correct OCR language is set. For scanned documents with mixed languages, use PAPERLESS_OCR_LANGUAGE: "eng+deu". For very low-quality scans, try:
PAPERLESS_OCR_DESKEW: "true"
PAPERLESS_OCR_ROTATE_PAGES: "true"
PAPERLESS_OCR_IMAGE_DPI: "300"
Consume Folder Not Processing Files
Symptom: Files sit in the consume folder and are never picked up.
Fix: Check file permissions. The consumer runs as USERMAP_UID:USERMAP_GID (default 1000:1000). The consume directory and its files must be readable and writable by this user, because Paperless-ngx removes each file after importing it:
chown -R 1000:1000 consume/
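Before changing ownership, it can help to see what the directory currently looks like. A small sketch, assuming a POSIX ls and awk. owner_ids and check_consume_owner are hypothetical helper names:

```shell
# Print "UID:GID" of a path using numeric ids (ls -n), which avoids
# depending on GNU-only stat flags.
owner_ids() {
  ls -ldn "$1" | awk '{ print $3 ":" $4 }'
}

# Succeed when the path is owned by the expected UID:GID (default 1000:1000).
check_consume_owner() {
  [ "$(owner_ids "$1")" = "${2:-1000:1000}" ]
}

# Example (not executed here):
#   check_consume_owner consume || echo "owner is $(owner_ids consume)"
```

If you changed USERMAP_UID or USERMAP_GID in the Compose file, pass the same pair as the second argument.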
High Memory Usage During OCR
Symptom: Container OOM-killed during processing of large documents.
Fix: Limit concurrent processing:
PAPERLESS_TASK_WORKERS: "1" # Default is 1, don't increase on low-RAM systems
PAPERLESS_THREADS_PER_WORKER: "1" # Default is auto-detected
For a 2 GB RAM system, keep both at 1. Each worker can use 500 MB+ during OCR.
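To see why extra workers are risky on small machines, a back-of-the-envelope estimate helps. The sketch below adds an idle footprint to the per-worker OCR peak, using this article's rough figures (about 300 MB idle, 500 MB to 2 GB per worker) as inputs. estimate_peak_mb is a hypothetical helper, and real usage depends on the documents being processed:

```shell
# Rough peak memory estimate in MB: idle footprint plus per-worker OCR peak.
# Usage: estimate_peak_mb WORKERS PER_WORKER_MB [IDLE_MB]
estimate_peak_mb() {
  workers="$1"
  per_worker_mb="$2"
  idle_mb="${3:-300}"
  echo $(( idle_mb + workers * per_worker_mb ))
}

# Example (not executed here):
#   estimate_peak_mb 1 500    # one worker, light documents
#   estimate_peak_mb 2 2048   # two workers, heavy documents
```

With two workers at the upper per-worker figure, the estimate already exceeds 4 GB, which is why a 2 GB system should stay at one worker.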
Duplicate Documents Detected Incorrectly
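Symptom: Paperless refuses to import documents, claiming they’re duplicates.
Fix: Paperless-ngx detects duplicates by comparing file checksums (MD5), so a byte-identical file is rejected even under a different filename. To re-import a document, delete the original first and also empty it from the trash, since a trashed copy can still count as a duplicate. If rejected duplicates pile up in the consume folder, this setting removes them automatically:
PAPERLESS_CONSUMER_DELETE_DUPLICATES: "true"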
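Since duplicate detection is based on an MD5 checksum of the file contents, you can check locally whether two files would collide before importing them. A minimal sketch, assuming md5sum (GNU coreutils) or the BSD/macOS md5 tool is available. file_md5 and same_content are hypothetical helpers:

```shell
# Print the MD5 checksum of a file as a hex string.
file_md5() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum "$1" | cut -d' ' -f1
  else
    md5 -q "$1"   # BSD/macOS variant
  fi
}

# Succeed when both files have identical content, regardless of filename.
same_content() {
  [ "$(file_md5 "$1")" = "$(file_md5 "$2")" ]
}

# Example (not executed here):
#   same_content scan-old.pdf scan-new.pdf && echo "would be rejected"
```

A rescan of the same paper page produces different bytes and therefore a different checksum, so only exact file copies are affected.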
Resource Requirements
- RAM: ~300 MB idle, 1-2 GB during OCR processing (per worker)
- CPU: Medium — OCR is CPU-intensive. An Intel N100 handles it fine but slowly. Faster CPUs = faster OCR.
- Disk: ~500 MB for the application, plus storage for your document archive
Verdict
Paperless-ngx is the gold standard for self-hosted document management. Nothing else comes close in terms of features, community activity, and reliability. The OCR is accurate, the automatic tagging saves hours of manual work, and the search is excellent. Pair it with a scanner or use the email consumption feature to go fully paperless. If you just need a simple PDF viewer/editor without OCR and tagging, look at Stirling-PDF. But for actual document management, Paperless-ngx is the only serious option.
Frequently Asked Questions
What scanners work with Paperless-ngx?
Any scanner that can output PDF or image files works. The most popular options are Brother ADS scanners (ADS-1700W, ADS-2700W) and Fujitsu ScanSnap models (iX1600, iX1400) because they can scan directly to a network folder — which you point at Paperless-ngx’s consumption directory. Flatbed scanners work too; scan to a file and drop it in the consumption folder. There is no direct scanner integration — Paperless-ngx watches a folder, not a scanner protocol.
How accurate is the OCR?
Paperless-ngx uses Tesseract OCR, which handles printed text in major languages with 95-99% accuracy. Handwritten text accuracy is poor — expect 50-70% at best. For best results, scan at 300 DPI minimum and ensure good contrast. OCR runs automatically when documents are consumed. You can configure the OCR language in PAPERLESS_OCR_LANGUAGE (e.g., eng+deu for English and German). Pre-printed forms, invoices, and typed letters are where Paperless-ngx shines.
Can I import existing documents?
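Yes. Drop PDF, PNG, JPG, TIFF, or other supported files into the consumption directory and Paperless-ngx processes them automatically: OCR, tagging, and indexing. This also works for bulk imports of thousands of documents, although the queue takes a while to drain because every file is processed in turn. The document_importer management command is not meant for loose files; it restores an export created by document_exporter, which makes it the right tool when migrating from another Paperless-ngx instance. Existing PDFs with embedded text are detected and the text layer is preserved without re-OCR.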
How does automatic tagging work?
Paperless-ngx uses matching algorithms to auto-assign tags, correspondents, and document types. You define rules based on: exact text matching, regular expressions, fuzzy matching, or the newer machine learning classifier. For example, create a tag “Electric Bill” with the match text “electric company name” — any document containing that text gets tagged automatically. The ML classifier learns from your manual tagging over time and gets more accurate as you use it.
How much disk space do I need?
Plan for roughly 1-3 MB per document page (PDF with OCR text layer). A household generating 50 documents per month uses about 1-3 GB per year. The PostgreSQL database adds minimal overhead. The biggest space consumer is the originals archive, because Paperless-ngx stores both the original file and an archive version with embedded OCR text. If disk space is tight, you can skip creating archive versions with PAPERLESS_OCR_SKIP_ARCHIVE_FILE=always (PAPERLESS_OCR_MODE=skip only skips OCR on pages that already contain text), but keeping archives is recommended.
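The yearly figure can be reproduced with simple arithmetic. The sketch below multiplies documents per month, pages per document, and megabytes per page. estimate_yearly_mb is a hypothetical helper, and the two-pages-per-document input in the example is an assumption, since the text above does not state an average page count:

```shell
# Estimate yearly storage in MB.
# Usage: estimate_yearly_mb DOCS_PER_MONTH PAGES_PER_DOC MB_PER_PAGE
estimate_yearly_mb() {
  docs_per_month="$1"
  pages_per_doc="$2"
  mb_per_page="$3"
  echo $(( docs_per_month * 12 * pages_per_doc * mb_per_page ))
}

# Example (not executed here):
#   estimate_yearly_mb 50 2 1   # lower bound, assuming 2 pages per document
#   estimate_yearly_mb 50 2 3   # upper bound, assuming 2 pages per document
```

The function uses integer arithmetic only, so round fractional inputs up before passing them in.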
Can multiple users access Paperless-ngx?
Yes. Paperless-ngx has built-in multi-user support with permission controls. Each user can have their own documents, or documents can be shared. Permissions can be set per document, tag, correspondent, and document type. An admin user manages all documents and settings. This works well for households where each person manages their own paperwork.
Is there a mobile app?
There is no official mobile app, but the web interface is fully responsive and works well on phones. Several community-built mobile apps exist, including Paperless Mobile (Android/iOS) which provides a native interface for browsing, searching, and uploading documents. You can also use the Share menu on your phone to upload photos of documents directly to Paperless-ngx’s API.
Related
- Paperless-ngx: Consumption Folder Not Processing — Fix
- Install Paperless-ngx on Ubuntu Server
- Install Paperless-ngx on Raspberry Pi
- Install Paperless-ngx on Proxmox VE
- Paperless-ngx vs Docspell: Document Management Compared
- Paperless-ngx vs Teedy: Document Management Compared
- Best Self-Hosted Document Management
- Paperless-ngx vs Stirling-PDF
- Replace Adobe Acrobat with Self-Hosted Tools
- Docker Compose Basics
- Reverse Proxy Setup
- Backup Strategy
- Docker Volumes Explained