Best Self-Hosted Document Management Systems

Two Different Problems

Updated March 2026: Verified with latest Docker images and configurations.

“Document management” covers two distinct use cases, and the best tool depends on which one you need:

Problem | Best Tool | What It Does
Archive and search documents | Paperless-ngx | Ingests, OCRs, tags, and organizes documents permanently
Edit and manipulate PDFs | Stirling-PDF | Merges, splits, converts, compresses, and signs PDFs on demand

These tools complement each other rather than compete. Most self-hosters who care about documents run both — Paperless-ngx as the permanent archive, Stirling-PDF as the workbench.

Paperless-ngx — Best Document Archive

Paperless-ngx is a document management system that ingests scanned paper and digital documents, OCRs them, applies machine-learning-based categorization, and makes everything searchable. Drop a scanned receipt into the consumption folder and Paperless reads it, extracts the text, suggests a title, assigns a correspondent and document type, and files it. Months later, search “electric bill 2025” and find it instantly.

The consumption pipeline is the core feature. Point Paperless at a folder (or an IMAP mailbox) and it processes everything that arrives there automatically. The ML-based classification improves over time: after you correct a few categorizations, it starts getting them right on its own.

The web interface is well-designed for browsing and searching a document archive. Filter by correspondent, document type, tag, date range, or full-text content. Preview PDFs inline. Download originals or the OCR-processed versions.
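The same full-text search is available to scripts through the REST API. The sketch below only builds the request and does not send it. It assumes the instance is reachable at http://localhost:8000 (the port published in the Compose file in this article) and that you have created an API token; `YOUR_API_TOKEN` and the function name `build_search_request` are placeholders for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(base_url: str, token: str, query: str) -> Request:
    """Build (but do not send) a full-text search request for Paperless-ngx."""
    # The `query` parameter on /api/documents/ runs a full-text search.
    params = urlencode({"query": query})
    url = f"{base_url.rstrip('/')}/api/documents/?{params}"
    return Request(
        url,
        headers={
            "Authorization": f"Token {token}",  # token auth; placeholder value
            "Accept": "application/json",
        },
    )


req = build_search_request(
    "http://localhost:8000", "YOUR_API_TOKEN", "electric bill 2025"
)
print(req.full_url)
```

To actually run the search, pass the request to `urllib.request.urlopen` and parse the JSON body; matching documents should appear in the paginated `results` list.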

Pros:

  • Automatic OCR on every ingested document (Tesseract), with optional Tika support for parsing Office documents
  • ML-based auto-classification (correspondent, document type, tags)
  • Multiple consumption sources: watched folder, IMAP mailbox, web upload
  • Full-text search across all documents
  • Clean web UI with inline PDF preview
  • Email-based document submission
  • Workflow rules for automatic processing
  • Audit trail — original files preserved alongside OCR versions
  • Active development with frequent releases

Cons:

  • Resource-heavy: the recommended stack runs PostgreSQL and Redis alongside the app itself
  • ~500 MB RAM minimum for the full stack
  • OCR processing is CPU-intensive during ingestion
  • Initial categorization requires manual correction to train the ML
  • No PDF editing capabilities (can’t merge, split, or modify PDFs)
  • Setup is involved — several services to configure

Docker Compose:

services:
  paperless:
    image: ghcr.io/paperless-ngx/paperless-ngx:2.20.11
    container_name: paperless
    ports:
      - "8000:8000"
    environment:
      - PAPERLESS_REDIS=redis://paperless-redis:6379
      - PAPERLESS_DBHOST=paperless-db
      - PAPERLESS_DBUSER=paperless
      - PAPERLESS_DBPASS=paperless_secret     # Change this
      - PAPERLESS_SECRET_KEY=change-me-long-random-string  # Change this
      - PAPERLESS_OCR_LANGUAGE=eng
      - PAPERLESS_TIME_ZONE=America/New_York
      - PAPERLESS_ADMIN_USER=admin             # Change this
      - PAPERLESS_ADMIN_PASSWORD=changeme      # Change this
    volumes:
      - paperless-data:/usr/src/paperless/data
      - paperless-media:/usr/src/paperless/media
      - paperless-export:/usr/src/paperless/export
      - paperless-consume:/usr/src/paperless/consume  # Drop files here
    depends_on:
      - paperless-db
      - paperless-redis
    restart: unless-stopped

  paperless-db:
    image: postgres:16-alpine
    container_name: paperless-db
    environment:
      - POSTGRES_USER=paperless
      - POSTGRES_PASSWORD=paperless_secret     # Match above
      - POSTGRES_DB=paperless
    volumes:
      - paperless-pgdata:/var/lib/postgresql/data
    restart: unless-stopped

  paperless-redis:
    image: redis:7-alpine
    container_name: paperless-redis
    restart: unless-stopped

volumes:
  paperless-data:
  paperless-media:
  paperless-export:
  paperless-consume:
  paperless-pgdata:
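
Several values in the Compose file above are marked “Change this”. One way to produce replacements is Python's standard `secrets` module. This is a minimal sketch, not part of Paperless-ngx; the helper name `make_secret` and the 48-byte length are arbitrary choices for illustration.

```python
import secrets


def make_secret(n_bytes: int = 48) -> str:
    """Return a URL-safe random string for use as a password or secret key."""
    # token_urlsafe draws from the OS's cryptographic random source.
    return secrets.token_urlsafe(n_bytes)


# Print one fresh value per setting that the Compose file marks "Change this".
for name in ("PAPERLESS_SECRET_KEY", "PAPERLESS_DBPASS", "PAPERLESS_ADMIN_PASSWORD"):
    print(f"{name}={make_secret()}")
```

Remember that POSTGRES_PASSWORD on the database service has to match whatever you choose for PAPERLESS_DBPASS.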

Resources: ~500 MB RAM (app + PostgreSQL + Redis). CPU spikes during OCR processing. Storage depends on document volume — plan for the size of your document archive.

[Read our full guide: How to Self-Host Paperless-ngx]

Stirling-PDF — Best PDF Toolkit

Stirling-PDF is a self-hosted PDF manipulation toolkit. Merge multiple PDFs, split a PDF into pages, convert between formats (PDF to Word, images to PDF, HTML to PDF), compress files, add watermarks, rotate pages, add/remove passwords, flatten forms, and perform OCR on scanned documents. Over 40 PDF operations in a single web interface.

It’s not a document archive: Stirling-PDF doesn’t store your files. You open it when you need to do something to a PDF, upload the file, process it, download the result, and close the tab. Think of it as a self-hosted replacement for iLovePDF, SmallPDF, or Adobe Acrobat’s online tools.

The security benefit is significant. Instead of uploading sensitive documents (tax returns, contracts, medical records) to cloud PDF services, process them on your own server. Files never leave your infrastructure.

Pros:

  • 40+ PDF operations in one tool
  • No file storage — process and download, nothing persists
  • OCR support via Tesseract
  • PDF/A conversion for archival compliance
  • Digital signature support
  • API for automation (batch processing via scripts)
  • Single container, no dependencies
  • Lightweight (~100 MB RAM)
  • Active development

Cons:

  • Not a document management system — no search, no categorization
  • Files are processed transiently — no history or audit trail
  • OCR quality depends on document scanning quality
  • Some advanced operations (e.g., PDF forms) are less polished
  • No batch upload via web UI (API-only for batch processing)

Docker Compose:

services:
  stirling-pdf:
    image: stirlingtools/stirling-pdf:2.7.3
    container_name: stirling-pdf
    ports:
      - "8080:8080"
    volumes:
      - stirling-data:/usr/share/tessdata       # OCR language data
      - stirling-configs:/configs               # Custom configurations
    environment:
      - DOCKER_ENABLE_SECURITY=false
      - LANGS=en_US                              # UI language
    restart: unless-stopped

volumes:
  stirling-data:
  stirling-configs:

Resources: ~100 MB RAM idle. CPU spikes during heavy operations (OCR, large file conversion). Minimal disk.

[Read our full guide: How to Self-Host Stirling-PDF]

Comparison

Feature | Paperless-ngx | Stirling-PDF
Primary purpose | Document archive + search | PDF manipulation toolkit
OCR | Automatic on ingestion | On-demand per file
Document storage | Yes (permanent archive) | No (transient processing)
Full-text search | Yes | No
Auto-categorization | Yes (ML-based) | No
PDF editing | No | Yes (40+ operations)
Format conversion | No | Yes (PDF ↔ Word, images, HTML)
Merge/split PDFs | No | Yes
Digital signatures | No | Yes
API | REST API | REST API
Multi-user | Yes (permissions per document) | Yes (basic auth)
RAM usage | ~500 MB | ~100 MB
Docker containers | 3 (app + PostgreSQL + Redis) | 1
License | GPL-3.0 | GPL-3.0

Run Both

The ideal document workflow uses both tools:

  1. Stirling-PDF to prepare documents — merge related pages, OCR scanned documents, convert formats, compress large files
  2. Paperless-ngx to archive and organize the prepared documents — auto-categorize, make searchable, store permanently

Both run on Docker and the combined overhead is manageable (~600 MB RAM total). Set up a workflow where Stirling-PDF’s output feeds into Paperless-ngx’s consumption folder for automatic processing.
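
The hand-off between the two tools can be as simple as moving finished PDFs from a staging folder into the consumption folder. The sketch below assumes you save Stirling-PDF downloads to a staging directory and that the Paperless consumption directory is reachable from the host (for example through a bind mount instead of the named volume used in the Compose file above). The function name `hand_off` and all paths are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def hand_off(staging: Path, consume: Path) -> list[Path]:
    """Move every PDF in `staging` into `consume` and return the new paths."""
    consume.mkdir(parents=True, exist_ok=True)
    moved = []
    for pdf in sorted(staging.glob("*.pdf")):
        target = consume / pdf.name
        counter = 1
        # Never overwrite a file that is still waiting to be consumed.
        while target.exists():
            target = consume / f"{pdf.stem}-{counter}{pdf.suffix}"
            counter += 1
        shutil.move(str(pdf), str(target))
        moved.append(target)
    return moved


# Demo in a throwaway directory; real paths depend on your setup.
with tempfile.TemporaryDirectory() as tmp:
    staging, consume = Path(tmp, "staging"), Path(tmp, "consume")
    staging.mkdir()
    (staging / "merged-invoice.pdf").write_bytes(b"%PDF-1.7\n")
    (staging / "notes.txt").write_text("not a PDF")
    print([p.name for p in hand_off(staging, consume)])
```

Run it from cron or a systemd timer if you want the hand-off to happen without manual steps. Non-PDF files are left in the staging folder untouched.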

Frequently Asked Questions

Can Paperless-ngx read handwritten documents?

Paperless-ngx uses Tesseract OCR, which works well on printed text but poorly on handwriting. If your handwritten documents are neat and high-contrast, some text may be recognized. For reliable handwriting recognition, you’d need to pre-process documents through a specialized service before importing into Paperless-ngx.

How do I get documents into Paperless-ngx?

There are several ways: drop files into the consumption folder (a directory Paperless watches), email documents to a configured IMAP mailbox, use the web upload interface, or use a third-party mobile app. Many users connect a network scanner that saves directly to the consumption folder, so a scanned receipt is automatically OCR’d, categorized, and filed.

Is Stirling-PDF safe for sensitive documents?

Yes — that’s one of its main advantages. Stirling-PDF processes files on your server and doesn’t store them after processing. Unlike cloud services (SmallPDF, iLovePDF), your tax returns, contracts, and medical records never leave your infrastructure. The processed file is returned to your browser and then discarded.

Can I use both Paperless-ngx and Stirling-PDF together?

Yes, and this is the recommended setup. Use Stirling-PDF as a workbench to prepare documents (merge, split, compress, OCR) and then feed the results into Paperless-ngx’s consumption folder for permanent archival and search. The combined overhead is about 600 MB RAM.

How much storage does a document archive need?

It depends on your volume. A typical household generating 50-100 documents per year needs a few GB. Paperless-ngx stores both the original file and an OCR-processed version, roughly doubling storage per document. A 10-year archive of household documents (receipts, bills, medical, tax) typically fits in 10-20 GB.
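
For a rough estimate of your own archive, multiply the document count by an average file size and by the two stored copies. The sketch below is back-of-the-envelope arithmetic only; the 5 MB average per document is an assumption (roughly a multi-page color scan), and the function name `estimate_archive_gb` is made up for illustration.

```python
def estimate_archive_gb(
    docs_per_year: int,
    years: int,
    avg_mb_per_doc: float,
    copies_per_doc: int = 2,  # original + OCR-processed version
) -> float:
    """Estimate archive size in GB, using 1 GB = 1000 MB."""
    total_mb = docs_per_year * years * avg_mb_per_doc * copies_per_doc
    return total_mb / 1000


# 100 documents a year for 10 years at an assumed 5 MB each.
print(estimate_archive_gb(100, 10, 5.0))  # prints 10.0
```

The database, search index, and thumbnails add some overhead on top of this figure, so treat the result as a lower bound.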

Does Paperless-ngx support multiple users?

Yes — Paperless-ngx has user accounts with permissions. You can control who can view, edit, or delete documents. Documents can be assigned to specific users or shared across the instance. This works well for households where each person manages their own documents but shares some categories.
