Stirling-PDF: Conversion Errors — Fix

The Problem

Stirling-PDF handles dozens of PDF operations — conversions, OCR, merging, splitting — and each one can fail in distinct ways. The most common failures fall into four categories: LibreOffice connection errors (Office format conversions), OCR language detection failures, out-of-memory crashes on large PDFs, and missing features from using the wrong Docker image variant.

Stirling-PDF Image Variants

Before troubleshooting, verify you’re running the right image. Many issues trace back to using an image that doesn’t include the tool you need.

| Image Tag | LibreOffice | Tesseract OCR | Extra Fonts | Size | Use Case |
|---|---|---|---|---|---|
| latest | Yes | Yes | Basic | ~1.5 GB | Recommended for most users |
| latest-fat | Yes | Yes | Comprehensive | ~2 GB | Multilingual documents, CJK fonts |
| latest-ultra-lite | No | Reduced | Basic | ~400 MB | Merge/split only, resource-constrained |

If you need Office conversions (PDF to Word, Excel, PowerPoint), you must use latest or latest-fat. The ultra-lite image does not include LibreOffice.
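The table can be read as a small decision rule. The sketch below is illustrative only: pick_variant is a hypothetical helper, not part of Stirling-PDF, and its two yes/no arguments are assumptions made for this example.

```shell
# Hypothetical helper: map two requirements onto the image tags from the table.
#   $1 = "yes" if Office conversions (LibreOffice) are needed
#   $2 = "yes" if extra fonts (for example CJK) are needed
pick_variant() {
  needs_office=$1
  needs_fonts=$2
  if [ "$needs_fonts" = "yes" ]; then
    # Only the fat image bundles the comprehensive font set.
    echo "latest-fat"
  elif [ "$needs_office" = "yes" ]; then
    echo "latest"
  else
    # Merge/split-only workloads fit the smallest image.
    echo "latest-ultra-lite"
  fi
}
```

For example, pick_variant yes no prints latest, while pick_variant no no prints latest-ultra-lite.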

Fix 1: LibreOffice Connection Refused

Error message:

Connector : couldn't connect to socket (Connection refused) at
/home/buildozer/aports/community/libreoffice/src/libreoffice-7.6.7.2/
io/source/connector/connector.cxx:118
Failed to start servers
Exit code: 2

Affected operations: PDF to Word, PDF to PowerPoint, PDF to Excel — any conversion that requires LibreOffice.

Cause: The internal unoserver process (which wraps LibreOffice) fails to start its socket listener. This commonly happens in restricted Docker environments, with certain container runtimes, or after upgrading past v0.41.0.

Fix 1a — Increase container resources:

LibreOffice needs memory to start. If your container has a memory limit, raise it:

services:
  stirling-pdf:
    image: stirlingtools/stirling-pdf:0.45.3
    container_name: stirling-pdf
    ports:
      - "8080:8080"
    volumes:
      - ./data:/usr/share/tessdata
      - ./config:/configs
    deploy:
      resources:
        limits:
          memory: 4G
    restart: unless-stopped
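To confirm which limit is actually applied, docker inspect reports the container's memory limit in bytes, with 0 meaning no limit. The sketch below only formats that number; describe_mem_limit is a hypothetical helper name, and the container name stirling-pdf is taken from the Compose file above.

```shell
# Hypothetical helper: turn the byte count reported by Docker into a readable value.
describe_mem_limit() {
  bytes=$1
  if [ "$bytes" -eq 0 ]; then
    # Docker reports 0 when no memory limit is set.
    echo "unlimited"
  else
    echo "$((bytes / 1024 / 1024)) MiB"
  fi
}

# Usage against a running container (requires Docker):
#   describe_mem_limit "$(docker inspect -f '{{.HostConfig.Memory}}' stirling-pdf)"
```

With the 4G limit shown above, the helper should report 4096 MiB.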

Fix 1b — Check the image variant:

docker exec stirling-pdf which libreoffice

If it returns nothing, you’re running ultra-lite. Switch to the standard image (the latest variant, pinned to a version here):

docker pull stirlingtools/stirling-pdf:0.45.3

Fix 1c — Downgrade if the issue started after an upgrade:

If conversions worked on a previous version and broke after updating, try the last known-good version:

image: stirlingtools/stirling-pdf:0.41.0

Version 0.42.0 introduced unoserver changes that cause connection failures in some environments. Check the GitHub releases for fix notes before upgrading again.
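To check whether a given tag falls after the last known-good release, a plain version comparison is enough. This is a minimal sketch that assumes tags in MAJOR.MINOR.PATCH form with numeric fields; version_gt is a hypothetical helper, not a Docker or Stirling-PDF command.

```shell
# Hypothetical helper: succeeds (exit 0) when version $1 is strictly newer than $2.
# Assumes both arguments look like MAJOR.MINOR.PATCH with numeric fields.
# The body runs in a subshell, so its variables and exit calls stay local.
version_gt() (
  for i in 1 2 3; do
    x=$(printf '%s\n' "$1" | cut -d. -f"$i")
    y=$(printf '%s\n' "$2" | cut -d. -f"$i")
    [ "$x" -gt "$y" ] && exit 0
    [ "$x" -lt "$y" ] && exit 1
  done
  # All three fields are equal, so $1 is not newer.
  exit 1
)

# Usage (requires Docker): read the tag of the running container and compare.
#   img=$(docker inspect -f '{{.Config.Image}}' stirling-pdf)
#   version_gt "${img##*:}" 0.41.0 && echo "running a release newer than 0.41.0"
```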

Fix 2: OCR Not Working — Empty Language Dropdown

Symptom: The OCR tool appears in the UI but the language dropdown is empty. You can’t select any language to perform OCR.

Cause: Tesseract trained data files (.traineddata) are missing, corrupted, or the tessdata directory contains extra files that interfere with language detection. Stirling-PDF scans /usr/share/tessdata/ for .traineddata files — anything else in that directory can break the dropdown.

Fix:

# Check what's in the tessdata directory
docker exec stirling-pdf ls -la /usr/share/tessdata/

# Verify the English trained data exists
docker exec stirling-pdf file /usr/share/tessdata/eng.traineddata

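# Remove non-traineddata files that may interfere.
# Review the listing above first: this deletes every other file under the
# directory, including files in subdirectories.
docker exec stirling-pdf find /usr/share/tessdata -type f ! -name "*.traineddata" -delete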

# Fix permissions
docker exec stirling-pdf chmod 644 /usr/share/tessdata/*.traineddata

# Restart the container
docker restart stirling-pdf
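Before deleting anything, it can help to preview exactly which files would be removed. The sketch below filters a file listing down to the entries that are not trained data; non_traineddata is a hypothetical helper name used only for this example.

```shell
# Hypothetical helper: read file paths on stdin and print those that do not
# end in .traineddata. "|| true" keeps the exit status at 0 when nothing is left.
non_traineddata() {
  grep -v '\.traineddata$' || true
}

# Usage (requires Docker): dry run of the cleanup step.
#   docker exec stirling-pdf find /usr/share/tessdata -type f | non_traineddata
```

An empty result means the directory already contains only .traineddata files and the cleanup step can be skipped.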

If eng.traineddata is missing entirely, download it:

# Download English OCR data into your mounted volume
curl -L -o ./data/eng.traineddata \
  https://github.com/tesseract-ocr/tessdata_best/raw/main/eng.traineddata

Then mount the directory in your Docker Compose file:

volumes:
  - ./data:/usr/share/tessdata

Adding more languages:

Download the .traineddata file for your language from the Tesseract tessdata_best repository (the same one used for English above) and place it in the same directory. Common ones:

# German
curl -L -o ./data/deu.traineddata \
  https://github.com/tesseract-ocr/tessdata_best/raw/main/deu.traineddata

# French
curl -L -o ./data/fra.traineddata \
  https://github.com/tesseract-ocr/tessdata_best/raw/main/fra.traineddata

# Spanish
curl -L -o ./data/spa.traineddata \
  https://github.com/tesseract-ocr/tessdata_best/raw/main/spa.traineddata

Restart the container after adding languages.
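The per-language commands differ only in the language code, so they can be folded into a loop. This sketch reuses the tessdata_best URL pattern and the ./data directory from the commands above; tessdata_url and fetch_languages are hypothetical helper names.

```shell
# Base URL taken from the download commands above.
TESSDATA_BASE="https://github.com/tesseract-ocr/tessdata_best/raw/main"

# Print the download URL for one language code (for example deu, fra, spa).
tessdata_url() {
  echo "$TESSDATA_BASE/$1.traineddata"
}

# Download every language code passed as an argument into ./data.
# -f makes curl fail on HTTP errors instead of saving an error page.
fetch_languages() {
  mkdir -p ./data
  for lang in "$@"; do
    curl -fL -o "./data/$lang.traineddata" "$(tessdata_url "$lang")"
  done
}

# Usage:
#   fetch_languages deu fra spa && docker restart stirling-pdf
```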

Fix 3: OutOfMemoryError on Large PDFs

Error message:

java.lang.OutOfMemoryError: Required array size too large
at java.nio.file.Files.readAllBytes(Files.java:3287)
at PipelineProcessor.generateInputFiles()

Affected operations: Merging, splitting, or running pipelines on PDFs larger than ~500 MB.

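Cause: Stirling-PDF loads the whole PDF into a Java byte array for processing. That array has to fit in the JVM heap, whose default maximum is typically 1-2 GB, so large PDFs can exhaust it even when the host has RAM to spare. Separately, a Java array cannot hold more than about 2 GB (2^31 - 1 elements), so files near that size fail regardless of heap settings or system RAM.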

Fix — Increase JVM heap:

services:
  stirling-pdf:
    image: stirlingtools/stirling-pdf:0.45.3
    container_name: stirling-pdf
    ports:
      - "8080:8080"
    environment:
      JAVA_OPTS: "-Xmx4g -Xms1g"
    volumes:
      - ./data:/usr/share/tessdata
      - ./config:/configs
    restart: unless-stopped

-Xmx4g sets the maximum heap to 4 GB and -Xms1g sets the initial heap to 1 GB. Adjust -Xmx based on the largest PDFs you process:

| Max PDF Size | Recommended -Xmx |
|---|---|
| Under 100 MB | 2g (default) |
| 100-500 MB | 4g |
| 500 MB - 1 GB | 6g |
| Over 1 GB | 8g+ |
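The sizing table translates directly into a lookup. The sketch below assumes 1 GB = 1024 MB and treats each range's upper bound as inclusive, since the table does not say which side the boundaries fall on; recommend_xmx and file_size_mb are hypothetical helper names.

```shell
# Hypothetical helper: map a PDF size in whole MB onto the -Xmx values from the table.
# Assumption: upper bounds are inclusive and 1 GB = 1024 MB.
recommend_xmx() {
  size_mb=$1
  if [ "$size_mb" -lt 100 ]; then
    echo "2g"
  elif [ "$size_mb" -le 500 ]; then
    echo "4g"
  elif [ "$size_mb" -le 1024 ]; then
    echo "6g"
  else
    echo "8g"
  fi
}

# Size of a file in whole MB (rounded down), using wc -c to count bytes.
file_size_mb() {
  bytes=$(wc -c < "$1")
  echo $((bytes / 1024 / 1024))
}

# Usage:
#   recommend_xmx "$(file_size_mb ./large-scan.pdf)"
```

Here ./large-scan.pdf is a placeholder path. The 8g result is a starting point only, since the table lists 8g+ for files over 1 GB.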

For files approaching 2 GB: Split the PDF into smaller chunks first using Stirling-PDF’s split function, then process each chunk individually.

Fix 4: PDFs Display with Missing Characters or Boxes

Symptom: Converted PDFs show square boxes (□) instead of text, or characters from non-Latin scripts are missing.

Cause: The Docker image lacks the fonts needed for your documents. The latest image includes basic Latin fonts but may not have CJK (Chinese, Japanese, Korean), Arabic, or other script fonts.

Fix: Switch to the -fat image variant, which bundles additional fonts and OCR languages:

image: stirlingtools/stirling-pdf:0.45.3-fat

Or mount a custom fonts directory:

volumes:
  - /usr/share/fonts:/usr/share/fonts:ro
  - ./data:/usr/share/tessdata

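This maps your host’s font collection into the container. The bind mount replaces the container’s own /usr/share/fonts, so the host directory must contain every font you need. Install the needed fonts on your host first: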

# Ubuntu/Debian
sudo apt install fonts-noto-cjk fonts-noto-color-emoji fonts-liberation

# Then restart the container
docker restart stirling-pdf

Fix 5: Settings Don’t Persist After Container Restart

Symptom: Custom settings (default language, theme, disabled tools) reset whenever the container is recreated, for example after pulling a new image or running docker compose down and up.

Cause: The configuration directory isn’t mounted as a volume.

Fix:

volumes:
  - ./config:/configs
  - ./data:/usr/share/tessdata
  - ./logs:/logs

The /configs directory stores your settings. Without mounting it, the settings live only in the container’s writable layer and are lost whenever the container is removed and recreated.
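To verify that the volumes are really attached, the mount destinations reported by docker inspect can be compared with the expected paths. This is a sketch under the assumption that the three paths from the Compose snippet above are the ones you want; missing_mounts is a hypothetical helper name.

```shell
# Hypothetical helper: read mount destinations on stdin (one per line) and
# print each expected path that is not among them.
missing_mounts() {
  mounted=$(cat)
  for path in /configs /usr/share/tessdata /logs; do
    # -x matches whole lines, -F treats the path as a literal string.
    printf '%s\n' "$mounted" | grep -qxF "$path" || echo "$path"
  done
}

# Usage (requires Docker):
#   docker inspect -f '{{range .Mounts}}{{println .Destination}}{{end}}' stirling-pdf | missing_mounts
```

No output means all three directories are mounted; any path printed is one whose data is not being persisted.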

Fix 6: Authentication Stops Working After Extended Uptime

Symptom: Login works after container start but stops accepting credentials after hours or days of uptime.

Likely cause: Session or token expiration in Stirling-PDF’s internal auth system.

Fix:

# Quick fix — restart the container
docker restart stirling-pdf

# Check logs for auth-related errors
docker logs stirling-pdf 2>&1 | grep -i "auth\|login\|session"

If this happens repeatedly, verify your auth environment variables are set correctly:

environment:
  SECURITY_ENABLELOGIN: "true"
  SECURITY_INITIALLOGIN_USERNAME: "admin"
  SECURITY_INITIALLOGIN_PASSWORD: "your-secure-password"

Prevention

  • Use the standard image (the latest variant) for most deployments; it includes both LibreOffice and Tesseract
  • Set JAVA_OPTS: "-Xmx4g" from the start if you process PDFs larger than 100 MB
  • Mount /configs, /usr/share/tessdata, and /logs as volumes
  • Pin your Docker image to a specific version tag (for example 0.45.3) rather than the floating :latest tag in production
  • Test OCR after deployment by running a sample scan before relying on it
