Stirling-PDF: Conversion Errors — Fix
The Problem
Stirling-PDF handles dozens of PDF operations — conversions, OCR, merging, splitting — and each one can fail in distinct ways. The most common failures fall into four categories: LibreOffice connection errors (Office format conversions), OCR language detection failures, out-of-memory crashes on large PDFs, and missing features from using the wrong Docker image variant.
Stirling-PDF Image Variants
Before troubleshooting, verify you’re running the right image. Many issues trace back to using an image that doesn’t include the tool you need.
| Image Tag | LibreOffice | Tesseract OCR | Extra Fonts | Size | Use Case |
|---|---|---|---|---|---|
| latest | Yes | Yes | Basic | ~1.5 GB | Recommended for most users |
| latest-fat | Yes | Yes | Comprehensive | ~2 GB | Multilingual documents, CJK fonts |
| latest-ultra-lite | No | Reduced | Basic | ~400 MB | Merge/split only, resource-constrained |
If you need Office conversions (PDF to Word, Excel, PowerPoint), you must use latest or latest-fat. The ultra-lite image does not include LibreOffice.
Fix 1: LibreOffice Connection Refused
Error message:
Connector : couldn't connect to socket (Connection refused) at
/home/buildozer/aports/community/libreoffice/src/libreoffice-7.6.7.2/
io/source/connector/connector.cxx:118
Failed to start servers
Exit code: 2
Affected operations: PDF to Word, PDF to PowerPoint, PDF to Excel — any conversion that requires LibreOffice.
Cause: The internal unoserver process (which wraps LibreOffice) fails to start its socket listener. This commonly happens in restricted Docker environments, with certain container runtimes, or after upgrading past v0.41.0.
Fix 1a — Increase container resources:
LibreOffice needs memory to start. If your container has a memory limit, raise it:
services:
  stirling-pdf:
    image: stirlingtools/stirling-pdf:0.45.3
    container_name: stirling-pdf
    ports:
      - "8080:8080"
    volumes:
      - ./data:/usr/share/tessdata
      - ./config:/configs
    deploy:
      resources:
        limits:
          memory: 4G
    restart: unless-stopped
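To confirm which limit the container actually runs with, docker inspect reports it in bytes under .HostConfig.Memory, where 0 means no limit. The sketch below only makes that number readable. The function name describe_memory_limit is hypothetical, and the container name stirling-pdf is taken from the Compose file above.

```shell
#!/bin/sh
# Hypothetical helper: turn the byte count reported by
#   docker inspect --format '{{.HostConfig.Memory}}' stirling-pdf
# into a readable line. Docker reports 0 when no memory limit is set.
describe_memory_limit() {
    bytes="$1"
    if [ "$bytes" -eq 0 ]; then
        echo "no memory limit set"
    else
        # Integer division: bytes -> MiB
        echo "memory limit: $((bytes / 1024 / 1024)) MiB"
    fi
}

# Example (requires a running container named stirling-pdf):
#   describe_memory_limit "$(docker inspect --format '{{.HostConfig.Memory}}' stirling-pdf)"
```

A 4 GiB limit is 4294967296 bytes, which the helper prints as 4096 MiB.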
Fix 1b — Check the image variant:
docker exec stirling-pdf which libreoffice
If it prints nothing, you’re running ultra-lite. Set the image in your Compose file to the full variant, then pull it and recreate the container:
docker pull stirlingtools/stirling-pdf:0.45.3
Fix 1c — Downgrade if the issue started after an upgrade:
If conversions worked on a previous version and broke after updating, try the last known-good version:
image: stirlingtools/stirling-pdf:0.41.0
Version 0.42.0 introduced unoserver changes that cause connection failures in some environments. Check the GitHub releases for fix notes before upgrading again.
Fix 2: OCR Not Working — Empty Language Dropdown
Symptom: The OCR tool appears in the UI but the language dropdown is empty. You can’t select any language to perform OCR.
Cause: Tesseract trained data files (.traineddata) are missing, corrupted, or the tessdata directory contains extra files that interfere with language detection. Stirling-PDF scans /usr/share/tessdata/ for .traineddata files — anything else in that directory can break the dropdown.
Fix:
# Check what's in the tessdata directory
docker exec stirling-pdf ls -la /usr/share/tessdata/
# Verify the English trained data exists
docker exec stirling-pdf file /usr/share/tessdata/eng.traineddata
# Remove non-traineddata files that may interfere
docker exec stirling-pdf find /usr/share/tessdata -type f ! -name "*.traineddata" -delete
# Fix permissions (sh -c makes the glob expand inside the container, not on the host)
docker exec stirling-pdf sh -c 'chmod 644 /usr/share/tessdata/*.traineddata'
# Restart the container
docker restart stirling-pdf
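Because -delete is irreversible, it can help to preview what the find command above would remove. The sketch below uses the same find expression with -print instead of -delete. It assumes the tessdata directory is the host folder mounted in the Compose file (./data), and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list every regular file under a tessdata directory
# that is NOT a .traineddata file. Nothing is deleted; this is a preview.
list_stray_tessdata_files() {
    dir="${1:-./data}"   # default to the host-side mount (assumption)
    find "$dir" -type f ! -name '*.traineddata' -print | LC_ALL=C sort
}

# Example:
#   list_stray_tessdata_files ./data
```

An empty result means the delete step has nothing to remove. Anything listed is worth a second look before you delete it.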
If eng.traineddata is missing entirely, download it:
# Download English OCR data into your mounted volume
curl -L -o ./data/eng.traineddata \
https://github.com/tesseract-ocr/tessdata_best/raw/main/eng.traineddata
Then mount the directory in your Docker Compose:
volumes:
  - ./data:/usr/share/tessdata
Adding more languages:
Download the .traineddata file for your language from the Tesseract tessdata repository and place it in the same directory. Common ones:
# German
curl -L -o ./data/deu.traineddata \
https://github.com/tesseract-ocr/tessdata_best/raw/main/deu.traineddata
# French
curl -L -o ./data/fra.traineddata \
https://github.com/tesseract-ocr/tessdata_best/raw/main/fra.traineddata
# Spanish
curl -L -o ./data/spa.traineddata \
https://github.com/tesseract-ocr/tessdata_best/raw/main/spa.traineddata
Restart the container after adding languages.
Fix 3: OutOfMemoryError on Large PDFs
Error message:
java.lang.OutOfMemoryError: Required array size too large
at java.nio.file.Files.readAllBytes(Files.java:3287)
at PipelineProcessor.generateInputFiles()
Affected operations: Merging, splitting, or running pipelines on PDFs larger than ~500 MB.
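Cause: Stirling-PDF loads the PDF into a Java byte array for processing. Two separate limits apply. The JVM heap is capped (typically at 1-2 GB by default), so a large PDF can exhaust it regardless of how much system RAM is available; raising -Xmx addresses this. Java arrays also have a hard maximum length of just under 2^31 elements, so a single byte array cannot hold more than ~2 GB. No heap setting changes that second limit, which is what the "Required array size too large" message typically indicates.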
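To make the array limit concrete: Java arrays are indexed by a 32-bit signed int, so a byte array tops out just under 2^31 bytes. The sketch below checks a byte count against that ceiling. The function name is hypothetical, and the real JVM cutoff sits a few bytes lower, so treat files near the limit as too large.

```shell
#!/bin/sh
# Largest value of a Java int (2^31 - 1). Real JVM array limits sit a few
# bytes below this, so the check is slightly optimistic at the boundary.
JAVA_INT_MAX=2147483647

# Hypothetical helper: succeed (exit 0) if the given byte count could fit
# in a single Java byte array, fail (exit 1) otherwise.
fits_in_java_byte_array() {
    [ "$1" -le "$JAVA_INT_MAX" ]
}

# Example: check a real file (wc -c prints its size in bytes).
#   size=$(wc -c < big.pdf | tr -d ' ')
#   fits_in_java_byte_array "$size" || echo "split this PDF first"
```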
Fix — Increase JVM heap:
services:
  stirling-pdf:
    image: stirlingtools/stirling-pdf:0.45.3
    container_name: stirling-pdf
    ports:
      - "8080:8080"
    environment:
      JAVA_OPTS: "-Xmx4g -Xms1g"
    volumes:
      - ./data:/usr/share/tessdata
      - ./config:/configs
    restart: unless-stopped
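-Xmx4g sets the maximum heap to 4 GB and -Xms1g sets the initial heap to 1 GB. If the container also has a memory limit (as in Fix 1a), keep that limit above -Xmx so the container isn’t killed before the heap fills. Adjust -Xmx based on the largest PDFs you process: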
| Max PDF Size | Recommended -Xmx |
|---|---|
| Under 100 MB | 2g (default) |
| 100-500 MB | 4g |
| 500 MB - 1 GB | 6g |
| Over 1 GB | 8g+ |
For files approaching 2 GB: Split the PDF into smaller chunks first using Stirling-PDF’s split function, then process each chunk individually.
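The sizing table above can be expressed as a small lookup, sketched here. The function name recommend_xmx is hypothetical, and the handling of the exact boundaries (100, 500 and 1024 MB) is an assumption, since the table leaves them ambiguous.

```shell
#!/bin/sh
# Hypothetical helper: map a PDF size in MB to the -Xmx value from the
# sizing table. Boundary handling (100, 500, 1024) is an assumption.
recommend_xmx() {
    size_mb="$1"
    if [ "$size_mb" -lt 100 ]; then
        echo "2g"
    elif [ "$size_mb" -le 500 ]; then
        echo "4g"
    elif [ "$size_mb" -le 1024 ]; then
        echo "6g"
    else
        echo "8g"   # the table says "8g+": treat this as a floor
    fi
}

# Example:
#   echo "-Xmx$(recommend_xmx 300)"
```

For a 300 MB file the example prints -Xmx4g. Files close to 2 GB still need splitting first, whatever the heap size.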
Fix 4: PDFs Display with Missing Characters or Boxes
Symptom: Converted PDFs show square boxes (□) instead of text, or characters from non-Latin scripts are missing.
Cause: The Docker image lacks the fonts needed for your documents. The latest image includes basic Latin fonts but may not have CJK (Chinese, Japanese, Korean), Arabic, or other script fonts.
Fix: Switch to the -fat image variant, which bundles additional fonts and OCR languages:
image: stirlingtools/stirling-pdf:2.6.0-fat
Or mount a custom fonts directory:
volumes:
  - /usr/share/fonts:/usr/share/fonts:ro
  - ./data:/usr/share/tessdata
This maps your host’s font collection into the container. Install the needed fonts on your host first:
# Ubuntu/Debian
sudo apt install fonts-noto-cjk fonts-noto-color-emoji fonts-liberation
# Then restart the container
docker restart stirling-pdf
Fix 5: Settings Don’t Persist After Container Restart
Symptom: Custom settings (default language, theme, disabled tools) reset every time the container restarts.
Cause: The configuration directory isn’t mounted as a volume.
Fix:
volumes:
  - ./config:/configs
  - ./data:/usr/share/tessdata
  - ./logs:/logs
The /configs directory stores your settings. Without mounting it, container restarts wipe all configuration.
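To check that the mount actually took effect, you can list the container's mount destinations with docker inspect and look for /configs. A minimal sketch follows. The helper name has_mount is hypothetical, and the container name stirling-pdf is assumed from the Compose examples.

```shell
#!/bin/sh
# Hypothetical helper: read mount destinations on stdin (one per line)
# and succeed only if the given path appears as an exact line.
has_mount() {
    grep -qxF -- "$1"
}

# Example (requires a running container named stirling-pdf):
#   docker inspect --format '{{range .Mounts}}{{println .Destination}}{{end}}' stirling-pdf \
#     | has_mount /configs && echo "/configs is mounted"
```

The exact-line match (-x) keeps a path such as /configs-old from being mistaken for /configs.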
Fix 6: Authentication Stops Working After Extended Uptime
Symptom: Login works after container start but stops accepting credentials after hours or days of uptime.
Cause: Session or token expiration in Stirling-PDF’s internal auth system.
Fix:
# Quick fix — restart the container
docker restart stirling-pdf
# Check logs for auth-related errors
docker logs stirling-pdf 2>&1 | grep -i "auth\|login\|session"
If this happens repeatedly, verify your auth environment variables are set correctly:
environment:
  SECURITY_ENABLELOGIN: "true"
  SECURITY_INITIALLOGIN_USERNAME: "admin"
  SECURITY_INITIALLOGIN_PASSWORD: "your-secure-password"
Prevention
- Use the standard (latest) image variant for most deployments; it includes both LibreOffice and Tesseract
- Set JAVA_OPTS: "-Xmx4g" from the start if you process PDFs larger than 100 MB
- Mount /configs, /usr/share/tessdata, and /logs as volumes
- Pin your Docker image to a specific version tag; don’t use the floating :latest tag in production
- Test OCR after deployment by running a sample scan before relying on it