How to Self-Host Stable Diffusion WebUI

What Is Stable Diffusion WebUI?

Stable Diffusion WebUI (by AUTOMATIC1111) is the most popular web interface for running Stable Diffusion image generation models locally. It provides txt2img, img2img, inpainting, upscaling, and dozens of other image generation features through a Gradio-based web interface. It’s a self-hosted alternative to Midjourney and DALL-E.

Prerequisites

  • A Linux server (Ubuntu 22.04+ recommended)
  • Docker and Docker Compose installed (see the Docker guide)
  • NVIDIA GPU with 8+ GB VRAM recommended (4 GB minimum; 12+ GB for SDXL)
  • 16 GB+ system RAM
  • 20 GB+ free disk space (models are 2-7 GB each)
  • NVIDIA Container Toolkit installed (verify with the check below)
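
Before going further, confirm that containers can actually see the GPU. A quick sanity check (the CUDA image tag here is only an example; any recent tag works):

# Driver visible on the host?
nvidia-smi

# GPU visible inside a container via the NVIDIA Container Toolkit?
# (example CUDA image tag; substitute any recent one)
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

Both commands should print the same GPU table. If the second one fails, fix the Container Toolkit install before continuing.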

Docker Compose Configuration

There’s no official Docker image. The recommended approach is a custom Dockerfile or the source install method. Here’s a Docker Compose setup using a community image:

services:
  stable-diffusion:
    image: universonic/stable-diffusion-webui:latest  # No versioned Docker tags published — :latest is the only option
    container_name: stable-diffusion
    ports:
      - "7861:7861"
    volumes:
      - sd_models:/app/stable-diffusion-webui/models
      - sd_outputs:/app/stable-diffusion-webui/outputs
      - sd_extensions:/app/stable-diffusion-webui/extensions
    environment:
      - CLI_ARGS=--listen --api --xformers
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped

volumes:
  sd_models:
  sd_outputs:
  sd_extensions:

Start the stack:

docker compose up -d

Alternative: Source installation (more reliable, recommended):

git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
cd stable-diffusion-webui
./webui.sh --listen --api --xformers

The startup script handles all dependencies, creates a venv, and installs PyTorch with CUDA support.

Initial Setup

  1. Open http://your-server:7861 in your browser
  2. The first start downloads the default Stable Diffusion v1.5 model (~4 GB); you can follow progress as shown below
  3. Type a prompt in the txt2img tab and click Generate
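
The first-run model download can take a while. With the Docker setup, you can watch its progress in the container logs (service name as defined in the Compose file above):

# Follow the startup and download output
docker compose logs -f stable-diffusion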

Downloading Better Models

Download models from CivitAI or HuggingFace and place them in the models/Stable-diffusion/ directory:

# Example: Download SDXL base model
wget -P models/Stable-diffusion/ \
  https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors

Refresh the model list in the web UI dropdown after adding new models.
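
If you script model management, the REST API also exposes a refresh endpoint (assuming the server was started with --api):

# Tell the server to rescan models/Stable-diffusion/
curl -X POST http://localhost:7861/sdapi/v1/refresh-checkpoints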

Configuration

Key CLI Arguments

Argument      Description
--listen      Listen on 0.0.0.0 (required for Docker/remote access)
--api         Enable the REST API
--xformers    Enable xFormers for faster generation and lower VRAM
--medvram     Optimize for 8 GB VRAM GPUs
--lowvram     Optimize for 4 GB VRAM GPUs (slower)
--share       Create a public Gradio link
--port PORT   Custom port (default: 7860; the setup above uses 7861)
--no-half     Disable FP16 (for GPUs without FP16 support)
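
Flags combine freely. On an 8 GB card, for example, the environment line in the Compose file might look like this (a starting point; tune for your GPU):

    environment:
      - CLI_ARGS=--listen --api --xformers --medvram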

Model Types

Type         Directory                 Purpose
Checkpoints  models/Stable-diffusion/  Base image generation models
VAE          models/VAE/               Variational autoencoders for color/detail
LoRA         models/Lora/              Fine-tuned adapters for specific styles
Embeddings   embeddings/               Textual inversions for concepts/styles
ControlNet   models/ControlNet/        Pose/depth/edge-guided generation
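
Placing a file in the right directory is the entire installation step; there is no import process. For example (hypothetical URLs; substitute real downloads from CivitAI or HuggingFace):

# Hypothetical URLs below — replace with real model links
wget -P models/Lora/ https://example.com/some-style-lora.safetensors
wget -P models/VAE/  https://example.com/some-vae.safetensors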

Advanced Configuration

ControlNet

ControlNet allows pose-guided, edge-guided, and depth-guided image generation:

  1. Install the ControlNet extension from the Extensions tab
  2. Download ControlNet models to models/ControlNet/ (example below)
  3. Enable ControlNet in the generation parameters
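
For step 2, a typical download looks like this (lllyasviel/ControlNet-v1-1 on HuggingFace hosts the SD 1.5 ControlNet models; the exact filename is assumed from that repo's layout):

# Filename assumed from the lllyasviel/ControlNet-v1-1 repo layout
wget -P models/ControlNet/ \
  https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_openpose.pth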

API Usage

Generate images via the REST API:

curl -X POST http://localhost:7861/sdapi/v1/txt2img \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a cat in space, digital art",
    "negative_prompt": "low quality, blurry",
    "steps": 20,
    "width": 512,
    "height": 512,
    "cfg_scale": 7
  }'

The response contains base64-encoded images.
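
To turn the response into a file on disk, pipe it through jq and base64 (assuming both tools are installed):

# Decode the first base64 image in the response to cat.png
curl -s -X POST http://localhost:7861/sdapi/v1/txt2img \
  -H "Content-Type: application/json" \
  -d '{"prompt": "a cat in space, digital art", "steps": 20}' \
  | jq -r '.images[0]' | base64 -d > cat.png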

SDXL Support

SDXL models require 12+ GB VRAM. Use --xformers and consider --medvram if you’re at the VRAM limit.

Reverse Proxy

Configure your reverse proxy to forward to port 7861. See Reverse Proxy Setup.
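
A minimal nginx location block might look like the following sketch. Gradio needs WebSocket upgrades for the live UI; TLS and server_name setup are omitted here:

location / {
    # Forward to the WebUI port mapped in the Compose file
    proxy_pass http://127.0.0.1:7861;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
}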

Backup

Back up these volumes:

  • sd_models — Downloaded models (large, can be re-downloaded)
  • sd_outputs — Generated images (irreplaceable)
  • sd_extensions — Installed extensions (can be re-downloaded)

Priority: sd_outputs is irreplaceable. Models and extensions can be re-downloaded. See Backup Strategy.

Troubleshooting

Out of VRAM

Symptom: RuntimeError: CUDA out of memory
Fix: Add --medvram or --lowvram to CLI_ARGS. Reduce resolution (start with 512x512). Use --xformers if not already enabled. Generate one image at a time (batch size 1).

Slow Generation

Symptom: Images take 60+ seconds.
Fix: Enable --xformers. Use a smaller model (SD 1.5 instead of SDXL). Reduce steps (20 is usually sufficient). Ensure the GPU is actually being used (check with nvidia-smi, as shown below).
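
While an image is generating, GPU utilization and VRAM usage should both be clearly nonzero:

# Refresh every second during a generation; utilization should spike
watch -n 1 nvidia-smi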

Extension Installation Fails

Symptom: Extension install from URL fails.
Fix: Check that the container has internet access. Install extensions manually by cloning into the extensions/ directory (example below). Restart the container after manual installation.
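
Manual installation is just a git clone into extensions/. For example, for the ControlNet extension:

# Clone the extension directly, then restart the WebUI
cd stable-diffusion-webui/extensions
git clone https://github.com/Mikubill/sd-webui-controlnet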

Black Images Generated

Symptom: Output images are completely black.
Fix: This usually indicates an incompatible model format or a safety checker triggering. Try a different model. Add --no-half-vae to CLI_ARGS. Disable the safety checker in settings.

Resource Requirements

  • VRAM: 4 GB minimum (SD 1.5 with --lowvram), 8 GB recommended, 12+ GB for SDXL
  • RAM: 8 GB minimum, 16 GB recommended
  • CPU: Low-medium (GPU does the work)
  • Disk: 4-7 GB per model, plus generated images

Verdict

Stable Diffusion WebUI (AUTOMATIC1111) is the most feature-rich image generation interface. The extension ecosystem, model compatibility, and community are unmatched. The trade-off is a more complex setup compared to cloud alternatives, and it requires a decent NVIDIA GPU.

Choose Stable Diffusion WebUI for a feature-rich, traditional image generation workflow. Choose ComfyUI for node-based workflows with more control over the generation pipeline.

Frequently Asked Questions

Do I need an NVIDIA GPU to run Stable Diffusion?

An NVIDIA GPU with 8+ GB VRAM is strongly recommended. AMD GPUs work via ROCm but with limited support. CPU-only generation is possible but extremely slow (minutes per image vs seconds on GPU). A GPU is effectively required for practical use.
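
If you still want to try CPU-only generation, the launcher has flags for it (flag names taken from the webui's command-line options; expect minutes per image):

# Run entirely on CPU; skips the CUDA availability check at startup
./webui.sh --listen --use-cpu all --no-half --skip-torch-cuda-test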

How does Stable Diffusion WebUI compare to ComfyUI?

Stable Diffusion WebUI (AUTOMATIC1111) has a traditional UI with tabs and settings — easier for beginners. ComfyUI uses a node-based workflow graph that gives more control over the generation pipeline. Most beginners start with WebUI; power users often migrate to ComfyUI for complex workflows.

What’s the difference between SD 1.5, SDXL, and SD 3?

SD 1.5 generates 512x512 images and needs 4-8 GB VRAM. SDXL generates 1024x1024 images with better quality but needs 12+ GB VRAM. SD 3 and newer models offer improved text rendering and composition but require even more VRAM. Start with SD 1.5 on limited hardware.

Can I use Stable Diffusion without Docker?

Yes. The recommended approach is actually the source installation: clone the GitHub repo and run ./webui.sh. This handles all Python dependencies, venv creation, and PyTorch/CUDA setup automatically. Docker is an alternative, but the source install is more commonly used.

How much disk space do models need?

SD 1.5 checkpoints are ~2-4 GB each. SDXL models are ~6-7 GB. LoRA adapters are 50-300 MB. A typical setup with 3-5 models plus LoRAs uses 15-30 GB. Generated images are small (200 KB-2 MB each) and accumulate over time.

Is Stable Diffusion WebUI still actively maintained?

Development has slowed compared to 2023-2024, but the project still receives updates. The community has largely shifted attention to ComfyUI and newer interfaces like Forge (a WebUI fork optimized for lower VRAM usage). AUTOMATIC1111’s WebUI remains the most feature-complete option with the largest extension ecosystem.
