How to Self-Host Text Generation WebUI
What Is Text Generation WebUI?
Text Generation WebUI (commonly called “Oobabooga”) is a Gradio-based web interface for running large language models locally. It supports the widest range of model formats of any LLM interface — GGUF, GPTQ, AWQ, EXL2, and HuggingFace Transformers. It also supports LoRA training and fine-tuning, making it the go-to tool for ML enthusiasts who want deep model control.
Prerequisites
- A Linux server (Ubuntu 22.04+ recommended)
- Docker and Docker Compose installed
- NVIDIA GPU with 8+ GB VRAM (recommended)
- 16 GB+ system RAM
- 30 GB+ free disk space
- NVIDIA Container Toolkit installed (for GPU mode; see the quick check below)
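Before building anything, it's worth confirming that Docker can actually see the GPU. A quick check (the CUDA image tag is just an example; any recent tag works):

```bash
# Should print the nvidia-smi GPU table; if it errors, revisit the Container Toolkit install
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```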
Docker Compose Configuration
Text Generation WebUI doesn’t have an official Docker image, but the community-maintained setup works well. Create a docker-compose.yml:
```yaml
services:
  text-gen-webui:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: text-gen-webui
    ports:
      - "7860:7860"   # Web UI
      - "5000:5000"   # API server
    volumes:
      - ./models:/app/models
      - ./loras:/app/loras
      - ./characters:/app/characters
      - ./presets:/app/presets
      - ./extensions:/app/extensions
    environment:
      - CLI_ARGS=--listen --api --verbose
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    restart: unless-stopped
```
Create a Dockerfile:
```dockerfile
FROM nvidia/cuda:12.4.1-devel-ubuntu22.04

RUN apt-get update && apt-get install -y \
    git python3 python3-pip python3-venv wget \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

RUN git clone https://github.com/oobabooga/text-generation-webui.git . && \
    git checkout v1.10.1

RUN pip3 install --no-cache-dir -r requirements.txt && \
    pip3 install --no-cache-dir torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

EXPOSE 7860 5000

# Shell form so the CLI_ARGS variable from docker-compose.yml is expanded at runtime;
# falls back to sane defaults if it is unset
CMD python3 server.py ${CLI_ARGS:---listen --api}
```
Alternatively, use the official installation method, which is simpler:

```bash
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
./start_linux.sh
```
The startup script creates a conda environment and handles all dependencies automatically.
Start the stack:

```bash
docker compose up -d --build
```
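The first build compiles several Python wheels and can take a while. To watch progress and confirm the UI is up:

```bash
# Follow startup logs; Gradio prints "Running on local URL" when ready
docker logs -f text-gen-webui

# Expect a 200 once the UI is listening
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:7860
```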
Initial Setup
- Open http://your-server:7860 in your browser
- Go to the Model tab
- Enter a HuggingFace model name (e.g., TheBloke/Mistral-7B-Instruct-v0.2-GGUF)
- Click Download and wait for the model to download (or fetch it from the command line, as shown below)
- Select the model and click Load
- Switch to the Chat tab and start chatting
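If you prefer the command line, the repository ships a download-model.py helper that saves models into models/ (the model name here is only an example):

```bash
# Download a model from inside the running container
docker exec -it text-gen-webui python3 download-model.py TheBloke/Mistral-7B-Instruct-v0.2-GGUF
```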
Configuration
Model Loader Selection
| Loader | Model Formats | Best For |
|---|---|---|
| llama.cpp | GGUF | CPU/GPU hybrid inference, quantized models |
| ExLlamaV2 | EXL2, GPTQ | Fastest GPU inference, quantized models |
| Transformers | SafeTensors, HF format | Full precision, training, fine-tuning |
| AutoGPTQ | GPTQ | GPU inference, older GPTQ models |
| AutoAWQ | AWQ | GPU inference, AWQ quantized models |
CLI Arguments
| Argument | Description |
|---|---|
| --listen | Listen on 0.0.0.0 (required for Docker) |
| --api | Enable the OpenAI-compatible API on port 5000 |
| --verbose | Enable detailed logging |
| --cpu | Run on CPU only (slow) |
| --n-gpu-layers N | Number of layers to offload to the GPU (llama.cpp loader) |
| --gpu-memory X | GPU VRAM limit in GiB |
| --extensions E1 E2 | Load extensions on startup |
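For instance, to offload most of a 7B GGUF model to the GPU via the llama.cpp loader, the environment line in docker-compose.yml might read as follows (35 layers is a per-model guess, not a universal value):

```yaml
environment:
  # Lower the layer count if the model fails to load with an OOM error
  - CLI_ARGS=--listen --api --verbose --n-gpu-layers 35
```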
Advanced Configuration
LoRA Training
Text Generation WebUI includes a built-in LoRA training interface:
- Go to the Training tab
- Prepare training data in the expected format (JSON or raw text; see the sketch after this list)
- Select a base model (it must be loaded with the Transformers loader)
- Configure training parameters (learning rate, epochs, batch size)
- Start training; the LoRA adapter is saved to loras/
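As a rough sketch, an instruction-style JSON dataset could look like the following. The field names and the training/datasets/ path are assumptions based on the common alpaca-style template; match whatever format you select in the Training tab:

```bash
# Hypothetical example dataset; keys must match the chosen format template
cat > training/datasets/my-dataset.json <<'EOF'
[
  {
    "instruction": "Summarize the benefits of self-hosting.",
    "output": "You control your data, your costs, and your uptime."
  },
  {
    "instruction": "Name one trade-off of self-hosting.",
    "output": "You are responsible for updates, backups, and security."
  }
]
EOF
```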
Extensions
Extensions add functionality. Popular ones include:
- openai — OpenAI-compatible API server
- multimodal — Vision model support
- superboogav2 — RAG (retrieval augmented generation)
- whisper_stt — Speech-to-text input
- silero_tts — Text-to-speech output
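Extensions are enabled at startup with the --extensions flag, and many ship their own Python dependencies. A sketch of both steps (the requirements path is typical for extensions in this repo, but not guaranteed for every one):

```bash
# Enable extensions by adding them to CLI_ARGS in docker-compose.yml, e.g.:
#   CLI_ARGS=--listen --api --extensions whisper_stt silero_tts
# Then install an extension's dependencies inside the container:
docker exec -it text-gen-webui pip3 install -r extensions/silero_tts/requirements.txt
```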
API Usage
The OpenAI-compatible API runs on port 5000:
```bash
curl http://localhost:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Mistral-7B-Instruct-v0.2",
    "messages": [{"role": "user", "content": "What is self-hosting?"}]
  }'
```
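Since the server follows the usual OpenAI endpoint layout, listing the loaded model makes a quick smoke test:

```bash
# Should return the name of the currently loaded model
curl http://localhost:5000/v1/models
```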
Reverse Proxy
Configure your reverse proxy to forward to port 7860 (Web UI) or 5000 (API). WebSocket support is required for the Gradio UI. See Reverse Proxy Setup.
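As a minimal nginx sketch (TLS and server_name omitted; the Upgrade headers are what Gradio's WebSocket connection needs):

```nginx
location / {
    proxy_pass http://127.0.0.1:7860;
    # Without these headers the page loads but the UI never connects
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
}
```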
Backup
Back up these directories:
- models/ — Downloaded models (large, can be re-downloaded)
- loras/ — Trained LoRA adapters (cannot be re-created without retraining)
- characters/ — Custom character definitions
- presets/ — Generation parameter presets
Priority: loras/ and characters/ are irreplaceable. Models can be re-downloaded. See Backup Strategy.
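A simple approach is a dated tarball of the irreplaceable directories (paths assume the compose layout above):

```bash
# models/ is deliberately skipped; it can be re-downloaded
tar czf text-gen-webui-backup-$(date +%F).tar.gz loras/ characters/ presets/
```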
Troubleshooting
CUDA Out of Memory
Symptom: Model fails to load with OOM error.
Fix: Use a smaller quantized model. Reduce --n-gpu-layers so fewer layers are offloaded to the GPU. Use EXL2 or GGUF quantization for a smaller VRAM footprint.
Model Downloads Slowly
Symptom: Model download from HuggingFace is very slow.
Fix: Download models manually using huggingface-cli download and place them in the models/ directory.
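For example (repository and file names are illustrative; pick the quantization you actually want):

```bash
# Fetch a single GGUF file straight into the models directory
huggingface-cli download TheBloke/Mistral-7B-Instruct-v0.2-GGUF \
  mistral-7b-instruct-v0.2.Q4_K_M.gguf --local-dir models/
```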
Gradio UI Won’t Load
Symptom: Port 7860 connection refused.
Fix: Ensure --listen flag is set in CLI_ARGS. Check Docker port mapping. Verify the container started successfully: docker logs text-gen-webui.
Extension Not Working
Symptom: Extension doesn't appear or crashes.
Fix: Install the extension's dependencies inside the container. Some extensions require additional Python packages not included in the base installation.
Resource Requirements
- VRAM: 4-8 GB for 7B Q4, 8-16 GB for 13B Q4, 16-24 GB for 7B FP16
- RAM: 8-32 GB (depends on model size and loader)
- CPU: Medium-high (benefits from more cores for CPU inference)
- Disk: 5-100 GB per model
Verdict
Text Generation WebUI is the power user’s LLM interface. It supports more model formats and loading backends than any other tool, and the built-in LoRA training is unique. The trade-off is more complex setup and a less polished UI compared to Open WebUI.
Choose Text Generation WebUI if you want LoRA training, EXL2 model support, or deep control over inference parameters. Choose Open WebUI + Ollama for a polished ChatGPT-like experience with simpler setup.
Frequently Asked Questions
Do I need a GPU to run Text Generation WebUI?
A GPU is strongly recommended but not strictly required. You can run small quantized models (7B Q4) on CPU using llama.cpp, but inference will be very slow — expect 1-3 tokens per second versus 30-100+ on a decent GPU. An NVIDIA GPU with 8+ GB VRAM is the practical minimum for usable performance.
How does Text Generation WebUI compare to Ollama + Open WebUI?
Text Generation WebUI offers more model format support (GGUF, GPTQ, AWQ, EXL2), multiple loading backends, and built-in LoRA training. Ollama + Open WebUI provides a simpler setup, a more polished ChatGPT-like interface, and easier model management. Choose Text Generation WebUI for deep control and training; choose Ollama + Open WebUI for ease of use.
Can I use Text Generation WebUI as an OpenAI API drop-in replacement?
Yes. With the --api flag, it exposes an OpenAI-compatible API on port 5000. Applications that support custom OpenAI endpoints can connect to it directly. This includes tools like Continue, Tabby, and most LLM-powered applications that accept a configurable API URL.
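For tools built on the official OpenAI client libraries, pointing them at the local server is usually just two environment variables (names follow the openai-python convention; other clients may differ):

```bash
# Route OpenAI-compatible clients to the local API
export OPENAI_BASE_URL=http://localhost:5000/v1
# A placeholder: the local API doesn't validate keys unless started with --api-key
export OPENAI_API_KEY=dummy
```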
What’s the difference between GGUF and GPTQ models?
GGUF (llama.cpp format) allows flexible CPU/GPU splitting — you can offload some layers to GPU and keep the rest in system RAM. GPTQ models are GPU-only but generally faster on full GPU inference. For machines with limited VRAM, GGUF with partial GPU offload is the most practical option. EXL2 offers the fastest GPU inference of all formats.
Can I fine-tune models with Text Generation WebUI?
Yes. The built-in Training tab supports LoRA fine-tuning. Load a base model in Transformers format, prepare your training data as JSON or raw text, configure hyperparameters (learning rate, epochs, batch size), and train. The LoRA adapter saves to the loras/ directory and can be applied on top of the base model during inference.
How much disk space do models require?
It varies widely by model size and quantization. A 7B parameter model in Q4 quantization needs about 4-5 GB. A 13B Q4 model needs 8-10 GB. Full precision (FP16) models are roughly three to four times the size of a Q4 quant (about 14 GB for a 7B model). Plan for 30-100 GB of disk space if you want to keep multiple models available.