How to Self-Host Tabby with Docker Compose
What Is Tabby?
Tabby is a self-hosted AI code completion server. It runs a code-specific language model on your infrastructure and serves completions to IDE extensions (VS Code, JetBrains, Vim/Neovim). Tabby indexes your repositories for context-aware suggestions and includes an admin dashboard for managing users and monitoring usage. Think of it as a self-hosted GitHub Copilot alternative.
Prerequisites
- A Linux server (Ubuntu 22.04+ recommended)
- Docker and Docker Compose installed
- NVIDIA GPU with 4+ GB VRAM (recommended) or CPU mode (slower)
- 8 GB+ RAM
- 10 GB+ free disk space
- NVIDIA Container Toolkit (for GPU mode)
Docker Compose Configuration
Create a docker-compose.yml file:
```yaml
services:
  tabby:
    image: tabbyml/tabby:v0.32.0
    container_name: tabby
    ports:
      - "8080:8080"
    volumes:
      - tabby_data:/data
    command: serve --model StarCoder-1B --device cuda
    # For CPU-only mode, use:
    # command: serve --model StarCoder-1B --device cpu
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped

volumes:
  tabby_data:
```
Start the stack:
```shell
docker compose up -d
```
Tabby downloads the model on first start (StarCoder-1B is ~2 GB).
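To watch the download and confirm the server is ready, you can tail the container logs and then probe the HTTP port. The /v1/health path is an assumption based on Tabby's API documentation; if your version differs, any 200 response from the port is a good sign:

```shell
# Follow the container logs until the model download completes
docker compose logs -f tabby

# Once the server reports it is listening, check that it responds
# (assumes Tabby's /v1/health endpoint)
curl -fsS http://localhost:8080/v1/health && echo "Tabby is up"
```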
Initial Setup
- Open http://your-server:8080 in your browser
- Create your admin account on first visit
- Go to Settings → Repository to add your code repositories
- Install the Tabby extension in your IDE
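Once you have generated an auth token in the dashboard, you can sanity-check the completion API from the command line before touching any IDE. The request shape below (a language plus a segments.prefix) follows Tabby's published completion API, but treat it as a sketch and check your version's API docs; TABBY_TOKEN is a placeholder for your own token:

```shell
curl -s http://your-server:8080/v1/completions \
  -H "Authorization: Bearer $TABBY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"language": "python", "segments": {"prefix": "def fib(n):\n    "}}'
```

A JSON response containing completion choices confirms the server and token are working end to end.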
IDE Extension Setup
VS Code:
- Install “Tabby” from the VS Code Marketplace
- Open Settings → search for “Tabby”
- Set Server Endpoint to http://your-server:8080
JetBrains:
- Install “Tabby” from the JetBrains Plugin Marketplace
- Go to Settings → Tools → Tabby
- Set Server Endpoint to http://your-server:8080
Vim/Neovim:
Install TabbyML/vim-tabby plugin and configure the endpoint.
Configuration
Model Selection
| Model | VRAM Required | Quality | Speed |
|---|---|---|---|
| StarCoder-1B | ~2 GB | Good | Very fast |
| StarCoder-3B | ~4 GB | Better | Fast |
| StarCoder-7B | ~8 GB | Best built-in | Moderate |
| CodeLlama-7B | ~8 GB | Best overall | Moderate |
| DeepSeek-Coder-1.3B | ~2 GB | Good | Very fast |
Change the model in the command field:
```yaml
command: serve --model CodeLlama-7B --device cuda
```
Repository Indexing
Add repositories through the admin dashboard for context-aware completions:
- Go to Settings → Repository
- Add Git repository URLs
- Tabby indexes the code and uses it as context for completions
Key CLI Arguments
| Argument | Description |
|---|---|
| --model MODEL | Code completion model to serve |
| --chat-model MODEL | Separate model for chat (optional) |
| --device cuda/cpu | Inference device |
| --port PORT | Server port (default: 8080) |
| --parallelism N | Max concurrent completion requests |
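Putting several of these flags together, a compose command line for a small team server might look like the following. The flag set matches the table above; the --parallelism value of 4 is just an example to tune against your GPU:

```yaml
command: serve --model StarCoder-1B --device cuda --port 8080 --parallelism 4
```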
Advanced Configuration
Separate Chat Model
Use a larger model for chat while keeping a fast small model for completions:
```yaml
command: >
  serve
  --model StarCoder-1B
  --chat-model CodeLlama-7B-Instruct
  --device cuda
```
Using an External LLM Backend
Tabby can connect to Ollama or other OpenAI-compatible APIs instead of serving models itself:
```yaml
command: serve --device cpu
environment:
  - TABBY_LLM_ENDPOINT=http://ollama:11434
```
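If you want Ollama in the same stack, a minimal sketch of an ollama service to run alongside tabby follows. The image and volume paths are the upstream Ollama defaults; this assumes Tabby reaches it over the compose network at http://ollama:11434:

```yaml
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    volumes:
      - ollama_data:/root/.ollama  # also add ollama_data under the top-level volumes: key
    restart: unless-stopped
```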
Reverse Proxy
Configure your reverse proxy to forward to port 8080. See Reverse Proxy Setup.
Backup
Back up the Tabby data volume:
```shell
docker run --rm -v tabby_data:/data -v $(pwd):/backup alpine \
  tar czf /backup/tabby-backup.tar.gz /data
```
This contains user accounts, repository indexes, and configuration. Models can be re-downloaded. See Backup Strategy.
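To restore, stop the stack and unpack the archive back into the volume. This is a mirror of the backup command above and assumes the tarball was created as shown, with paths rooted at /data (tar strips the leading slash, so extracting with -C / puts files back in place):

```shell
docker compose down

# Unpack the archive back into the tabby_data volume
docker run --rm -v tabby_data:/data -v $(pwd):/backup alpine \
  sh -c "tar xzf /backup/tabby-backup.tar.gz -C /"

docker compose up -d
```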
Troubleshooting
Completions Are Slow
Symptom: Code completions take 2+ seconds.
Fix: Use a smaller model (StarCoder-1B or DeepSeek-Coder-1.3B). Ensure GPU mode is active (--device cuda). Check that the NVIDIA Container Toolkit is installed.
GPU Not Detected
Symptom: Container falls back to CPU mode.
Fix: Verify nvidia-smi works on the host. Ensure deploy.resources.reservations.devices is set in docker-compose.yml. Install NVIDIA Container Toolkit if missing.
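These checks can be run in order, from host driver to container runtime to the Tabby container itself. The CUDA image tag is only an example; use any recent nvidia/cuda base tag compatible with your driver:

```shell
# 1. Does the host driver work?
nvidia-smi

# 2. Can Docker pass the GPU through? (example CUDA base image tag)
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

# 3. Does the running Tabby container see the GPU?
docker exec tabby nvidia-smi
```

If step 1 fails, fix the host driver first; if step 2 fails, reinstall the NVIDIA Container Toolkit; if only step 3 fails, recheck the deploy section in docker-compose.yml.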
IDE Extension Not Connecting
Symptom: Extension shows “disconnected” status.
Fix: Verify the server URL is correct and accessible. Check firewall rules for port 8080. If using a reverse proxy, ensure it’s forwarding correctly.
No Context-Aware Completions
Symptom: Completions don’t reference your codebase.
Fix: Add your repositories in Settings → Repository. Wait for indexing to complete (check the admin dashboard for status). Ensure the repository URL is accessible from the Tabby container.
Resource Requirements
- VRAM: 2 GB (1B models), 4 GB (3B models), 8 GB (7B models)
- RAM: 4-8 GB system RAM
- CPU: Low-medium (GPU does the heavy lifting)
- Disk: 2-10 GB per model + repository index size
Verdict
Tabby is the best self-hosted code completion server for teams. The admin dashboard, user management, repository indexing, and usage analytics make it a proper enterprise-ready tool. The trade-off is that it requires a dedicated GPU for reasonable performance, and the model selection is more limited than using a general-purpose LLM backend like Ollama.
Choose Tabby for a centralized code AI server for your team. Choose Continue.dev + Ollama if you want more flexibility and don’t need centralized management.
Frequently Asked Questions
Does Tabby require a GPU?
A GPU is strongly recommended but not required. Tabby supports CPU-only mode (--device cpu), but inference is significantly slower — completions may take 2-5 seconds instead of milliseconds. For a usable experience, an NVIDIA GPU with 4+ GB VRAM is recommended. The smaller models (StarCoder-1B, DeepSeek-Coder-1.3B) work on 2 GB VRAM.
How does Tabby compare to GitHub Copilot?
GitHub Copilot uses cloud-hosted models and costs $10-19/month per user. Tabby runs entirely on your infrastructure — your code never leaves your servers. Copilot generally produces higher-quality completions due to larger models, but Tabby with repository indexing provides context-aware suggestions tailored to your codebase. For privacy-sensitive codebases, Tabby is the clear choice.
Can I use Tabby with Ollama?
Yes. Tabby can connect to Ollama or any OpenAI-compatible API backend instead of serving models itself. Set TABBY_LLM_ENDPOINT=http://ollama:11434 as an environment variable. This lets you use any model Ollama supports while keeping Tabby’s IDE integration and admin dashboard.
Does Tabby support multiple users?
Yes. The admin dashboard includes user management with API token generation. Each user gets their own token for IDE authentication. The admin can monitor per-user usage statistics, see completion acceptance rates, and manage repository access.
What programming languages does Tabby support?
Tabby supports all major programming languages — the underlying models (StarCoder, CodeLlama, DeepSeek-Coder) are trained on datasets covering Python, JavaScript, TypeScript, Java, Go, Rust, C/C++, and dozens more. Quality varies by language — popular languages get better completions than niche ones.
Can I fine-tune models with my own code?
Not directly through Tabby’s interface. However, Tabby supports repository indexing which provides context-aware completions using your codebase without fine-tuning. For actual model fine-tuning, you’d need to fine-tune the model separately and then serve it through Tabby.