GPU Passthrough for Home Servers

Quick Recommendation

For Plex/Jellyfin hardware transcoding: Get an Intel N100 mini PC with Quick Sync — it handles 10+ simultaneous transcodes at 8W. No discrete GPU needed.

For AI inference (Ollama, LocalAI): NVIDIA GeForce RTX 3060 12GB (~$200 used) is the best value — 12GB VRAM runs 7B-13B parameter models. Pair with Proxmox GPU passthrough.

For gaming VM + server on one box: NVIDIA RTX 4060 or AMD RX 7600 passed through to a Windows VM on Proxmox, while Linux containers run on the host.

Use Cases for GPUs in Home Servers

| Use Case | GPU Required? | Best Option |
|---|---|---|
| Plex/Jellyfin transcoding | No — Intel Quick Sync is better | Intel iGPU (N100, 12th gen+) |
| AI/LLM inference (Ollama) | Yes — VRAM matters most | NVIDIA RTX 3060 12GB |
| Stable Diffusion / ComfyUI | Yes — VRAM + compute | NVIDIA RTX 3090 24GB |
| Gaming VM (single-player) | Yes — passthrough to VM | NVIDIA RTX 4060 / AMD RX 7600 |
| Security camera AI (Frigate) | No — Intel iGPU or Coral TPU | Google Coral USB ($30) |
| Video encoding (FFmpeg) | Optional — GPU acceleration helps | NVIDIA NVENC (any GTX/RTX) |

Why Intel Quick Sync Beats Discrete GPUs for Transcoding

Intel’s integrated GPU has dedicated hardware video encode/decode blocks that are more efficient than discrete GPUs for transcoding:

| Method | Power Draw | Simultaneous 4K Transcodes | Cost |
|---|---|---|---|
| Intel N100 Quick Sync | 6-10W | 10-15 | $150 (whole PC) |
| Intel 12th gen Quick Sync | 15-25W | 20+ | $200-300 (whole PC) |
| NVIDIA RTX 3060 NVENC | 30-80W | 5-8 (software limit) | $200 (GPU alone) |
| CPU transcoding (8 cores) | 80-150W | 2-3 | Varies |

NVIDIA also software-limits concurrent NVENC sessions on consumer GPUs (5 on older drivers, raised to 8 with 2023 drivers; the cap can be removed with a driver patch, but that's janky). Intel has no such limit.
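As a concrete example, here is a hedged sketch of a fully hardware-accelerated transcode on an Intel iGPU using FFmpeg's VA-API path (filenames are placeholders; assumes the `intel-media-va-driver` package and an FFmpeg build with VA-API support; `renderD128` is typically the iGPU's render node):

```shell
# Decode 4K HEVC, scale to 1080p, and encode H.264 entirely on the iGPU
ffmpeg -vaapi_device /dev/dri/renderD128 \
  -hwaccel vaapi -hwaccel_output_format vaapi \
  -i input-4k.mkv \
  -vf 'scale_vaapi=w=1920:h=1080' \
  -c:v h264_vaapi -b:v 8M \
  -c:a copy output-1080p.mkv
```

Keeping frames in VA-API surfaces end to end (`-hwaccel_output_format vaapi` plus `scale_vaapi`) avoids copying frames back to system RAM, which is where most of the efficiency comes from.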

GPU Recommendations by Use Case

AI/LLM Inference

VRAM is the bottleneck. A 7B-8B parameter model (Llama 3 8B, Mistral 7B) needs ~4-5GB VRAM quantized. A 13B model needs ~8-10GB. A 70B model needs ~40GB.
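Those figures follow from a simple rule of thumb, sketched below as a hypothetical helper (not part of Ollama): parameters-in-billions × bits-per-weight ÷ 8, plus roughly 20% overhead for KV cache and activations.

```shell
# Rough VRAM estimate in GB for a quantized model (rule of thumb, not exact)
estimate_vram_gb() {
  awk -v p="$1" -v bits="$2" 'BEGIN { printf "%.1f\n", p * bits / 8 * 1.2 }'
}

estimate_vram_gb 7 4    # 7B at 4-bit quantization -> 4.2
estimate_vram_gb 13 4   # 13B at 4-bit             -> 7.8
estimate_vram_gb 70 4   # 70B at 4-bit             -> 42.0
```

Higher-precision weights scale accordingly: the same 7B model at 8-bit needs roughly twice the VRAM, which is why quantized models dominate home setups.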

| GPU | VRAM | Performance (tokens/sec on 7B) | Power | Used Price |
|---|---|---|---|---|
| RTX 3060 12GB | 12 GB | ~35 t/s | 170W TDP | ~$200 |
| RTX 3090 24GB | 24 GB | ~55 t/s | 350W TDP | ~$600 |
| RTX 4060 8GB | 8 GB | ~40 t/s | 115W TDP | ~$280 |
| RTX 4090 24GB | 24 GB | ~90 t/s | 450W TDP | ~$1,600 |
| Tesla P40 24GB | 24 GB | ~25 t/s | 250W TDP | ~$150 |

Best value: RTX 3060 12GB. The 12GB VRAM is unusual for its tier — most xx60 cards have 6-8GB. This runs 7B-13B models comfortably. Used prices around $200 make it the clear winner for home AI.

Budget pick: Tesla P40. 24GB VRAM for ~$150 used. Slower compute than consumer cards, no video output (data center card), needs active cooling (add a $20 fan shroud). But 24GB VRAM for $150 is unbeatable for running larger models.

Gaming VM

| GPU | Performance Tier | Power | Price |
|---|---|---|---|
| RTX 4060 | 1080p/1440p high | 115W | ~$280 |
| RX 7600 | 1080p high | 165W | ~$230 |
| RTX 4070 | 1440p/4K medium | 200W | ~$480 |
| RX 7800 XT | 1440p/4K medium | 263W | ~$420 |

AMD cards work well for passthrough, but some models suffer from the reset bug (the card can't reinitialize after a VM shutdown; the vendor-reset kernel module works around it). NVIDIA is more straightforward to pass through, but consumer GPUs have historically needed the hypervisor hidden from the driver (see the NVIDIA-specific notes below).

Video Processing (FFmpeg/Handbrake)

Any NVIDIA GTX/RTX card with NVENC handles hardware-accelerated video encoding. The RTX 3060 is again the sweet spot — NVENC quality on 30-series is excellent, and the card is cheap used.
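A minimal NVENC sketch with FFmpeg (filenames are placeholders; `hevc_nvenc` requires an NVIDIA card and an FFmpeg build with NVENC enabled):

```shell
# Hardware HEVC encode on NVENC; presets run p1 (fastest) through p7 (best quality),
# and -cq sets a constant-quality target instead of a fixed bitrate
ffmpeg -hwaccel cuda -i input.mkv \
  -c:v hevc_nvenc -preset p5 -cq 28 \
  -c:a copy output.mkv
```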

GPU Passthrough with Proxmox

GPU passthrough (VFIO/IOMMU) gives a VM direct access to the physical GPU — near-native performance.

Requirements

  1. CPU with IOMMU support: Intel VT-d or AMD-Vi
  2. Motherboard with IOMMU groups: Check your IOMMU groups — the GPU must be in its own group or with devices you can also pass through
  3. Two GPUs (for gaming VM): One for Proxmox console (iGPU or second dGPU), one for passthrough. For headless AI, the host doesn’t need a display.
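To check requirement 2, a commonly used snippet that lists every PCI device by IOMMU group (needs `pciutils`; run on the host after IOMMU is enabled):

```shell
#!/bin/sh
# List PCI devices grouped by IOMMU group; the GPU and its HDMI audio
# function should ideally be alone in their group.
for d in /sys/kernel/iommu_groups/*/devices/*; do
  group=$(basename "$(dirname "$(dirname "$d")")")
  printf 'IOMMU group %s: ' "$group"
  lspci -nns "$(basename "$d")"
done | sort -V
```

If the GPU shares a group with unrelated devices (SATA controllers, NICs), try a different PCIe slot; everything in a group must be passed through together.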

Setup Steps (Proxmox)

1. Enable IOMMU in BIOS:

  • Intel: Enable VT-d
  • AMD: Enable IOMMU/AMD-Vi

2. Enable IOMMU in Proxmox boot parameters:

# Edit GRUB config
nano /etc/default/grub

# Intel:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
# AMD:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"

update-grub
reboot
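Note that Proxmox installed on ZFS boots via systemd-boot rather than GRUB; there the parameters go in /etc/kernel/cmdline followed by `proxmox-boot-tool refresh`. Either way, confirm IOMMU actually came up after the reboot:

```shell
# Intel logs DMAR lines ("DMAR: IOMMU enabled"); AMD logs AMD-Vi lines
dmesg | grep -e DMAR -e IOMMU -e AMD-Vi
```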

3. Load VFIO modules:

# Add to /etc/modules
echo -e "vfio\nvfio_iommu_type1\nvfio_pci\nvfio_virqfd" >> /etc/modules
# On kernel 6.2+ (Proxmox 8), vfio_virqfd is built into vfio and can be omitted

4. Blacklist GPU drivers on the host:

# /etc/modprobe.d/blacklist.conf
echo -e "blacklist nouveau\nblacklist nvidia\nblacklist radeon\nblacklist amdgpu" >> /etc/modprobe.d/blacklist.conf

5. Bind GPU to VFIO:

# Find GPU PCI IDs
lspci -nn | grep -i nvidia
# Example output: 01:00.0 VGA: NVIDIA [10de:2504]
#                 01:00.1 Audio: NVIDIA [10de:228e]

# /etc/modprobe.d/vfio.conf
echo "options vfio-pci ids=10de:2504,10de:228e disable_vga=1" >> /etc/modprobe.d/vfio.conf

update-initramfs -u
reboot
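After this reboot, verify the card is actually bound to vfio-pci rather than a display driver (PCI address from the lspci step above):

```shell
# "Kernel driver in use:" should read vfio-pci for both the VGA and audio functions
lspci -nnk -s 01:00
```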

6. Add GPU to VM in Proxmox:

  • VM → Hardware → Add → PCI Device
  • Select your GPU
  • Check “All Functions” and “Primary GPU” (for gaming VMs)
  • Set machine type to q35, BIOS to OVMF (UEFI)

NVIDIA-Specific Passthrough Notes

NVIDIA consumer drivers used to detect virtual machines and refuse to load (Error 43). Drivers from R465 (2021) onward officially allow GeForce passthrough; for older drivers, the fix:

# In the VM's .conf file (/etc/pve/qemu-server/<vmid>.conf)
# Add these CPU flags:
cpu: host,hidden=1,flags=+pcid
args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'

This hides the hypervisor from the NVIDIA driver.

LXC Container GPU Access (No Passthrough)

For Docker workloads like Jellyfin transcoding or Ollama, you don’t need full passthrough. Mount the GPU device into an LXC container:

# In the LXC container config (/etc/pve/lxc/<id>.conf)
# Major 226 and /dev/dri are the Intel/AMD render nodes; NVIDIA cards
# expose separate /dev/nvidia* device nodes instead
lxc.cgroup2.devices.allow: c 226:* rwm
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir

This shares the GPU between the host and containers — multiple containers can use it simultaneously (unlike passthrough, which dedicates the GPU to one VM).
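For NVIDIA cards the same idea applies, but the device nodes differ. A hedged sketch (the nvidia-uvm major number is assigned dynamically, so check `ls -l /dev/nvidia*` on your host; the container also needs the same NVIDIA driver version as the host):

```shell
# /etc/pve/lxc/<id>.conf — NVIDIA variant (assumes host driver is installed)
lxc.cgroup2.devices.allow: c 195:* rwm
# 510 is a common major for nvidia-uvm, but it is dynamic; verify on your host
lxc.cgroup2.devices.allow: c 510:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
```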

Power Consumption

| GPU | Idle | Typical Load | Max (TDP) | Annual Idle Cost (at $0.12/kWh) |
|---|---|---|---|---|
| Intel N100 iGPU | 0W (part of CPU) | 3-5W | 10W | $0 |
| RTX 3060 12GB | 15W | 80-120W | 170W | $16 |
| RTX 3090 24GB | 25W | 200-300W | 350W | $26 |
| RTX 4060 8GB | 10W | 70-100W | 115W | $11 |
| Tesla P40 | 30W | 150-200W | 250W | $32 |
| No GPU (CPU transcode) | 0W | 60-120W (CPU) | Varies | Varies |
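The idle-cost figures are simple arithmetic, sketched here as a hypothetical helper (watts × 8760 hours/year ÷ 1000 × rate; the table above uses $0.12/kWh):

```shell
# Annual electricity cost in dollars for a constant power draw
annual_cost() {
  awk -v w="$1" -v rate="$2" 'BEGIN { printf "%.0f\n", w * 8760 / 1000 * rate }'
}

annual_cost 15 0.12   # RTX 3060 at 15W idle -> 16
annual_cost 30 0.12   # Tesla P40 at 30W idle -> 32
```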

GPUs draw significant idle power even when not processing. Consider powering down the GPU when not in use (supported on some Linux setups with nvidia-smi or echo auto > /sys/bus/pci/devices/0000:01:00.0/power/control).
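To see what your card actually draws at idle, nvidia-smi can query power directly, and on many cards the power cap can be lowered (both are standard nvidia-smi options; supported limits vary by card):

```shell
# Current draw and configured limit, as CSV
nvidia-smi --query-gpu=name,power.draw,power.limit --format=csv
# Lower the power cap to 140W (needs root; resets on reboot)
nvidia-smi -pl 140
```

Capping an RTX 3060 to ~140W typically costs only a few percent of inference throughput while cutting load power noticeably.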

Which GPU for Which Server?

| Server Type | GPU Recommendation | Why |
|---|---|---|
| Intel N100 Mini PC | None (use iGPU) | No PCIe slot, iGPU is enough |
| Dell OptiPlex Micro/SFF | Low-profile RTX A2000 | SFF only fits low-profile cards |
| Dell OptiPlex MT | RTX 3060 | Full-height PCIe x16 |
| DIY NAS Build | Usually none | NAS workloads don't need GPU |
| Enterprise Server (R730) | RTX 3090 or Tesla P40 | Full-size PCIe, adequate PSU |
| Proxmox cluster | RTX 3060 per node | GPU passthrough to VMs |

FAQ

Can I use one GPU for multiple VMs?

Not with standard passthrough — a GPU is dedicated to one VM at a time. SR-IOV (Single Root I/O Virtualization) splits a physical GPU into virtual GPUs, but consumer GPUs don’t support it. NVIDIA A-series and Intel Data Center GPUs support SR-IOV. For home use, the workaround is sharing via LXC containers instead of VMs.

Do I need a GPU for Plex hardware transcoding?

No. Intel Quick Sync (iGPU) is the best option for Plex/Jellyfin transcoding. It’s faster, more power-efficient, and has no concurrent session limit. Even a $150 N100 mini PC handles 10+ simultaneous 4K-to-1080p transcodes.
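To confirm Quick Sync is usable on a given box before committing, check the render node and the advertised encode profiles (`vainfo` comes from the libva-utils package):

```shell
# The iGPU's render node must exist (usually card0 plus renderD128)
ls -l /dev/dri/
# Encode-capable profiles show VAEntrypointEncSlice (or EncSliceLP)
vainfo | grep -i entrypointenc
```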

Can I mine crypto with a GPU in my server?

You can, but it's not profitable for most GPUs in 2026 given electricity costs. An RTX 3060 mining Ethereum Classic earns roughly $0.30/day while consuming $0.40/day in electricity (at $0.12/kWh). Not recommended.

What about AMD GPUs for AI?

AMD ROCm support has improved significantly but is still behind NVIDIA CUDA in the AI/ML ecosystem. Ollama supports ROCm for AMD RX 6000/7000 series. If you’re running Ollama specifically, AMD works. For broader AI workloads (ComfyUI, training), NVIDIA is the safer bet.
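If you go the AMD route, one commonly needed tweak: some RX 6000-series cards aren't on ROCm's official support list and need a gfx version override. HSA_OVERRIDE_GFX_VERSION is a real ROCm environment variable; 10.3.0 suits RDNA2 cards like the RX 6600/6700, and other generations need other values:

```shell
# Make ROCm treat the card as gfx1030 so Ollama can use it
HSA_OVERRIDE_GFX_VERSION=10.3.0 ollama serve
```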

Can I use a GPU from a laptop (MXM/soldered)?

Generally no. MXM modules are technically removable, but desktop motherboards have no MXM slot, and soldered laptop GPUs can't be reused at all. If you need a low-power GPU, look at the NVIDIA T400/T600 (30-40W TDP, low-profile, passive cooling options).