Use cases
Worked examples. Pick a real setup and see exactly what fits, and at what speed.
Run Llama 3.1 8B on a Laptop GPU
The smallest sane local-LLM setup. 8B at Q4, 8k context, on any 8 GB+ GPU.
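A back-of-envelope budget makes the claim concrete. The sketch below uses the published Llama 3.1 8B architecture (32 layers, 8 KV heads, head dim 128) and treats ~4.5 bits per weight as a stand-in for a typical Q4 quant; the 1 GiB overhead figure is a rough allowance, not a measured number.

```python
# Back-of-envelope VRAM budget for Llama 3.1 8B at Q4 with an 8k context.
# Architecture numbers (layers, KV heads, head dim) are the published
# Llama 3.1 8B config; bits-per-weight for Q4 is an approximation.

GiB = 1024**3

params          = 8.03e9   # total weights
bits_per_weight = 4.5      # ~Q4 average, approximate
weights_gib     = params * bits_per_weight / 8 / GiB   # ~4.2 GiB

layers, kv_heads, head_dim = 32, 8, 128
ctx      = 8192                                    # tokens of context
kv_bytes = 2 * layers * kv_heads * head_dim * 2    # K+V, fp16, per token
kv_gib   = kv_bytes * ctx / GiB                    # 1.0 GiB at 8k

overhead_gib = 1.0   # compute buffers, CUDA context: a rough allowance

total = weights_gib + kv_gib + overhead_gib
print(f"weights ~{weights_gib:.1f} GiB, KV cache ~{kv_gib:.1f} GiB, "
      f"total ~{total:.1f} GiB")   # ~6 GiB -> fits an 8 GB card
```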
Run Llama 3.1 70B on Two RTX 3090s
The classic budget 70B setup. 48 GB combined VRAM, tensor parallel, Q4 quant.
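To see why 48 GB is enough, here is a rough per-GPU budget under tensor parallelism, using the published Llama 3.1 70B architecture (80 layers, 8 KV heads, head dim 128) and the same approximate ~4.5 bits per weight for Q4; the per-GPU overhead is a guess.

```python
# Rough split of Llama 3.1 70B (Q4) across two 24 GB RTX 3090s with
# tensor parallelism. Layer/head counts are the published 70B config;
# the Q4 bits-per-weight figure is an approximation.

GiB = 1024**3
params, bits = 70.6e9, 4.5
weights = params * bits / 8 / GiB             # ~37 GiB of weights

layers, kv_heads, head_dim, ctx = 80, 8, 128, 8192
kv = 2 * layers * kv_heads * head_dim * 2 * ctx / GiB   # ~2.5 GiB fp16

overhead_per_gpu = 1.5   # buffers, CUDA context: a rough allowance
per_gpu = (weights + kv) / 2 + overhead_per_gpu
print(f"~{per_gpu:.1f} GiB per GPU")   # ~21 GiB against ~22.3 GiB usable
```

In practice an engine with tensor-parallel support (vLLM's --tensor-parallel-size 2, for example) splits each weight matrix across both cards, so neither GPU ever holds the full model.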
Run DeepSeek V3 on a Mac Studio
The most practical single-machine path to running the 671B-parameter DeepSeek V3 quietly at home.
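The arithmetic below shows why this works at all: DeepSeek V3 is a mixture-of-experts model, so all 671B parameters must fit in memory, but only ~37B are read per decoded token. Parameter counts are the published figures; the bandwidth and bits-per-weight values are approximations for an Ultra-class Mac running a Q4-style quant.

```python
# Why a Mac Studio can host DeepSeek V3: a 671B-parameter MoE with only
# ~37B parameters active per token. Parameter counts are published;
# bandwidth and Q4 bits-per-weight are approximations.

GB = 1e9
total_params, active_params = 671e9, 37e9
bits = 4.5                                   # ~Q4 average, approximate

weights_gb = total_params * bits / 8 / GB    # ~377 GB: needs a 512 GB config
active_gb  = active_params * bits / 8 / GB   # ~21 GB read per decoded token

bandwidth = 800                              # GB/s, Ultra-class unified memory
ceiling = bandwidth / active_gb
print(f"weights ~{weights_gb:.0f} GB, "
      f"theoretical ceiling ~{ceiling:.0f} tok/s")
# Real decode speed lands well below the ceiling, but the MoE structure
# is what makes the speed usable at all on a single machine.
```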
Set Up a 32k-Context Coding Assistant
Long context for repo-scale work without melting the GPU. Right model, right quant, right KV cache.
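The budget item that actually scales with context is the KV cache, so that is where 32k setups win or lose. The sketch below sizes it for an illustrative 32B-class dense coder config (64 layers, 8 KV heads, head dim 128, roughly Qwen2.5-Coder-32B-shaped); check your model's own config before trusting the numbers.

```python
# KV-cache sizing for a 32k-token coding context. The layer/head numbers
# are an illustrative 32B-class dense config, not a spec for your model.

GiB = 1024**3
layers, kv_heads, head_dim = 64, 8, 128
ctx = 32_768

def kv_cache_gib(bytes_per_elem: float) -> float:
    """K and V, per layer, per KV head, per position, for the full context."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * ctx / GiB

print(f"fp16 KV cache: {kv_cache_gib(2):.1f} GiB")   # ~8 GiB
print(f"q8   KV cache: {kv_cache_gib(1):.1f} GiB")   # ~4 GiB
# Quantizing the KV cache to 8-bit halves the context cost, often the
# difference between 32k fitting on a 24 GB card or not.
```

Runtimes that support KV-cache quantization (llama.cpp's --cache-type-k and --cache-type-v flags, for instance) give you the q8 row without touching the model weights.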