Use cases
Worked examples. Pick a real setup and see exactly what fits, and at what speed.
Run Llama 3.1 8B on a Laptop GPU
The smallest sane local-LLM setup. 8B at Q4, 8k context, on any 8 GB+ GPU.
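A back-of-envelope budget makes the claim concrete. The sketch below uses the published Llama 3.1 8B architecture (32 layers, 8 KV heads, head dim 128) and treats ~4.5 bits per weight as a stand-in for a typical Q4 quant; the 1 GiB overhead figure is a rough allowance, not a measured number.

```python
# Back-of-envelope VRAM budget for Llama 3.1 8B at Q4 with an 8k context.
# Architecture numbers (layers, KV heads, head dim) are the published
# Llama 3.1 8B config; bits-per-weight for Q4 is an approximation.

GiB = 1024**3

params          = 8.03e9   # total weights
bits_per_weight = 4.5      # ~Q4 average, approximate
weights_gib     = params * bits_per_weight / 8 / GiB   # ~4.2 GiB

layers, kv_heads, head_dim = 32, 8, 128
ctx      = 8192                                    # tokens of context
kv_bytes = 2 * layers * kv_heads * head_dim * 2    # K+V, fp16, per token
kv_gib   = kv_bytes * ctx / GiB                    # 1.0 GiB at 8k

overhead_gib = 1.0   # compute buffers, CUDA context: a rough allowance

total = weights_gib + kv_gib + overhead_gib
print(f"weights ~{weights_gib:.1f} GiB, KV cache ~{kv_gib:.1f} GiB, "
      f"total ~{total:.1f} GiB")   # ~6 GiB -> fits an 8 GB card
```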
Run Llama 3.1 70B on Two RTX 3090s
The classic budget 70B setup. 48 GB combined VRAM, tensor parallel, Q4 quant.
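To see why 48 GB is enough, here is a rough per-GPU budget under tensor parallelism, using the published Llama 3.1 70B architecture (80 layers, 8 KV heads, head dim 128) and the same approximate ~4.5 bits per weight for Q4; the per-GPU overhead is a guess.

```python
# Rough split of Llama 3.1 70B (Q4) across two 24 GB RTX 3090s with
# tensor parallelism. Layer/head counts are the published 70B config;
# the Q4 bits-per-weight figure is an approximation.

GiB = 1024**3
params, bits = 70.6e9, 4.5
weights = params * bits / 8 / GiB             # ~37 GiB of weights

layers, kv_heads, head_dim, ctx = 80, 8, 128, 8192
kv = 2 * layers * kv_heads * head_dim * 2 * ctx / GiB   # ~2.5 GiB fp16

overhead_per_gpu = 1.5   # buffers, CUDA context: a rough allowance
per_gpu = (weights + kv) / 2 + overhead_per_gpu
print(f"~{per_gpu:.1f} GiB per GPU")   # ~21 GiB against ~22.3 GiB usable
```

In practice an engine with tensor-parallel support (vLLM's --tensor-parallel-size 2, for example) splits each weight matrix across both cards, so neither GPU ever holds the full model.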
Run DeepSeek V3 on a Mac Studio
The most practical single-machine path to running the 671B-parameter DeepSeek V3 quietly at home.
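The arithmetic below shows why this works at all: DeepSeek V3 is a mixture-of-experts model, so all 671B parameters must fit in memory, but only ~37B are read per decoded token. Parameter counts are the published figures; the bandwidth and bits-per-weight values are approximations for an Ultra-class Mac running a Q4-style quant.

```python
# Why a Mac Studio can host DeepSeek V3: a 671B-parameter MoE with only
# ~37B parameters active per token. Parameter counts are published;
# bandwidth and Q4 bits-per-weight are approximations.

GB = 1e9
total_params, active_params = 671e9, 37e9
bits = 4.5                                   # ~Q4 average, approximate

weights_gb = total_params * bits / 8 / GB    # ~377 GB: needs a 512 GB config
active_gb  = active_params * bits / 8 / GB   # ~21 GB read per decoded token

bandwidth = 800                              # GB/s, Ultra-class unified memory
ceiling = bandwidth / active_gb
print(f"weights ~{weights_gb:.0f} GB, "
      f"theoretical ceiling ~{ceiling:.0f} tok/s")
# Real decode speed lands well below the ceiling, but the MoE structure
# is what makes the speed usable at all on a single machine.
```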
Set Up a 32k-Context Coding Assistant
Long context for repo-scale work without melting the GPU. Right model, right quant, right KV cache.
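The budget item that actually scales with context is the KV cache, so that is where 32k setups win or lose. The sketch below sizes it for an illustrative 32B-class dense coder config (64 layers, 8 KV heads, head dim 128, roughly Qwen2.5-Coder-32B-shaped); check your model's own config before trusting the numbers.

```python
# KV-cache sizing for a 32k-token coding context. The layer/head numbers
# are an illustrative 32B-class dense config, not a spec for your model.

GiB = 1024**3
layers, kv_heads, head_dim = 64, 8, 128
ctx = 32_768

def kv_cache_gib(bytes_per_elem: float) -> float:
    """K and V, per layer, per KV head, per position, for the full context."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * ctx / GiB

print(f"fp16 KV cache: {kv_cache_gib(2):.1f} GiB")   # ~8 GiB
print(f"q8   KV cache: {kv_cache_gib(1):.1f} GiB")   # ~4 GiB
# Quantizing the KV cache to 8-bit halves the context cost, often the
# difference between 32k fitting on a 24 GB card or not.
```

Runtimes that support KV-cache quantization (llama.cpp's --cache-type-k and --cache-type-v flags, for instance) give you the q8 row without touching the model weights.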