A100 vs H100 for LLM Serving
Both are datacenter GPUs commonly used to serve LLMs. The H100 is faster on every axis; the A100 is cheaper, especially on the used market.
The A100 80 GB delivers 2039 GB/s of memory bandwidth and 312 TFLOPS of dense FP16 Tensor Core compute. The H100 80 GB SXM delivers 3350 GB/s and 989 TFLOPS dense FP16; FP8 doubles that rate on paper, and 2:4 structured sparsity doubles it again.
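Those two numbers per card explain the serving behavior. Dividing peak compute by bandwidth gives each card's ridge point, the arithmetic intensity (FLOPs per byte moved) above which a kernel becomes compute-bound. A minimal sketch from the figures above:

    # Ridge point = peak compute / peak bandwidth: the arithmetic intensity
    # (FLOPs per byte of memory traffic) above which a kernel is compute-bound.
    SPECS = {
        # name: (dense FP16 TFLOPS, memory bandwidth in GB/s)
        "A100 80GB": (312, 2039),
        "H100 80GB SXM": (989, 3350),
    }

    for name, (tflops, gbps) in SPECS.items():
        ridge = (tflops * 1e12) / (gbps * 1e9)  # FLOPs per byte
        print(f"{name}: compute-bound above ~{ridge:.0f} FLOPs/byte")

    # Batch-1 decode does ~2 FLOPs per parameter while reading 2 bytes per
    # FP16 weight, i.e. ~1 FLOP/byte: far below either ridge point, so decode
    # is bandwidth-bound. Prefill reuses each weight across many tokens,
    # pushing intensity well above the ridge point, so it is compute-bound.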
For token generation (decode), which at modest batch sizes is bandwidth-bound, the H100 is roughly 1.6x faster per card, tracking the bandwidth ratio. For prefill, which is compute-bound, the H100 is roughly 3x faster at FP16 and up to ~6x with FP8, which the A100 lacks. Training is compute-bound end to end, so the H100 wins there by a similar or larger margin.
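As a rough sanity check on the decode figure, divide bandwidth by model size. The sketch below assumes a hypothetical 13B-parameter model held in FP16 on a single card, with every weight read once per generated token; it ignores KV-cache traffic and batching, so these are ceilings, not measurements:

    # Single-stream decode ceiling: tokens/s <= bandwidth / bytes of weights.
    # Hypothetical 13B model at FP16 (2 bytes/param) so it fits in 80 GB.
    PARAMS = 13e9
    WEIGHT_BYTES = PARAMS * 2  # FP16

    BANDWIDTH = {"A100 80GB": 2039e9, "H100 80GB SXM": 3350e9}  # bytes/s

    for name, bw in BANDWIDTH.items():
        print(f"{name}: <= {bw / WEIGHT_BYTES:.0f} tokens/s per stream")

    print(f"H100/A100 ratio: {3350 / 2039:.2f}x")  # ~1.64x, the bandwidth ratio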
The MSRP gap is large: ~$15k for the A100 80 GB, ~$30k for the H100 80 GB SXM. On used markets the gap narrows. Cloud-instance hourly pricing puts the H100 at roughly 1.7x the A100.
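Hourly price and throughput combine into cost per token. The sketch below uses placeholder hourly prices at the ~1.7x ratio above (the absolute dollar figures are assumptions, not quotes) and the 13B decode ceilings from the previous sketch:

    # Cost per million decode tokens = hourly price / (tokens/s * 3600) * 1e6.
    CARDS = {
        # name: ($/hour, illustrative; decode tokens/s from the sketch above)
        "A100 80GB": (1.50, 78),
        "H100 80GB SXM": (2.55, 129),
    }

    for name, (usd_per_hour, tokens_per_s) in CARDS.items():
        usd_per_mtok = usd_per_hour / (tokens_per_s * 3600) * 1e6
        print(f"{name}: ${usd_per_mtok:.2f} per million decode tokens")

With a 1.7x price gap against a 1.64x decode speedup, the A100 comes out marginally cheaper per decode token, which is the pattern behind the advice that follows.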
If you serve a fixed model at known, decode-dominated traffic, the A100 is often the better cost per token. If you build agent systems with heavy prefill, or you train, the H100 pays for itself; a rough break-even sketch follows.
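How much prefill tips the balance? A toy cost model, with every number an assumption carried over from above (1.7x hourly price, 1.64x decode speedup, 3x or 6x prefill speedup): a request that spends fraction f of its A100 wall-clock time in prefill costs less on the H100 once 1.7 * (f / prefill_speedup + (1 - f) / 1.64) < 1.

    # Smallest prefill share (of A100 wall-clock time per request) at which
    # the H100 becomes cheaper, under the toy model in the text above.
    PRICE_RATIO = 1.7      # H100 $/hour relative to A100 (assumed)
    DECODE_SPEEDUP = 1.64  # the bandwidth ratio

    for prefill_speedup in (3.0, 6.0):  # FP16 vs FP8 prefill
        for pct in range(0, 101, 5):
            f = pct / 100
            relative_cost = PRICE_RATIO * (f / prefill_speedup + (1 - f) / DECODE_SPEEDUP)
            if relative_cost < 1:
                print(f"{prefill_speedup:.0f}x prefill speedup: H100 cheaper "
                      f"once prefill is ~{pct}% of request time")
                break

Under these assumptions the crossover arrives at a small prefill share, which is why prefill-heavy agent workloads favor the H100 despite the price gap.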