GPU Performance Calculator

Explore LLM inference benchmarks across GPU types and estimate hardware requirements for your model.

[Chart] Token throughput per GPU vs. end-to-end latency (8 GPUs, 100 concurrent requests, 128 output tokens; points labeled with context length)
Total Benchmarks: 113
Configurations: 24
Max Throughput/GPU: 22,036 tok/s
Min Latency: 0.00 s
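The source does not show how the calculator estimates hardware requirements; a minimal sketch of one common approach is to sum model weights and KV-cache memory. All parameter names and default values below are illustrative assumptions, not taken from the calculator:

```python
def estimate_gpu_memory_gb(num_params_b, bytes_per_param=2,
                           num_layers=32, hidden_size=4096,
                           context_len=4096, batch_size=1,
                           kv_bytes=2):
    """Rough GPU memory estimate (GB): weights + KV cache.

    Assumes FP16 weights and KV cache by default; ignores
    activations and framework overhead.
    """
    weights = num_params_b * 1e9 * bytes_per_param
    # KV cache: 2 tensors (K and V) per layer, one vector of
    # size hidden_size per token, per sequence in the batch.
    kv = 2 * num_layers * context_len * hidden_size * batch_size * kv_bytes
    return (weights + kv) / 1e9

# A hypothetical 7B-parameter model in FP16 with a 4096-token context
print(round(estimate_gpu_memory_gb(7), 1))  # → 16.1
```

A real estimate would also budget for activations and runtime overhead, so treat this as a lower bound when sizing hardware.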