GPU Performance Calculator
Explore LLM inference benchmarks across GPU types and estimate hardware requirements for your model.
Token Throughput per GPU vs End-to-End Latency
8 GPUs · 100 concurrent requests · 128 output tokens · Points labeled with context length
Total Benchmarks: 113
Configurations: 24
Max Throughput/GPU: 22,036 tok/s
Min Latency: 0.00 s
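The per-GPU throughput and end-to-end latency shown on the chart can be related by simple arithmetic. Below is a minimal sketch of that math; the function names, the time-to-first-token figure, and the per-request decode rate are illustrative assumptions, not values from the dashboard (only the 8-GPU count, 128 output tokens, and the 22,036 tok/s per-GPU maximum come from it).

```python
def throughput_per_gpu(total_tokens_per_s: float, num_gpus: int) -> float:
    """Aggregate serving throughput divided evenly across GPUs."""
    return total_tokens_per_s / num_gpus

def end_to_end_latency(ttft_s: float, output_tokens: int,
                       tokens_per_s_per_req: float) -> float:
    """Time to first token plus time to stream the remaining output tokens."""
    return ttft_s + (output_tokens - 1) / tokens_per_s_per_req

# Example consistent with the dashboard max: 22,036 tok/s per GPU on 8 GPUs
# implies roughly 176,288 tok/s aggregate across the node.
per_gpu = throughput_per_gpu(total_tokens_per_s=176_288, num_gpus=8)

# Hypothetical request: 128 output tokens at an assumed 0.5 s TTFT and
# 40 tok/s per-request decode rate.
latency = end_to_end_latency(ttft_s=0.5, output_tokens=128,
                             tokens_per_s_per_req=40.0)
```

Sweeping `tokens_per_s_per_req` and `ttft_s` across measured benchmark points reproduces the throughput-vs-latency trade-off the chart visualizes.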