GPU Performance Calculator

Explore LLM inference benchmarks across GPU types and estimate hardware requirements for your model.

[Chart] Token throughput per GPU vs. end-to-end latency (8 GPUs, 100 concurrent requests, 128 output tokens; points labeled with context length)
Total Benchmarks: 113
Configurations: 24
Max Throughput/GPU: 22,036 tok/s
Min Latency: 0.00 s
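The source does not show how the calculator estimates hardware requirements; a minimal sketch of one common approach is to sum model weights and KV-cache memory. All parameter names and default values below are illustrative assumptions, not taken from the calculator:

```python
def estimate_gpu_memory_gb(num_params_b, bytes_per_param=2,
                           num_layers=32, hidden_size=4096,
                           context_len=4096, batch_size=1,
                           kv_bytes=2):
    """Rough GPU memory estimate (GB): weights + KV cache.

    Assumes FP16 weights and KV cache by default; ignores
    activations and framework overhead.
    """
    weights = num_params_b * 1e9 * bytes_per_param
    # KV cache: 2 tensors (K and V) per layer, one vector of
    # size hidden_size per token, per sequence in the batch.
    kv = 2 * num_layers * context_len * hidden_size * batch_size * kv_bytes
    return (weights + kv) / 1e9

# A hypothetical 7B-parameter model in FP16 with a 4096-token context
print(round(estimate_gpu_memory_gb(7), 1))  # → 16.1
```

A real estimate would also budget for activations and runtime overhead, so treat this as a lower bound when sizing hardware.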