Multi-GPU Benchmarks Report
This report summarizes the benchmark methodology, the test environment, the metrics, and the conclusions drawn from the collected data.
2. Environment & Setup
2.1 GPU Presets
2.2 Models
2.3 Benchmark Script & Methodology
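The runs in this report are described as 10 requests in flight at a time for 100 total requests. The actual benchmark script is not reproduced here, so the following is only a minimal sketch of that load pattern. The names `fake_request` and `run_benchmark` are hypothetical, and the request itself is simulated with a short sleep rather than a call to a real inference server.

```python
import asyncio
import time


async def fake_request(i: int) -> dict:
    """Stand-in for one inference call; a real script would call the server here."""
    start = time.perf_counter()
    await asyncio.sleep(0.001)  # simulated network + inference time
    return {"id": i, "latency_s": time.perf_counter() - start, "error": None}


async def run_benchmark(total_requests: int = 100, concurrency: int = 10) -> list[dict]:
    """Issue total_requests calls with at most `concurrency` in flight at once."""
    semaphore = asyncio.Semaphore(concurrency)

    async def worker(i: int) -> dict:
        async with semaphore:
            try:
                return await fake_request(i)
            except Exception as exc:  # record the failure instead of aborting the run
                return {"id": i, "latency_s": None, "error": repr(exc)}

    return await asyncio.gather(*(worker(i) for i in range(total_requests)))


results = asyncio.run(run_benchmark())
errors = sum(r["error"] is not None for r in results)
print(len(results), "requests,", errors, "errors")
```

A semaphore keeps the number of in-flight requests fixed at the chosen concurrency, so a finished request is immediately replaced by the next one until all 100 have been issued. Failures are counted rather than raised, which matches the presence of an Errors column in the result tables.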
3. Overview of Parallel Strategies
Llama-3.2-3B-Instruct - 2048 context length, 10 concurrent requests, 100 total requests
GPU Preset | Pipeline (Prompt TPS) | Tensor (Prompt TPS) | Pipeline (Gen TPS) | Tensor (Gen TPS) | Prompt Speedup (%) | Gen Speedup (%)
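The two speedup columns are presumably the relative difference between tensor-parallel and pipeline-parallel throughput for the same preset. The report does not state which strategy is the baseline, so the sketch below assumes pipeline parallelism is the baseline. The helper name `speedup_pct` is hypothetical, and the numbers are illustrative only.

```python
def speedup_pct(baseline_tps: float, candidate_tps: float) -> float:
    """Relative throughput change of candidate vs. baseline, in percent."""
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    return (candidate_tps - baseline_tps) / baseline_tps * 100.0


# Illustrative numbers only, not measurements from this report.
# Assumed direction: pipeline is the baseline, tensor is the candidate.
pipeline_prompt_tps = 1000.0
tensor_prompt_tps = 1250.0
print(round(speedup_pct(pipeline_prompt_tps, tensor_prompt_tps), 1))  # 25.0
```

Under this convention a negative value means the tensor-parallel run was slower than the pipeline-parallel run.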


4. Results & Observations
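The result tables in this section report both aggregate throughput (Prompt TPS, Gen TPS) and request-level throughput. The exact definitions are not given, so the sketch below is one plausible reading: aggregate TPS divides total tokens by the wall-clock time of the whole run, while the request-level columns average each request's own tokens-per-second. The type `RequestStats`, its fields, and the function `summarize` are all hypothetical names, and the numbers are illustrative only.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RequestStats:
    prompt_tokens: int
    gen_tokens: int
    prompt_s: float  # time spent processing the prompt (prefill)
    gen_s: float     # time spent generating output tokens (decode)


def summarize(stats: list[RequestStats], wall_clock_s: float) -> dict:
    """Aggregate and per-request throughput under the assumptions stated above."""
    return {
        "prompt_tps": sum(s.prompt_tokens for s in stats) / wall_clock_s,
        "gen_tps": sum(s.gen_tokens for s in stats) / wall_clock_s,
        # Simplification: latency here ignores any time spent queued.
        "avg_latency_s": mean(s.prompt_s + s.gen_s for s in stats),
        "request_prompt_tps": mean(s.prompt_tokens / s.prompt_s for s in stats),
        "request_gen_tps": mean(s.gen_tokens / s.gen_s for s in stats),
    }


# Illustrative numbers only, not measurements from this report.
sample = [RequestStats(2048, 256, 0.5, 4.0), RequestStats(2048, 128, 0.5, 2.0)]
summary = summarize(sample, wall_clock_s=4.5)
print(summary)
```

With concurrent requests the aggregate figures can exceed the request-level ones, because several requests contribute tokens during the same wall-clock second.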
4.1 2048-Token Context Benchmarks
Preset | Prompt TPS | Gen TPS | Avg Latency (s) | Request Generation Level TPS | Request Prompt Level TPS | Errors

Preset | Prompt TPS | Gen TPS | Avg Latency (s) | Request Generation Level TPS | Request Prompt Level TPS | Errors

Preset | Prompt TPS | Gen TPS | Avg Latency (s) | Request Generation Level TPS | Request Prompt Level TPS | Errors

Preset | Prompt TPS | Gen TPS | Avg Latency (s) | Request Generation Level TPS | Request Prompt Level TPS | Errors

4.2 8192-Token Context Benchmarks
4.2.1 DeepSeek-R1-Distill-Qwen-1.5B - 8192 context length, 10 concurrent requests, 100 total requests
Preset | Prompt TPS | Gen TPS | Avg Latency (s) | Request Generation Level TPS | Request Prompt Level TPS | Errors

4.2.2 Llama-3.2-3B-Instruct - 8192 context length, 10 concurrent requests, 100 total requests
Preset | Prompt TPS | Gen TPS | Avg Latency (s) | Request Generation Level TPS | Request Prompt Level TPS | Errors

4.2.3 Llama-8B - 8192 context length, 10 concurrent requests, 100 total requests
Preset | Prompt TPS | Gen TPS | Avg Latency (s) | Request Generation Level TPS | Request Prompt Level TPS | Errors

4.2.4 Qwen-32B - 8192 context length, 10 concurrent requests, 100 total requests
Preset | Prompt TPS | Gen TPS | Avg Latency (s) | Request Generation Level TPS | Request Prompt Level TPS | Errors

4.3 64k-Token Context Benchmarks
4.3.1 Qwen-1.5B - 64k context length, 10 concurrent requests, 100 total requests
Preset | Prompt TPS | Gen TPS | Avg Latency (s) | Request Generation Level TPS | Request Prompt Level TPS | Errors

4.3.2 Llama-3.2-3B - 64k context length, 10 concurrent requests, 100 total requests
Preset | Prompt TPS | Gen TPS | Avg Latency (s) | Request Generation Level TPS | Request Prompt Level TPS | Errors

4.3.3 Llama-3.1-8B - 64k context length, 10 concurrent requests, 100 total requests
Preset | Prompt TPS | Gen TPS | Avg Latency (s) | Request Generation Level TPS | Request Prompt Level TPS | Errors

4.3.4 DeepSeek-R1-Distill-Qwen-32B - 64k context length, 10 concurrent requests, 100 total requests
Preset | Prompt TPS | Gen TPS | Avg Latency (s) | Request Generation Level TPS | Request Prompt Level TPS | Errors

4.4 128k-Token Context Benchmarks
4.4.1 Qwen-1.5B - 128k context length, 10 concurrent requests, 100 total requests
Preset | Prompt TPS | Gen TPS | Avg Latency (s) | Request Generation Level TPS | Request Prompt Level TPS | Errors

4.4.2 Llama-3.2-3B - 128k context length, 10 concurrent requests, 100 total requests
Preset | Prompt TPS | Gen TPS | Avg Latency (s) | Request Generation Level TPS | Request Prompt Level TPS | Errors

4.4.3 Llama-3.1-8B - 128k context length, 10 concurrent requests, 100 total requests
Preset | Prompt TPS | Gen TPS | Avg Latency (s) | Request Generation Level TPS | Request Prompt Level TPS | Errors

4.4.4 DeepSeek-R1-Distill-Qwen-32B - 128k context length, 10 concurrent requests, 100 total requests
Preset | Prompt TPS | Gen TPS | Avg Latency (s) | Request Generation Level TPS | Request Prompt Level TPS | Errors

5. Overview of Concurrency Strategies
Preset | Prompt TPS | Gen TPS | Avg Latency (s) | Request Generation Level TPS | Request Prompt Level TPS | Concurrency | Errors
10
1


Important notes