Go benchmark tests that measure end-to-end performance of a running Ollama server. Run these tests to evaluate model inference performance on your hardware and measure the impact of code changes.
Run these benchmarks when:
Prerequisite: an Ollama server running locally with `ollama serve` on `127.0.0.1:11434`.
> [!NOTE]
> All commands must be run from the root directory of the Ollama project.
Basic syntax:

```bash
go test -bench=. ./benchmark/... -m $MODEL_NAME
```
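`-m` is not a standard `go test` flag. It works because `go test` passes flags it does not recognize through to the test binary, where the test package declares them. In a `_test.go` file this is usually a package-level `flag.String("m", ...)` that the `testing` package parses. The sketch below is an assumption about how such a flag could be declared, not the benchmark's actual code. To keep it self-contained, it parses a dedicated `flag.FlagSet`. `parseModelFlag` is a hypothetical name.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseModelFlag is a hypothetical helper. It declares a -m flag on its own
// FlagSet and returns the model name, or an error if -m was not provided.
func parseModelFlag(args []string) (string, error) {
	fs := flag.NewFlagSet("benchmark", flag.ContinueOnError)
	model := fs.String("m", "", "Name of the model to benchmark")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	if *model == "" {
		return "", fmt.Errorf("-m is required")
	}
	return *model, nil
}

func main() {
	model, err := parseModelFlag(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println("benchmarking model:", model)
}
```

Failing early when `-m` is empty avoids starting a benchmark run with no model to send requests to.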
Required flags:

- `-bench=.`: Run all benchmarks
- `-m`: Model name to benchmark

Optional flags:

- `-count N`: Number of times to run the benchmark (useful for statistical analysis)
- `-timeout T`: Maximum time for the benchmark to run (e.g., `10m` for 10 minutes)

Common usage patterns:
Single benchmark run with a model specified:

```bash
go test -bench=. ./benchmark/... -m llama3.3
```
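Several of the metrics described next correspond to timing fields that Ollama's `/api/generate` endpoint returns: `load_duration`, `prompt_eval_count`, `prompt_eval_duration`, `eval_count`, and `eval_duration`, with durations in nanoseconds. The sketch below assumes the benchmark derives its rates from these fields. That is not confirmed here. `generateStats`, `metrics`, `computeMetrics`, and `tokensPerSecond` are hypothetical names. The response has no direct time-to-first-token field, so `ttft_ms` would have to be timed on the client and is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// generateStats mirrors the timing fields of an /api/generate response.
// All durations are in nanoseconds. (Hypothetical type for this sketch.)
type generateStats struct {
	LoadDuration       int64 `json:"load_duration"`
	PromptEvalCount    int   `json:"prompt_eval_count"`
	PromptEvalDuration int64 `json:"prompt_eval_duration"`
	EvalCount          int   `json:"eval_count"`
	EvalDuration       int64 `json:"eval_duration"`
}

// metrics holds values named after the benchmark's reported metrics.
type metrics struct {
	GenTokPerSec    float64 // gen_tok/s
	PromptTokPerSec float64 // prompt_tok/s
	LoadMs          float64 // load_ms
	GenTokens       int     // gen_tokens
	PromptTokens    int     // prompt_tokens
}

// tokensPerSecond converts a token count and a nanosecond duration into a
// rate, returning 0 for a non-positive duration instead of dividing by zero.
func tokensPerSecond(count int, durNs int64) float64 {
	if durNs <= 0 {
		return 0
	}
	return float64(count) / time.Duration(durNs).Seconds()
}

// computeMetrics derives the rate and load-time metrics from one response.
func computeMetrics(s generateStats) metrics {
	return metrics{
		GenTokPerSec:    tokensPerSecond(s.EvalCount, s.EvalDuration),
		PromptTokPerSec: tokensPerSecond(s.PromptEvalCount, s.PromptEvalDuration),
		LoadMs:          float64(s.LoadDuration) / float64(time.Millisecond),
		GenTokens:       s.EvalCount,
		PromptTokens:    s.PromptEvalCount,
	}
}

func main() {
	// Illustrative input values, not measured results.
	raw := `{"load_duration":1500000000,"prompt_eval_count":100,"prompt_eval_duration":250000000,"eval_count":200,"eval_duration":4000000000}`
	var s generateStats
	if err := json.Unmarshal([]byte(raw), &s); err != nil {
		panic(err)
	}
	m := computeMetrics(s)
	// prints: gen_tok/s=50.0 prompt_tok/s=400.0 load_ms=1500
	fmt.Printf("gen_tok/s=%.1f prompt_tok/s=%.1f load_ms=%.0f\n", m.GenTokPerSec, m.PromptTokPerSec, m.LoadMs)
}
```

Prompt and generation rates are kept separate because prompt processing is batched and is typically much faster per token than generation. Averaging them together would hide the difference.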
The benchmark reports several key metrics:

- `gen_tok/s`: Generated tokens per second
- `prompt_tok/s`: Prompt processing tokens per second
- `ttft_ms`: Time to first token in milliseconds
- `load_ms`: Model load time in milliseconds
- `gen_tokens`: Total tokens generated
- `prompt_tokens`: Total prompt tokens processed

Each benchmark runs two scenarios:
Three prompt lengths are tested for each scenario: