Benchmark

Go benchmark tests that measure end-to-end performance of a running Ollama server. Run these tests to evaluate model inference performance on your hardware and measure the impact of code changes.

When to use

Run these benchmarks when:

Making changes to the model inference engine
Modifying model loading/unloading logic
Changing prompt processing or token generation code
Implementing a new model architecture
Testing performance across different hardware setups

Prerequisites

An Ollama server running locally (started with ollama serve) and listening on 127.0.0.1:11434
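Before starting a long run, it can help to confirm that something is listening on that address. The following is a minimal sketch, not part of the benchmark suite: the helper name `reachable` and the two-second timeout are assumptions made for illustration, and a successful TCP connection shows only that the port is open, not that the listener is Ollama.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// ollamaAddr is the address named in the prerequisites.
const ollamaAddr = "127.0.0.1:11434"

// dialTimeout is an arbitrary value chosen for this sketch.
const dialTimeout = 2 * time.Second

// reachable reports whether a TCP connection to addr can be opened.
// It does not verify that the listener is actually an Ollama server.
func reachable(addr string) bool {
	conn, err := net.DialTimeout("tcp", addr, dialTimeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

func main() {
	if reachable(ollamaAddr) {
		fmt.Printf("server reachable at %s\n", ollamaAddr)
	} else {
		fmt.Printf("nothing listening at %s; start it with `ollama serve`\n", ollamaAddr)
	}
}
```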

Usage and Examples

[!NOTE] All commands must be run from the root directory of the Ollama project.

Basic syntax:

go test -bench=. ./benchmark/... -m $MODEL_NAME

Required flags:

-bench=.: Run all benchmarks
-m: Model name to benchmark

Optional flags:

-count N: Number of times to run the benchmark (useful for statistical analysis)
-timeout T: Maximum time for the benchmark to run (e.g. "10m" for 10 minutes)
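When the benchmark is launched from a script, the required and optional flags can be assembled programmatically. The sketch below is illustrative only: the helper `benchArgs` is hypothetical. Placing `-count` and `-timeout` before the package pattern is a conservative choice in this sketch, so that the `go` tool handles them while `-m` is passed through to the test binary.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// benchArgs builds the argument list for `go test` from the flags described
// above. A count of 0 or an empty timeout omits the corresponding flag.
func benchArgs(model string, count int, timeout string) []string {
	args := []string{"test", "-bench=."}
	if count > 0 {
		args = append(args, "-count", strconv.Itoa(count))
	}
	if timeout != "" {
		args = append(args, "-timeout", timeout)
	}
	// The package pattern comes next, followed by the benchmark's own -m flag.
	return append(args, "./benchmark/...", "-m", model)
}

func main() {
	// Prints: go test -bench=. -count 5 -timeout 10m ./benchmark/... -m llama3.3
	fmt.Println("go " + strings.Join(benchArgs("llama3.3", 5, "10m"), " "))
}
```

The resulting slice can be handed to `exec.Command("go", args...)` from `os/exec`, run from the root directory of the project.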

Common usage patterns:

Single benchmark run with a model specified:

go test -bench=. ./benchmark/... -m llama3.3

Output metrics

The benchmark reports several key metrics:

gen_tok/s: Generated tokens per second
prompt_tok/s: Prompt processing tokens per second
ttft_ms: Time to first token in milliseconds
load_ms: Model load time in milliseconds
gen_tokens: Total tokens generated
prompt_tokens: Total prompt tokens processed
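As a rough illustration of how the rate and millisecond metrics relate to raw counts and timings, the sketch below derives them from a token count and a duration. It assumes durations are given in nanoseconds, as the Ollama API reports fields such as `eval_duration` and `load_duration`. The helper names `tokensPerSecond` and `nsToMs` are hypothetical, and the benchmark's own calculation may differ.

```go
package main

import "fmt"

// tokensPerSecond converts a token count and a duration in nanoseconds
// into a rate. A non-positive duration yields 0 to avoid dividing by zero.
func tokensPerSecond(tokens int, durationNs int64) float64 {
	if durationNs <= 0 {
		return 0
	}
	return float64(tokens) / (float64(durationNs) / 1e9)
}

// nsToMs converts nanoseconds to milliseconds, the unit used by
// ttft_ms and load_ms.
func nsToMs(ns int64) float64 {
	return float64(ns) / 1e6
}

func main() {
	// 200 tokens generated in 4 seconds.
	fmt.Printf("gen_tok/s: %.1f\n", tokensPerSecond(200, 4_000_000_000)) // prints "gen_tok/s: 50.0"
	// A 1.5 second model load.
	fmt.Printf("load_ms: %.1f\n", nsToMs(1_500_000_000)) // prints "load_ms: 1500.0"
}
```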

Each benchmark runs two scenarios:

Cold start: Model is loaded from disk for each test
Warm start: Model is pre-loaded in memory

Three prompt lengths are tested for each scenario:

Short prompt (100 tokens)
Medium prompt (500 tokens)
Long prompt (1000 tokens)
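Two scenarios combined with three prompt lengths give six cases per model. The sketch below enumerates that matrix in the table-driven style common in Go tests. The type `benchCase` and case names such as `cold/short` are assumptions for illustration, not the names used by the actual benchmark code.

```go
package main

import "fmt"

// benchCase describes one combination of start scenario and prompt length.
type benchCase struct {
	Name         string
	Warm         bool // true when the model is pre-loaded in memory
	PromptTokens int
}

// benchCases returns every scenario and prompt-length combination described above.
func benchCases() []benchCase {
	scenarios := []struct {
		name string
		warm bool
	}{{"cold", false}, {"warm", true}}
	prompts := []struct {
		name   string
		tokens int
	}{{"short", 100}, {"medium", 500}, {"long", 1000}}

	var cases []benchCase
	for _, s := range scenarios {
		for _, p := range prompts {
			cases = append(cases, benchCase{
				Name:         s.name + "/" + p.name,
				Warm:         s.warm,
				PromptTokens: p.tokens,
			})
		}
	}
	return cases
}

func main() {
	for _, c := range benchCases() {
		fmt.Printf("%-12s warm=%-5t prompt_tokens=%d\n", c.Name, c.Warm, c.PromptTokens)
	}
}
```

In a real benchmark file, each case would typically be run as a sub-benchmark with `b.Run(c.Name, ...)`.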
