`llama`

This package integrates the llama.cpp library as a Go package and makes it easy to build it with tags for different CPU and GPU processors.

Supported:

CPU
avx, avx2
macOS Metal
Windows CUDA
Windows ROCm
Linux CUDA
Linux ROCm
Llava

Extra build steps are required for CUDA and ROCm on Windows since nvcc and hipcc both require using msvc as the host compiler. For these backends, shared libraries are created:

ggml_cuda.dll on Windows or ggml_cuda.so on Linux
ggml_hipblas.dll on Windows or ggml_hipblas.so on Linux

Note: it's important that memory is allocated and freed by the same compiler (e.g. entirely by code compiled with msvc or mingw). Issues from this should be rare, but there are some places where pointers are returned by the CUDA or HIP runtimes and freed elsewhere, causing a crash. In a future change the same runtime should be used in both cases to avoid crashes.

Building

go build .

AVX

go build -tags avx .

AVX2

# go doesn't recognize `-mfma` as a valid compiler flag
# see https://github.com/golang/go/issues/17895
go env -w "CGO_CFLAGS_ALLOW=-mfma|-mf16c"
go env -w "CGO_CXXFLAGS_ALLOW=-mfma|-mf16c"
go build -tags=avx,avx2 .
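
To decide which of the CPU tags above applies to a given machine, the tag list can be derived from the CPU feature flags. The following is a minimal sketch, not part of this package: `tags_for_flags` is a hypothetical helper name, and it assumes the flags arrive as a space-separated string such as the `flags` line of `/proc/cpuinfo` on Linux x86-64.

```shell
# Hypothetical helper (not part of this repo): map a space-separated list of
# CPU feature flags to the build tags used in this README.
tags_for_flags() {
    # Pad with spaces so that "avx" does not match inside "avx2" or "avx512f".
    case " $1 " in
        *" avx2 "*) echo "avx,avx2" ;;
        *" avx "*)  echo "avx" ;;
        *)          echo "" ;;
    esac
}
```

On Linux, `tags_for_flags "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"` should print a value that can be passed to `go build -tags`. The `CGO_CFLAGS_ALLOW` and `CGO_CXXFLAGS_ALLOW` settings above are still needed for an AVX2 build.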

Linux

CUDA

Install the CUDA toolkit v11.3.1, then build the CUDA code:

make ggml_cuda.so
go build -tags avx,cuda .

ROCm

Install ROCm.

make ggml_hipblas.so
go build -tags avx,rocm .

Windows

Download w64devkit for a simple MinGW development environment.

CUDA

Install the CUDA toolkit v11.3.1, then build the CUDA code:

make ggml_cuda.dll
go build -tags avx,cuda .

ROCm

Install ROCm.

make ggml_hipblas.dll
go build -tags avx,rocm .

Building runners

# build all runners for this platform
make -j

Vendoring

Ollama currently vendors llama.cpp and ggml. While we generally strive to contribute changes back upstream to avoid drift, we carry a small set of patches which are applied to the tracking commit. A set of make targets is available to help developers update to a newer tracking commit or work on changes.

If you update the vendoring code, start by running the following command to establish the tracking llama.cpp repo in the ./vendor/ directory.

make apply-patches

Updating Base Commit

Pin to new base commit

To update to a newer base commit, select the upstream git tag or commit and update llama/vendoring.

Applying patches

When updating to a newer base commit, the existing patches may not apply cleanly and require manual merge resolution.

Start by applying the patches. If any patch has conflicts, git am stops at the first failure.

make apply-patches

If you see an error message about a conflict, go into the ./vendor/ directory and resolve the conflict in the failed patch commit using your preferred tool. Save the file(s) and continue the patch series with git am --continue. If any additional patches fail, follow the same pattern until the full patch series is applied. Once finished, run the create-patches and sync targets to ensure everything is updated.

make create-patches sync
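
While working through a patch series that stops on conflicts, it can help to confirm that no patches are still pending before regenerating them. The sketch below is an illustration, not one of this repo's make targets: `am_in_progress` is a hypothetical helper that relies on git keeping the state of an interrupted `git am` in the `rebase-apply` directory of the git dir.

```shell
# Hypothetical helper: succeed if a `git am` series is still in progress in
# the repository at $1 (./vendor/ in this README).
am_in_progress() (
    # The function body runs in a subshell, so the cd does not leak out.
    cd "$1" || exit 2
    # While `git am` is stopped on a patch, git keeps its state in the
    # rebase-apply directory inside the git dir.
    [ -d "$(git rev-parse --git-path rebase-apply)" ]
)
```

For example, `am_in_progress ./vendor && echo "patch series not finished yet"` prints the message only while patches remain to be resolved. An apply-based rebase uses the same directory, so the check assumes no rebase is running in the tracking repo.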

Build and test Ollama, and make any necessary changes to the Go code based on the new base commit. Submit your PR to the Ollama repo.

Generating Patches

When working on new fixes or features that impact vendored code, use the following model. First get a clean tracking repo with all current patches applied:

make apply-patches

Now edit the upstream native code in the ./vendor/ directory. You do not need to commit every change in order to build; a dirty working tree in the tracking repo is OK while developing. Simply save in your editor, and run the following to refresh the vendored code with your changes, build the backend(s) and build ollama:

make sync
make -j 8
go build .

[!IMPORTANT] Do NOT run apply-patches while you're iterating as that will reset the tracking repo. It will detect a dirty tree and abort, but if your tree is clean and you accidentally ran this target, use git reflog to recover your commit(s).
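
To see for yourself whether the tracking repo currently holds uncommitted work, a check along the following lines can be used. This is a sketch under assumptions: `tree_is_dirty` is a hypothetical helper, and it counts untracked files as dirty, which may differ from the check the apply-patches target performs.

```shell
# Hypothetical helper: succeed if the repository at $1 has uncommitted
# changes (modified, staged, or untracked files).
tree_is_dirty() {
    # `git status --porcelain` prints one line per changed path and nothing
    # at all when the working tree is clean.
    [ -n "$(git -C "$1" status --porcelain)" ]
}
```

For example, `tree_is_dirty ./vendor && echo "uncommitted changes in ./vendor"` reports work that has not yet been committed.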

Iterate until you're ready to submit PRs. Once your code is ready, commit a change in the ./vendor/ directory, then generate the patches for ollama with:

make create-patches

[!IMPORTANT] Once you have completed this step, it is safe to run apply-patches since your change is preserved in the patches.

In your ./vendor/ directory, create a branch, and cherry-pick the new commit to that branch, then submit a PR upstream to llama.cpp.

Commit the changes in the ollama repo and submit a PR to Ollama, which will include the vendored code update with your change, along with the patches.

After your PR upstream is merged, follow the Updating Base Commit instructions above. First remove your patch before running apply-patches, since the new base commit already contains your change.
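
Finding the patch file to remove can be done by searching the patch subjects. The following sketch assumes the patches are `git format-patch` style files with a `Subject:` header and a `.patch` extension. `patches_matching` is a hypothetical helper, and its second argument is treated as a basic regular expression.

```shell
# Hypothetical helper: list the patch files in directory $1 whose Subject
# line matches the pattern $2.
patches_matching() {
    # -l prints only the names of files that contain a matching line.
    grep -l -e "^Subject:.*$2" -- "$1"/*.patch
}
```

For example, `patches_matching llama/patches "my fix"` would list candidate files to delete before running apply-patches. Both the `llama/patches` path and the subject text are placeholders for your own checkout and commit.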
