With some minor configuration, Ollama runs well on NVIDIA Jetson devices. The following has been tested on JetPack 5.1.2.
NVIDIA Jetson devices are Linux-based embedded computers purpose-built for AI applications.
Jetsons have an integrated GPU that is wired directly to the machine's memory controller. Because of this, the `nvidia-smi` command is unrecognized, and Ollama falls back to "CPU only" mode. This can be verified with a monitoring tool like `jtop`.
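If you don't have `jtop` yet, it is provided by the `jetson-stats` package. A minimal sketch, assuming a stock JetPack image with pip3 available:

```
# Install jetson-stats, which provides jtop (log out/in or reboot afterwards)
sudo pip3 install -U jetson-stats
# Watch the GPU gauge while a model is generating
jtop
```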
To address this, we pass the path to the Jetson's pre-installed CUDA libraries into `ollama serve` (run inside a tmux session), then hardcode the `num_gpu` parameter into a cloned version of our target model.
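Before relying on that path, it's worth confirming the CUDA libraries actually live there. A quick check, assuming the default JetPack layout (exact library versions vary by release):

```
# The CUDA runtime should be present under the JetPack install prefix
ls /usr/local/cuda/lib64/libcudart*
```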
Prerequisites:

- curl
- tmux
Here are the steps:
- Install Ollama via the standard Linux install script: `curl https://ollama.ai/install.sh | sh`
- Stop the Ollama service: `sudo systemctl stop ollama`
- Start `ollama serve` in a tmux session called ollama_jetson, with the CUDA library path on `LD_LIBRARY_PATH`: `tmux has-session -t ollama_jetson 2>/dev/null || tmux new-session -d -s ollama_jetson 'LD_LIBRARY_PATH=/usr/local/cuda/lib64 ollama serve'`
- Pull the model you want to use (e.g. mistral): `ollama pull mistral`
- Create a new Modelfile for enabling GPU support on the Jetson: `touch ModelfileMistralJetson`
- In the ModelfileMistralJetson file, specify the FROM model and the num_gpu PARAMETER as shown below:
```
FROM mistral
PARAMETER num_gpu 999
```
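As an aside, the two file steps above can be collapsed into a single shell command; a minimal sketch using a heredoc:

```
# Write the Modelfile in one step instead of touch + manual editing
cat > ModelfileMistralJetson <<'EOF'
FROM mistral
PARAMETER num_gpu 999
EOF
```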
- Create a new model from your Modelfile: `ollama create mistral-jetson -f ./ModelfileMistralJetson`
- Run the new model: `ollama run mistral-jetson` (a quick sanity check follows below)
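Before checking GPU usage, it's worth verifying that the clone exists and responds; the model names below assume the steps were followed verbatim:

```
# Both the base model and the Jetson clone should appear
ollama list
# One-shot prompt against the new model
ollama run mistral-jetson "Why is the sky blue?"
```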
If you run a monitoring tool like `jtop` you should now see that Ollama is using the Jetson's integrated GPU.
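If `jtop` isn't installed, the `tegrastats` utility that ships with the Jetson's L4T image is an alternative; its GR3D_FREQ field reflects GPU utilization:

```
# Built-in monitor; watch GR3D_FREQ while the model is generating
sudo tegrastats
```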
And that's it!