Roy Han
|
2647a0e443
num parallel embed
|
9 months ago |
Jeffrey Morgan
|
791650ddef
sched: only error when over-allocating system memory (#5626)
|
9 months ago |
Jeffrey Morgan
|
e4ff73297d
server: fix model reloads when setting `OLLAMA_NUM_PARALLEL` (#5560)
|
9 months ago |
Jeffrey Morgan
|
0ee87615c7
sched: don't error if paging to disk on Windows and macOS (#5523)
|
10 months ago |
Daniel Hiltgen
|
af28b94533
Merge pull request #5469 from dhiltgen/prevent_system_oom
|
10 months ago |
Daniel Hiltgen
|
955f2a4e03
Only set default keep_alive on initial model load
|
10 months ago |
Daniel Hiltgen
|
3c75113e37
Prevent loading models larger than total memory
|
10 months ago |
Daniel Hiltgen
|
cff3f44f4a
Fix case for NumCtx
|
10 months ago |
Daniel Hiltgen
|
3518aaef33
Merge pull request #4218 from dhiltgen/auto_parallel
|
10 months ago |
Blake Mizerany
|
cb42e607c5
llm: speed up gguf decoding by a lot (#5246)
|
10 months ago |
Daniel Hiltgen
|
9929751cc8
Disable concurrency for AMD + Windows
|
10 months ago |
Daniel Hiltgen
|
17b7186cd7
Enable concurrency by default
|
1 year ago |
Daniel Hiltgen
|
6f351bf586
review comments and coverage
|
11 months ago |
Daniel Hiltgen
|
ff4f0cbd1d
Prevent multiple concurrent loads on the same gpus
|
11 months ago |
Daniel Hiltgen
|
fc37c192ae
Refine CPU load behavior with system memory visibility
|
11 months ago |
Daniel Hiltgen
|
434dfe30c5
Reintroduce nvidia nvml library for windows
|
11 months ago |
Daniel Hiltgen
|
48702dd149
Harden unload for empty runners
|
11 months ago |
Daniel Hiltgen
|
5e8ff556cb
Support forced spreading for multi GPU
|
11 months ago |
Michael Yang
|
e40145a39d
lint
|
11 months ago |
Michael Yang
|
c895a7d13f
some gocritic
|
11 months ago |
Michael Yang
|
04f3c12bb7
replace x/exp/slices with slices
|
11 months ago |
Patrick Devine
|
4cc3be3035
Move envconfig and consolidate env vars (#4608)
|
11 months ago |
Sang Park
|
4434d7f447
Correct typo in error message (#4535)
|
11 months ago |
Daniel Hiltgen
|
ec231a7923
Remove VRAM convergence check for windows
|
11 months ago |
Patrick Devine
|
6845988807
Ollama `ps` command for showing currently loaded models (#4327)
|
11 months ago |
Daniel Hiltgen
|
4142c3ef7c
Always use the sorted list of GPUs
|
11 months ago |
Jeffrey Morgan
|
bb6fd02298
Don't clamp ctx size in `PredictServerFit` (#4317)
|
11 months ago |
Daniel Hiltgen
|
354ad9254e
Wait for GPU free memory reporting to converge
|
11 months ago |
Jeffrey Morgan
|
c9f98622b1
Skip scheduling cancelled requests, always reload unloaded runners (#4189)
|
1 year ago |
Jeffrey Morgan
|
dfa2f32ca0
unload in critical section (#4187)
|
1 year ago |