Daniel Hiltgen
|
af28b94533
Merge pull request #5469 from dhiltgen/prevent_system_oom
|
10 months ago |
Daniel Hiltgen
|
955f2a4e03
Only set default keep_alive on initial model load
|
10 months ago |
Daniel Hiltgen
|
3c75113e37
Prevent loading models larger than total memory
|
10 months ago |
Daniel Hiltgen
|
3518aaef33
Merge pull request #4218 from dhiltgen/auto_parallel
|
10 months ago |
Blake Mizerany
|
cb42e607c5
llm: speed up gguf decoding by a lot (#5246)
|
10 months ago |
Daniel Hiltgen
|
17b7186cd7
Enable concurrency by default
|
1 year ago |
Daniel Hiltgen
|
45cacbaf05
Merge pull request #4517 from dhiltgen/gpu_incremental
|
10 months ago |
Daniel Hiltgen
|
6f351bf586
review comments and coverage
|
11 months ago |
Daniel Hiltgen
|
fc37c192ae
Refine CPU load behavior with system memory visibility
|
11 months ago |
Daniel Hiltgen
|
6fd04ca922
Improve multi-gpu handling at the limit
|
11 months ago |
Jeffrey Morgan
|
dd7c9ebeaf
server: longer timeout in `TestRequests` (#5046)
|
10 months ago |
Michael Yang
|
e40145a39d
lint
|
11 months ago |
Patrick Devine
|
4cc3be3035
Move envconfig and consolidate env vars (#4608)
|
11 months ago |
Jeffrey Morgan
|
38255d2af1
Use flash attention flag for now (#4580)
|
11 months ago |
Patrick Devine
|
6845988807
Ollama `ps` command for showing currently loaded models (#4327)
|
11 months ago |
Daniel Hiltgen
|
0a954e5066
Fix stale test logic
|
1 year ago |
Jeffrey Morgan
|
dfa2f32ca0
unload in critical section (#4187)
|
1 year ago |
Daniel Hiltgen
|
f56aa20014
Centralize server config handling
|
1 year ago |
Daniel Hiltgen
|
9a32c514cb
Soften timeouts on sched unit tests
|
1 year ago |
Daniel Hiltgen
|
d6e3b64582
Fix concurrency for CPU mode
|
1 year ago |
Jeffrey Morgan
|
00b0699c75
Reload model if `num_gpu` changes (#3920)
|
1 year ago |
Bryce Reitano
|
36a6daccab
Restructure loading conditional chain
|
1 year ago |
Bryce Reitano
|
ceb0e26e5e
Provide variable ggml for TestLoad
|
1 year ago |
Bryce Reitano
|
284e02bed0
Move ggml loading to when we attempt fitting
|
1 year ago |
Daniel Hiltgen
|
d8851cb7a0
Harden sched TestLoad
|
1 year ago |
Daniel Hiltgen
|
34b9db5afc
Request and model concurrency
|
1 year ago |