Blake Mizerany
|
cb42e607c5
llm: speed up gguf decoding by a lot (#5246)
|
10 months ago |
Daniel Hiltgen
|
45cacbaf05
Merge pull request #4517 from dhiltgen/gpu_incremental
|
10 months ago |
Daniel Hiltgen
|
6f351bf586
review comments and coverage
|
11 months ago |
Daniel Hiltgen
|
fc37c192ae
Refine CPU load behavior with system memory visibility
|
11 months ago |
Daniel Hiltgen
|
6fd04ca922
Improve multi-gpu handling at the limit
|
11 months ago |
Jeffrey Morgan
|
dd7c9ebeaf
server: longer timeout in `TestRequests` (#5046)
|
10 months ago |
Michael Yang
|
e40145a39d
lint
|
11 months ago |
Patrick Devine
|
4cc3be3035
Move envconfig and consolidate env vars (#4608)
|
11 months ago |
Jeffrey Morgan
|
38255d2af1
Use flash attention flag for now (#4580)
|
11 months ago |
Patrick Devine
|
6845988807
Ollama `ps` command for showing currently loaded models (#4327)
|
11 months ago |
Daniel Hiltgen
|
0a954e5066
Fix stale test logic
|
1 year ago |
Jeffrey Morgan
|
dfa2f32ca0
unload in critical section (#4187)
|
1 year ago |
Daniel Hiltgen
|
f56aa20014
Centralize server config handling
|
1 year ago |
Daniel Hiltgen
|
9a32c514cb
Soften timeouts on sched unit tests
|
1 year ago |
Daniel Hiltgen
|
d6e3b64582
Fix concurrency for CPU mode
|
1 year ago |
Jeffrey Morgan
|
00b0699c75
Reload model if `num_gpu` changes (#3920)
|
1 year ago |
Bryce Reitano
|
36a6daccab
Restructure loading conditional chain
|
1 year ago |
Bryce Reitano
|
ceb0e26e5e
Provide variable ggml for TestLoad
|
1 year ago |
Bryce Reitano
|
284e02bed0
Move ggml loading to when we attempt fitting
|
1 year ago |
Daniel Hiltgen
|
d8851cb7a0
Harden sched TestLoad
|
1 year ago |
Daniel Hiltgen
|
34b9db5afc
Request and model concurrency
|
1 year ago |