Patrick Devine
|
b73a512f24
fix the cpu estimatedTotal memory + get the expiry time for loading models
|
11 months ago |
Daniel Hiltgen
|
853ae490e1
Sanitize the env var debug log
|
11 months ago |
Patrick Devine
|
6845988807
Ollama `ps` command for showing currently loaded models (#4327)
|
11 months ago |
jmorganca
|
92ca2cca95
Revert "only forward some env vars"
|
11 months ago |
Daniel Hiltgen
|
c4014e73a2
Fall back to CPU runner with zero layers
|
11 months ago |
Jeffrey Morgan
|
bb6fd02298
Don't clamp ctx size in `PredictServerFit` (#4317)
|
11 months ago |
Michael Yang
|
cf442cd57e
fix typo
|
11 months ago |
Michael Yang
|
ce3b212d12
only forward some env vars
|
11 months ago |
Michael Yang
|
58876091f7
log clean up
|
11 months ago |
Daniel Hiltgen
|
d0425f26cf
Merge pull request #4294 from dhiltgen/harden_subprocess_reaping
|
11 months ago |
Bruce MacDonald
|
cfa84b8470
add done_reason to the api (#4235)
|
11 months ago |
Daniel Hiltgen
|
84ac7ce139
Refine subprocess reaping
|
1 year ago |
Daniel Hiltgen
|
920a4b0794
Merge remote-tracking branch 'upstream/main' into pr3702
|
1 year ago |
Daniel Hiltgen
|
ee49844d09
Merge pull request #4153 from dhiltgen/gpu_verbose_response
|
1 year ago |
Daniel Hiltgen
|
bee2f4a3b0
Record GPU usage information
|
1 year ago |
Daniel Hiltgen
|
72700279e2
Detect noexec and report a better error
|
1 year ago |
Daniel Hiltgen
|
380378cc80
Use our libraries first
|
1 year ago |
Jeffrey Morgan
|
ed740a2504
Fix `no slots available` error with concurrent requests (#4160)
|
1 year ago |
Jeffrey Morgan
|
1b0e6c9c0e
Fix llava models not working after first request (#4164)
|
1 year ago |
Daniel Hiltgen
|
f56aa20014
Centralize server config handling
|
1 year ago |
Mark Ward
|
321d57e1a0
Removing go routine calling .wait from load.
|
1 year ago |
Mark Ward
|
ba26c7aa00
it will always return an error due to Kill() discarding Wait() errors
|
1 year ago |
Mark Ward
|
63c763685f
log when the waiting for the process to stop to help debug when other tasks execute during this wait.
|
1 year ago |
Mark Ward
|
948114e3e3
fix sched to wait for the runner to terminate to ensure following vram check will be more accurate
|
1 year ago |
Jeffrey Morgan
|
7aa08a77ca
llm: dont cap context window limit to training context window (#3988)
|
1 year ago |
Jeffrey Morgan
|
bb31def011
return code `499` when user cancels request while a model is loading (#3955)
|
1 year ago |
Jeffrey Morgan
|
993cf8bf55
llm: limit generation to 10x context size to avoid run on generations (#3918)
|
1 year ago |
Daniel Hiltgen
|
6e76348df7
Merge pull request #3834 from dhiltgen/not_found_in_path
|
1 year ago |
Daniel Hiltgen
|
58888a74bc
Detect and recover if runner removed
|
1 year ago |
Daniel Hiltgen
|
34b9db5afc
Request and model concurrency
|
1 year ago |