Patrick Devine
|
4cc3be3035
Move envconfig and consolidate env vars (#4608)
|
11 months ago |
Michael Yang
|
1d359e737e
typo
|
11 months ago |
Michael Yang
|
50b9056e09
count memory up to NumGPU
|
11 months ago |
Jeffrey Morgan
|
bb6fd02298
Don't clamp ctx size in `PredictServerFit` (#4317)
|
11 months ago |
Daniel Hiltgen
|
bee2f4a3b0
Record GPU usage information
|
1 year ago |
Michael Yang
|
4736391bfb
llm: add minimum based on layer size
|
1 year ago |
Daniel Hiltgen
|
f56aa20014
Centralize server config handling
|
1 year ago |
Jeffrey Morgan
|
f0c454ab57
gpu: add 512MiB to darwin minimum, metal doesn't have partial offloading overhead (#4068)
|
1 year ago |
Michael Yang
|
f81f308118
fix gemma, command-r layer weights
|
1 year ago |
Michael Yang
|
7bb7cb8a60
only count output tensors
|
1 year ago |
Daniel Hiltgen
|
5445aaa94e
Add back memory escape valve
|
1 year ago |
Daniel Hiltgen
|
34b9db5afc
Request and model concurrency
|
1 year ago |