Daniel Hiltgen | bee2f4a3b0 | Record GPU usage information | 1 year ago
Michael Yang | 4736391bfb | llm: add minimum based on layer size | 1 year ago
Daniel Hiltgen | f56aa20014 | Centralize server config handling | 1 year ago
Jeffrey Morgan | f0c454ab57 | gpu: add 512MiB to darwin minimum, metal doesn't have partial offloading overhead (#4068) | 1 year ago
Michael Yang | f81f308118 | fix gemma, command-r layer weights | 1 year ago
Michael Yang | 7bb7cb8a60 | only count output tensors | 1 year ago
Daniel Hiltgen | 5445aaa94e | Add back memory escape valve | 1 year ago
Daniel Hiltgen | 34b9db5afc | Request and model concurrency | 1 year ago