Daniel Hiltgen
|
17df6520c8
Remove mmap related output calc logic
|
10 months ago |
Daniel Hiltgen
|
6f351bf586
review comments and coverage
|
11 months ago |
Daniel Hiltgen
|
6fd04ca922
Improve multi-gpu handling at the limit
|
11 months ago |
Michael Yang
|
6297f85606
gofmt, goimports
|
11 months ago |
Michael Yang
|
e40145a39d
lint
|
11 months ago |
Patrick Devine
|
4cc3be3035
Move envconfig and consolidate env vars (#4608)
|
11 months ago |
Michael Yang
|
1d359e737e
typo
|
11 months ago |
Michael Yang
|
50b9056e09
count memory up to NumGPU
|
11 months ago |
Jeffrey Morgan
|
bb6fd02298
Don't clamp ctx size in `PredictServerFit` (#4317)
|
11 months ago |
Daniel Hiltgen
|
bee2f4a3b0
Record GPU usage information
|
1 year ago |
Michael Yang
|
4736391bfb
llm: add minimum based on layer size
|
1 year ago |
Daniel Hiltgen
|
f56aa20014
Centralize server config handling
|
1 year ago |
Jeffrey Morgan
|
f0c454ab57
gpu: add 512MiB to darwin minimum, metal doesn't have partial offloading overhead (#4068)
|
1 year ago |
Michael Yang
|
f81f308118
fix gemma, command-r layer weights
|
1 year ago |
Michael Yang
|
7bb7cb8a60
only count output tensors
|
1 year ago |
Daniel Hiltgen
|
5445aaa94e
Add back memory escape valve
|
1 year ago |
Daniel Hiltgen
|
34b9db5afc
Request and model concurrency
|
1 year ago |