Jesse Gross
|
f66216e399
ggml: Support heterogeneous KV cache layer sizes in memory estimation
|
1 month ago |
Jesse Gross
|
f4f0992b6e
llm: Fix debug logging for memory estimates
|
1 month ago |
Michael Yang
|
033cec232a
count gemma3 vision tensors
|
1 month ago |
Daniel Hiltgen
|
1fdb351c37
New engine: vision models and auto-fallback (#9113)
|
1 month ago |
Michael Yang
|
58245413f4
next ollama runner (#7913)
|
2 months ago |
frob
|
63269668c0
Prevent underflow when FreeMemory < overhead (#8014)
|
4 months ago |
Sam
|
539be43640
llm: normalise kvct parameter handling (#7926)
|
4 months ago |
Sam
|
1bdab9fdb1
llm: introduce k/v context quantization (vRAM improvements) (#6279)
|
4 months ago |
Michael Yang
|
d07cf41a97
refactor kv estimation
|
6 months ago |
Patrick Devine
|
c7cb0f0602
image processing for llama3.2 (#6963)
|
6 months ago |
Daniel Hiltgen
|
05cd82ef94
Rename gpu package discover (#7143)
|
6 months ago |
Daniel Hiltgen
|
56318fb365
Improve logging on GPU too small (#6666)
|
7 months ago |
Daniel Hiltgen
|
b05c9e83d9
Introduce GPU Overhead env var (#5922)
|
7 months ago |
Michael Yang
|
8e0641a9bf
handle asymmetric embedding KVs
|
10 months ago |
Daniel Hiltgen
|
359b15a597
Handle models with divergent layer sizes
|
10 months ago |
Daniel Hiltgen
|
7784ca33ce
Tighten up memory prediction logging
|
10 months ago |
Daniel Hiltgen
|
17df6520c8
Remove mmap related output calc logic
|
10 months ago |
Daniel Hiltgen
|
6f351bf586
review comments and coverage
|
11 months ago |
Daniel Hiltgen
|
6fd04ca922
Improve multi-gpu handling at the limit
|
11 months ago |
Michael Yang
|
6297f85606
gofmt, goimports
|
11 months ago |
Michael Yang
|
e40145a39d
lint
|
11 months ago |
Patrick Devine
|
4cc3be3035
Move envconfig and consolidate env vars (#4608)
|
11 months ago |
Michael Yang
|
1d359e737e
typo
|
11 months ago |
Michael Yang
|
50b9056e09
count memory up to NumGPU
|
11 months ago |
Jeffrey Morgan
|
bb6fd02298
Don't clamp ctx size in `PredictServerFit` (#4317)
|
11 months ago |
Daniel Hiltgen
|
bee2f4a3b0
Record GPU usage information
|
1 year ago |
Michael Yang
|
4736391bfb
llm: add minimum based on layer size
|
1 year ago |
Daniel Hiltgen
|
f56aa20014
Centralize server config handling
|
1 year ago |
Jeffrey Morgan
|
f0c454ab57
gpu: add 512MiB to darwin minimum, metal doesn't have partial offloading overhead (#4068)
|
1 year ago |
Michael Yang
|
f81f308118
fix gemma, command-r layer weights
|
1 year ago |