Patrick Devine
|
5f74d1fd47
gemma2 impl
|
2 months ago |
Jesse Gross
|
4100ed7bdd
ml: Add support for quantized KV cache
|
2 months ago |
Jesse Gross
|
25f9b152f9
ggml-backend: Ensure allocation meet backend requirements
|
1 month ago |
Jesse Gross
|
98272fbd58
additional review comments
|
1 month ago |
Michael Yang
|
b27e8f3f10
ml/backend/ggml: use backend buffer type
|
1 month ago |
Michael Yang
|
45df786f09
comments
|
1 month ago |
Michael Yang
|
daaf42e4a4
ml/backend/ggml: clean up
|
2 months ago |
Michael Yang
|
2dc60d4620
ml/backend/ggml: offload vision to cpu
|
2 months ago |
Michael Yang
|
b5312f30e8
ml/backend/ggml: handle tensor split
|
2 months ago |
Michael Yang
|
26c2e0bd35
ml/backend/ggml: handle user specified cpu offloading
|
2 months ago |
Michael Yang
|
bf920883d5
ml/backend/ggml: set cpu n_threads
|
2 months ago |
Michael Yang
|
7bae7fa5ce
ml/backend/ggml: create tensor on specific backend
|
2 months ago |
Michael Yang
|
764e199d67
kvcache: create cache ctx per layer
|
2 months ago |
Michael Yang
|
bfce55db3d
model: load non-repeated tensors into multiple backends
|
2 months ago |
Michael Yang
|
bab6f34dc0
ml/backend/ggml: update model loading for hybrid/multi backends
|
2 months ago |
Michael Yang
|
05a01fdecb
ml/backend/ggml: consolidate system info logging
|
2 months ago |
Jesse Gross
|
21aa666a1e
ml: Enable support for flash attention
|
2 months ago |
Jesse Gross
|
ee141cc821
ml: Empty tensor constructor for tensors
|
2 months ago |
Jesse Gross
|
55e5776c44
ggml-backend: Store parent backend as part of tensor
|
2 months ago |
Jesse Gross
|
854a9195f3
attention: Remove unnecessary contiguous operations
|
2 months ago |
Michael Yang
|
3e8b8a1933
ml: update Context.Forward interface
|
2 months ago |
Jesse Gross
|
f53f4198c3
ml: Abstract attention out of model definitions
|
2 months ago |
Michael Yang
|
2192a28eed
ml/backend/ggml: fix rms norm
|
2 months ago |
Jesse Gross
|
e5bcc51ae1
ggml-backend: Don't recreate the scheduler for each context
|
2 months ago |
Jesse Gross
|
bd6a7d5e64
ollamarunner: Pass runner performance parameters to backends
|
2 months ago |
Daniel Hiltgen
|
df2680b4b9
Wire up system info log for new engine (#9123)
|
2 months ago |
Jesse Gross
|
ed443a0393
Runner for Ollama engine
|
4 months ago |
Jesse Gross
|
d223f3b697
ggml-backend: Close on nil should be a no-op
|
2 months ago |
Jesse Gross
|
60830695c2
ggml-backend: Ensure data is available after async computation
|
2 months ago |
Jesse Gross
|
01d9a46854
ggml-backend: Let GGML allocate context memory
|
3 months ago |