Michael Yang | 74bd09652d | ml/backend/ggml: load tensors in 32KiB chunks | 1 month ago
Bruce MacDonald | df94175a0f | ggml: return error on failure to read tensor data (#9872) | 1 month ago
Michael Yang | 021dcf089d | Merge pull request #9824 from ollama/mxyng/sched | 1 month ago
Jeffrey Morgan | 364629b8d6 | ml/backend/ggml: allocate memory with malloc when loading model (#9822) | 1 month ago
Michael Yang | 4561fff36e | conditionally enable parallel pipelines | 1 month ago
Michael Yang | 63a394068c | use 2d pooling | 1 month ago
Michael Yang | c5cbe4fc2a | fallback to cpu | 1 month ago
Michael Yang | 9e4642e9b3 | ollama debug tensor | 1 month ago
Michael Yang | 6b0486c216 | duplicate token_embd to output | 1 month ago
Michael Yang | 8934324b72 | use fast attention | 1 month ago
Michael Yang | 0df1800436 | set non-causal attention | 1 month ago
Michael Yang | 4b037a97dc | add gemma vision encoder | 1 month ago
Patrick Devine | 5f74d1fd47 | gemma2 impl | 2 months ago
Jesse Gross | 4100ed7bdd | ml: Add support for quantized KV cache | 2 months ago
Jesse Gross | 25f9b152f9 | ggml-backend: Ensure allocation meet backend requirements | 1 month ago
Jesse Gross | 98272fbd58 | additional review comments | 1 month ago
Michael Yang | b27e8f3f10 | ml/backend/ggml: use backend buffer type | 1 month ago
Michael Yang | 45df786f09 | comments | 1 month ago
Michael Yang | daaf42e4a4 | ml/backend/ggml: clean up | 2 months ago
Michael Yang | 2dc60d4620 | ml/backend/ggml: offload vision to cpu | 2 months ago
Michael Yang | b5312f30e8 | ml/backend/ggml: handle tensor split | 2 months ago
Michael Yang | 26c2e0bd35 | ml/backend/ggml: handle user specified cpu offloading | 2 months ago
Michael Yang | bf920883d5 | ml/backend/ggml: set cpu n_threads | 2 months ago
Michael Yang | 7bae7fa5ce | ml/backend/ggml: create tensor on specific backend | 2 months ago
Michael Yang | 764e199d67 | kvcache: create cache ctx per layer | 2 months ago
Michael Yang | bfce55db3d | model: load non-repeated tensors into multiple backends | 2 months ago
Michael Yang | bab6f34dc0 | ml/backend/ggml: update model loading for hybrid/multi backends | 2 months ago
Michael Yang | 05a01fdecb | ml/backend/ggml: consolidate system info logging | 2 months ago
Jesse Gross | 21aa666a1e | ml: Enable support for flash attention | 2 months ago
Jesse Gross | ee141cc821 | ml: Empty tensor constructor for tensors | 2 months ago