Michael Yang
|
4b34930a31
Merge pull request #9897 from ollama/mxyng/chunk-load
|
1 month ago |
Michael Yang
|
74bd09652d
ml/backend/ggml: load tensors in 32KiB chunks
|
1 month ago |
Bruce MacDonald
|
fb6252d786
benchmark: performance of running ollama server (#8643)
|
1 month ago |
Blake Mizerany
|
c794fef2f2
server/internal/client/ollama: persist through chunk download errors (#9923)
|
1 month ago |
Parth Sareen
|
00ebda8cc4
Revert "parser: remove role validation from Modelfile parser" (#9917)
|
1 month ago |
Parth Sareen
|
d14ce75b95
docs: update final response for /api/chat stream (#9919)
|
1 month ago |
Jesse Gross
|
2d6eac9084
kvcache: Optimize sliding window attention
|
1 month ago |
Jesse Gross
|
3ed7ad3ab3
kvcache: Pass granular cache size into implementations
|
1 month ago |
Patrick Devine
|
6d1103048e
fix: show correct bool value for kv in verbose show information (#9928)
|
1 month ago |
Jesse Gross
|
0ff28758b3
ollamarunner: Provide mechanism for backends to report loading progress
|
1 month ago |
Jesse Gross
|
d3e9ca3eda
kvcache: Account for source tensors in defrag operation count
|
1 month ago |
Jesse Gross
|
0fbfcf3c9c
model: Pass input tensor instead of raw data to models
|
1 month ago |
Jesse Gross
|
0c220935bd
input: Rename Options to Batch
|
1 month ago |
rylativity
|
ffbfe833da
parser: remove role validation from Modelfile parser (#9874)
|
1 month ago |
Parth Sareen
|
42a14f7f63
sample: add error handling for empty logits (#9740)
|
1 month ago |
Patrick Devine
|
f8c3dbe5b5
templates: add autotemplate for gemma3 (#9880)
|
1 month ago |
Jesse Gross
|
b078dd157c
gemma2: Remove second call to Rows
|
1 month ago |
Blake Mizerany
|
2ddacd7516
server/internal/client/ollama: confirm all chunksums were received (#9893)
|
1 month ago |
Jeffrey Morgan
|
da0e345200
ml: use input context for extracting outputs (#9875)
|
1 month ago |
Bruce MacDonald
|
df94175a0f
ggml: return error on failure to read tensor data (#9872)
|
1 month ago |
Bruce MacDonald
|
61a8825216
convert: return name of unsupported architecture (#9862)
|
1 month ago |
Michael Yang
|
021dcf089d
Merge pull request #9824 from ollama/mxyng/sched
|
1 month ago |
Jesse Gross
|
bf24498b1e
ollamarunner: Check for minBatch of context space when shifting
|
1 month ago |
Bruce MacDonald
|
95e271d98f
runner: remove cache prompt flag from ollama runner (#9826)
|
1 month ago |
Jeffrey Morgan
|
364629b8d6
ml/backend/ggml: allocate memory with malloc when loading model (#9822)
|
1 month ago |
Parth Sareen
|
108fe02165
sample: make mutations in transforms explicit (#9743)
|
1 month ago |
Michael Yang
|
4561fff36e
conditionally enable parallel pipelines
|
1 month ago |
Daniel Hiltgen
|
50b5962042
Add support for ROCm gfx1151 (#9773)
|
1 month ago |
Louis Beaumont
|
e27e4a3c1b
readme: add screenpipe to community integrations (#9786)
|
1 month ago |
zeo
|
088514bbd4
readme: add Ellama to list of community integrations (#9800)
|
1 month ago |