Michael Yang
|
74bd09652d
ml/backend/ggml: load tensors in 32KiB chunks
|
1 month ago |
Jesse Gross
|
3ed7ad3ab3
kvcache: Pass granular cache size into implementations
|
1 month ago |
Jesse Gross
|
0ff28758b3
ollamarunner: Provide mechanism for backends to report loading progress
|
1 month ago |
Jesse Gross
|
0fbfcf3c9c
model: Pass input tensor instead of raw data to models
|
1 month ago |
Jesse Gross
|
0c220935bd
input: Rename Options to Batch
|
1 month ago |
Jesse Gross
|
bf24498b1e
ollamarunner: Check for minBatch of context space when shifting
|
1 month ago |
Bruce MacDonald
|
95e271d98f
runner: remove cache prompt flag from ollama runner (#9826)
|
1 month ago |
Jesse Gross
|
282bfaaa95
ollamarunner: Use a separate context per multimodal input
|
1 month ago |
Jesse Gross
|
9679f40146
ml: Allow models to constrain inputs to a single batch
|
1 month ago |
Bruce MacDonald
|
3892c3a703
llm: remove internal subprocess req and resp types (#9324)
|
1 month ago |
Michael Yang
|
ec46f3286c
engine: error on embeddings; not currently implemented
|
1 month ago |
Jeffrey Morgan
|
e093db92c4
sample: temporarily use grammars for constrained generation in new engine (#9586)
|
1 month ago |
Jesse Gross
|
a1cda80bcb
model: Update encoder cache to use multimodal input processing handler
|
1 month ago |
Jesse Gross
|
4614fafae0
ollamarunner: Don't panic for unimplemented features at runtime.
|
1 month ago |
Jesse Gross
|
0daaaef8c9
ollamarunner: Quiet debug logging and panic on unimplemented features
|
1 month ago |
Parth Sareen
|
0682dae027
sample: improve ollama engine sampler performance (#9374)
|
1 month ago |
Jesse Gross
|
a7e63b82be
ollamarunner: Improve multimodal input handling
|
1 month ago |
Jesse Gross
|
b70fc4d51e
model: Don't unconditionally add special tokens
|
1 month ago |
Michael Yang
|
05a01fdecb
ml/backend/ggml: consolidate system info logging
|
2 months ago |
Jesse Gross
|
21aa666a1e
ml: Enable support for flash attention
|
2 months ago |
Michael Yang
|
31e472baa4
runner: defer context cancel
|
2 months ago |
Bruce MacDonald
|
0c1041ad85
runner: default to greedy sampler for performance (#9407)
|
2 months ago |
Michael Yang
|
d6af13efed
runner: simplify tensor split parsing
|
2 months ago |
Parth Sareen
|
0b7e1676eb
sample: add sampling package for new engine (#8410)
|
2 months ago |
Jesse Gross
|
bd6a7d5e64
ollamarunner: Pass runner performance parameters to backends
|
2 months ago |
Daniel Hiltgen
|
df2680b4b9
Wire up system info log for new engine (#9123)
|
2 months ago |
Jesse Gross
|
ed443a0393
Runner for Ollama engine
|
4 months ago |