Jesse Gross | 9679f40146 | ml: Allow models to constrain inputs to a single batch | 1 month ago
Bruce MacDonald | 3892c3a703 | llm: remove internal subprocess req and resp types (#9324) | 1 month ago
Michael Yang | ec46f3286c | engine: error on embeddings; not currently implemented | 1 month ago
Jeffrey Morgan | e093db92c4 | sample: temporarily use grammars for constrained generation in new engine (#9586) | 1 month ago
Jesse Gross | a1cda80bcb | model: Update encoder cache to use multimodal input processing handler | 1 month ago
Jesse Gross | 4614fafae0 | ollamarunner: Don't panic for unimplemented features at runtime. | 1 month ago
Jesse Gross | 0daaaef8c9 | ollamarunner: Quiet debug logging and panic on unimplemented features | 1 month ago
Parth Sareen | 0682dae027 | sample: improve ollama engine sampler performance (#9374) | 1 month ago
Jesse Gross | a7e63b82be | ollamarunner: Improve multimodal input handling | 1 month ago
Jesse Gross | b70fc4d51e | model: Don't unconditionally add special tokens | 1 month ago
Michael Yang | 05a01fdecb | ml/backend/ggml: consolidate system info logging | 2 months ago
Jesse Gross | 21aa666a1e | ml: Enable support for flash attention | 2 months ago
Michael Yang | 31e472baa4 | runner: defer context cancel | 2 months ago
Bruce MacDonald | 0c1041ad85 | runner: default to greedy sampler for performance (#9407) | 2 months ago
Michael Yang | d6af13efed | runner: simplify tensor split parsing | 2 months ago
Parth Sareen | 0b7e1676eb | sample: add sampling package for new engine (#8410) | 2 months ago
Jesse Gross | bd6a7d5e64 | ollamarunner: Pass runner performance parameters to backends | 2 months ago
Daniel Hiltgen | df2680b4b9 | Wire up system info log for new engine (#9123) | 2 months ago
Jesse Gross | ed443a0393 | Runner for Ollama engine | 4 months ago