Parth Sareen
|
0682dae027
sample: improve ollama engine sampler performance (#9374)
|
1 month ago |
Jesse Gross
|
a7e63b82be
ollamarunner: Improve multimodal input handling
|
1 month ago |
Jesse Gross
|
b70fc4d51e
model: Don't unconditionally add special tokens
|
1 month ago |
Michael Yang
|
05a01fdecb
ml/backend/ggml: consolidate system info logging
|
2 months ago |
Jesse Gross
|
21aa666a1e
ml: Enable support for flash attention
|
2 months ago |
Michael Yang
|
31e472baa4
runner: defer context cancel
|
2 months ago |
Bruce MacDonald
|
0c1041ad85
runner: default to greedy sampler for performance (#9407)
|
2 months ago |
Michael Yang
|
d6af13efed
runner: simplify tensor split parsing
|
2 months ago |
Parth Sareen
|
0b7e1676eb
sample: add sampling package for new engine (#8410)
|
2 months ago |
Jesse Gross
|
bd6a7d5e64
ollamarunner: Pass runner performance parameters to backends
|
2 months ago |
Daniel Hiltgen
|
df2680b4b9
Wire up system info log for new engine (#9123)
|
2 months ago |
Jesse Gross
|
ed443a0393
Runner for Ollama engine
|
4 months ago |