Author | Commit | Message | Date
Daniel Hiltgen | 90ca84172c | Fix embeddings memory corruption (#6467) | 8 months ago
Jeffrey Morgan | 15c2d8fe14 | server: parallelize embeddings in API web handler instead of in subprocess runner (#6220) | 8 months ago
Jeffrey Morgan | e04c7012c2 | update llama.cpp submodule to `1e6f6554` (#6208) | 9 months ago
royjhan | 86b907f82a | sort batch results (#6189) | 9 months ago
royjhan | 1b44d873e7 | Add Metrics to `api/embed` response (#5709) | 9 months ago
Jeffrey Morgan | 68ee42f995 | update llama.cpp submodule to `6eeaeba1` (#6039) | 9 months ago
Daniel Hiltgen | e12fff8810 | Enable windows error dialog for subprocess startup | 9 months ago
royjhan | b9f5e16c80 | Introduce `/api/embed` endpoint supporting batch embedding (#5127) | 9 months ago
Jeffrey Morgan | d8def1ff94 | llm: allow gemma 2 to context shift (#5534) | 10 months ago
Jeffrey Morgan | 0e09c380fc | llm: print caching notices in debug only (#5533) | 10 months ago
Jeffrey Morgan | d89454de80 | Use slot with cached prompt instead of least recently used (#5492) | 10 months ago
royjhan | 3b5a4a77f3 | Return Correct Prompt Eval Count Regardless of Cache Prompt (#5371) | 10 months ago
Jeffrey Morgan | 717f7229eb | Do not shift context for sliding window models (#5368) | 10 months ago
Michael Yang | 9d91e5e587 | remove confusing log message | 10 months ago
Daniel Hiltgen | fb9cdfa723 | Fix server.cpp for the new cuda build macros | 11 months ago
Jeffrey Morgan | ead259d877 | llm: fix seed value not being applied to requests (#4986) | 10 months ago
Jeffrey Morgan | 34f142797a | llm: always add bos token to prompt (#4941) | 10 months ago
Michael Yang | 829ff87bd1 | revert tokenize ffi (#4761) | 11 months ago
Michael Yang | de781b37c8 | rm unused infill | 11 months ago
Michael Yang | 3e21799377 | rm unused system prompt | 11 months ago
Michael Yang | 26a00a0410 | use ffi for tokenizing/detokenizing | 11 months ago
Michael Yang | 714adb8bd1 | bump (#4597) | 11 months ago
Daniel Hiltgen | b37b496a12 | Wire up load progress | 11 months ago
Sam | e15307fdf4 | feat: add support for flash_attn (#4120) | 11 months ago
Michael Yang | 58876091f7 | log clean up | 11 months ago
Daniel Hiltgen | 920a4b0794 | Merge remote-tracking branch 'upstream/main' into pr3702 | 1 year ago
Michael Yang | 44869c59d6 | omit prompt and generate settings from final response | 1 year ago
jmorganca | fcf4d60eee | llm: add back check for empty token cache | 1 year ago
Jeffrey Morgan | 18d9a7e1f1 | update llama.cpp submodule to `f364eb6` (#4060) | 1 year ago
Daniel Hiltgen | 23d23409a0 | Update llama.cpp (#4036) | 1 year ago
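For reference, a minimal sketch of calling the batch `/api/embed` endpoint introduced in b9f5e16c80 (#5127), reading back the response metrics added in 1b44d873e7 (#5709). The host/port, the model name, and the exact response field names here are assumptions based on the public Ollama API, not something shown in this log.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Request shape for the batch /api/embed endpoint (#5127):
// a model name plus a list of inputs to embed in one call.
type embedRequest struct {
	Model string   `json:"model"`
	Input []string `json:"input"`
}

// Response shape; the metric fields follow #5709 ("Add Metrics to
// api/embed response") and are an assumption about the exact names.
type embedResponse struct {
	Model           string      `json:"model"`
	Embeddings      [][]float64 `json:"embeddings"`
	TotalDuration   int64       `json:"total_duration"`
	LoadDuration    int64       `json:"load_duration"`
	PromptEvalCount int         `json:"prompt_eval_count"`
}

func main() {
	// Assumed: a local Ollama server with an embedding-capable model pulled.
	body, err := json.Marshal(embedRequest{
		Model: "all-minilm",
		Input: []string{"first sentence", "second sentence"},
	})
	if err != nil {
		panic(err)
	}

	resp, err := http.Post("http://localhost:11434/api/embed",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out embedResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Printf("got %d embeddings; prompt_eval_count=%d\n",
		len(out.Embeddings), out.PromptEvalCount)
}
```

Per 86b907f82a ("sort batch results", #6189), the embeddings in the response are expected to come back in the same order as the inputs, so `out.Embeddings[i]` corresponds to `Input[i]`.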