..
ext_server | 730dcfcc7a | Refine debug logging for llm | 1 year ago
generate | a64570dcae | Fix clearing kv cache between requests with the same prompt (#2186) | 1 year ago
llama.cpp @ cd4fddb29f | 3ebd6a83fc | update submodule to `cd4fddb29f81d6a1f6d51a0c016bc6b486d68def` | 1 year ago
patches | a64570dcae | Fix clearing kv cache between requests with the same prompt (#2186) | 1 year ago
dyn_ext_server.c | 6a042438af | Switch to local dlopen symbols | 1 year ago
dyn_ext_server.go | a64570dcae | Fix clearing kv cache between requests with the same prompt (#2186) | 1 year ago
dyn_ext_server.h | 39928a42e8 | Always dynamically load the llm server library | 1 year ago
ggml.go | eaed6f8c45 | add max context length check | 1 year ago
gguf.go | cd22855ef8 | refactor tensor read | 1 year ago
llama.go | 4a33cede20 | remove unused fields and functions | 1 year ago
llm.go | 4458efb73a | Load all layers on `arm64` macOS if model is small enough (#2149) | 1 year ago
payload_common.go | dc88cc3981 | use `gzip` for runner embedding (#2067) | 1 year ago
payload_darwin_amd64.go | 1b249748ab | Add multiple CPU variants for Intel Mac | 1 year ago
payload_darwin_arm64.go | 1b249748ab | Add multiple CPU variants for Intel Mac | 1 year ago
payload_linux.go | 1b249748ab | Add multiple CPU variants for Intel Mac | 1 year ago
payload_test.go | 7427fa1387 | Fix up the CPU fallback selection | 1 year ago
payload_windows.go | 1b249748ab | Add multiple CPU variants for Intel Mac | 1 year ago
utils.go | fccf8d179f | partial decode ggml bin for more info | 1 year ago