jmorganca
|
f443dd7b81
llama: sync llama.cpp to commit 8962422
|
8 months ago |
Jesse Gross
|
8db94469e0
runner.go: Support GGUF LoRAs
|
8 months ago |
Jesse Gross
|
c989321509
runner.go: Don't cast a Go handle to a C void *
|
8 months ago |
Jesse Gross
|
e4a091bafd
runner.go: Support resource usage command line options
|
8 months ago |
Jeffrey Morgan
|
fd4ecd1ff5
llama: fix sync script ggml-metal_darwin_arm64.m filename (#6610)
|
8 months ago |
Jeffrey Morgan
|
9d8129b8bb
llama: delete unused files (#6523)
|
8 months ago |
Jesse Gross
|
c8a1741d9b
runner.go: Update TODOs
|
8 months ago |
Jesse Gross
|
46a7c682f2
runner.go: Fix embeddings endpoint
|
8 months ago |
Jesse Gross
|
52e88ab7b3
runner.go: Health endpoint comments
|
8 months ago |
Jesse Gross
|
4ca8579428
runner.go: Cleanups
|
8 months ago |
Jesse Gross
|
d022cfc9e6
runner.go: Move pieces[] into sequence
|
8 months ago |
Jesse Gross
|
6ccd0644e1
runner.go: Fix deadlock if a connection is closed during decoding
|
8 months ago |
Jesse Gross
|
0b73cca386
runner.go: Fix resource leaks when removing sequences
|
8 months ago |
Jesse Gross
|
55fb0633db
runner.go: Separate KV cache and context sizes
|
8 months ago |
Jesse Gross
|
53b600921e
runner.go: Hold mutex for entire time when processing batch
|
8 months ago |
Jesse Gross
|
8e1554c91d
runner.go: Scale batches to be processed by numParallel
|
8 months ago |
Daniel Hiltgen
|
f52d4b9879
Make new tokenizer logic conditional (#6395)
|
8 months ago |
Jesse Gross
|
76718ead40
runner.go: Support MinP parameter
|
8 months ago |
Jesse Gross
|
90d25d3b0a
runner.go: Check for incomplete UTF-8 character
|
8 months ago |
Jesse Gross
|
477f529d26
runner.go: Implement RepeatLastN to penalize repeated tokens
|
8 months ago |
Jesse Gross
|
eccd4dd8d2
runner.go: Use correct JSON field names for runners
|
8 months ago |
Jesse Gross
|
69cc5795a7
runner.go: Shift context window when KV cache space is exceeded
|
8 months ago |
Jesse Gross
|
5a441d227a
runner.go: Don't decode if nothing has been added to the batch
|
8 months ago |
Jesse Gross
|
8aa97b5e83
llama.go: Advance though tokens when processing multiple batches
|
8 months ago |
Jesse Gross
|
523d84c563
llama.go: Use dynamic buffer for TokenToPiece
|
8 months ago |
Jesse Gross
|
ed19fad862
llama.go: Make batch memory allocation match configuration
|
8 months ago |
Jesse Gross
|
5d34320b7c
runner.go: Fix off by one in batch size check
|
8 months ago |
Jesse Gross
|
1c36f36c41
llm: Fix array out-of-bounds memory access when tokenizing
|
8 months ago |
Jesse Gross
|
0c2f95f3de
runner: Initialize numPredict
|
8 months ago |
Jesse Gross
|
ebdf781397
server: Fix double free on runner subprocess error.
|
8 months ago |