Jeffrey Morgan | 1deafd8254 | llama: update vendored code to commit 46e3556 (#8308) | 3 months ago
Jesse Gross | 08a832b482 | llama: Ensure KV cache is fully defragmented. | 4 months ago
Jeffrey Morgan | 527cc97899 | llama: update vendored code to commit 40c6d79f (#7875) | 4 months ago
Daniel Hiltgen | 4879a234c4 | build: Make target improvements (#7499) | 4 months ago
Sam | 1bdab9fdb1 | llm: introduce k/v context quantization (vRAM improvements) (#6279) | 5 months ago
ItzCrazyKns | e3936d4fb3 | Support Multiple LoRa Adapters (#7667) | 5 months ago
Jesse Gross | 71e6a0d0d1 | runner.go: Don't try to extract image tags for text models | 5 months ago
Jesse Gross | 2cd11ae365 | runner.go: Add unit tests for context shifting | 5 months ago
Jesse Gross | 3478b2cf14 | runner.go: Fix deadlock with many concurrent requests | 5 months ago
Daniel Hiltgen | b85520bfb9 | logs: explain client aborts better (#7783) | 5 months ago
Jesse Gross | c4b34f2a2a | runner.go: Truncate inputs that exceed context rather than shifting | 5 months ago
Jesse Gross | c3ff916431 | runner.go: Don't add inputs to cache view until actually processed | 5 months ago
Jesse Gross | 3fc1dc0e6f | runner.go: Hard fail on errors rather than potentially infinite looping | 5 months ago
Jesse Gross | 7121dfa309 | runner.go: Retry decoding after defragmentation if needed | 5 months ago
Jesse Gross | 5f68fcab12 | runner.go: Use correct index when retrieving embedding results | 5 months ago
Jesse Gross | d875e99e46 | runner.go: Propagate panics back to the user. | 5 months ago
Jesse Gross | 8a35bb926e | runner.go: Increase survivability of main processing loop | 5 months ago
Jesse Gross | c25ffde91d | runner.go: Don't trim whitespace from inputs | 5 months ago
Jesse Gross | 17b386a891 | runner.go: Enforce NUM_PARALLEL directly in the runner | 5 months ago
Michael Yang | 549c2bdfcf | Merge pull request #7657 from ollama/mxyng/sync | 5 months ago
Michael Yang | 5b3393b6a2 | fix(mllama): sync backend between batches | 5 months ago
Jesse Gross | d7eb05b936 | runner.go: Fix off-by-one for num predicted | 5 months ago
Jesse Gross | 65973ceb64 | runner.go: Make KV entry accounting more robust | 5 months ago
Jesse Gross | a909417602 | runner.go: Remove unused arguments | 6 months ago
Jesse Gross | 312d9de1d1 | llama: Improve error handling | 6 months ago
Jesse Gross | a103dae01e | runner.go: Only allocate 1 element embedding batches for mllama | 6 months ago
Jesse Gross | 26acdcf44e | runner.go: Don't set cross attention before sending embeddings | 6 months ago
Jesse Gross | c826e57475 | runner.go: Better abstract vision model integration | 6 months ago
Daniel Hiltgen | 712e99d477 | Soften windows clang requirement (#7428) | 6 months ago
Jesse Gross | de1557a0dc | runner.go: Better handle return NULL values from llama.cpp | 6 months ago