Blake Mizerany | 9039c821a2 | llama: preserve field order in user-defined JSON schemas (#8002) | 4 months ago
Jeffrey Morgan | 527cc97899 | llama: update vendored code to commit 40c6d79f (#7875) | 4 months ago
Daniel Hiltgen | 4879a234c4 | build: Make target improvements (#7499) | 4 months ago
Parth Sareen | de52b6c2f9 | bugfix: "null" value json mode (#7979) | 4 months ago
Parth Sareen | 630e7dc6ff | api: structured outputs - chat endpoint (#7900) | 4 months ago
Sam | 539be43640 | llm: normalise kvct parameter handling (#7926) | 5 months ago
Sam | 1bdab9fdb1 | llm: introduce k/v context quantization (vRAM improvements) (#6279) | 5 months ago
ItzCrazyKns | e3936d4fb3 | Support Multiple LoRa Adapters (#7667) | 5 months ago
Daniel Hiltgen | b85520bfb9 | logs: explain client aborts better (#7783) | 5 months ago
Daniel Hiltgen | 909a88c5c0 | Improve crash reporting (#7728) | 5 months ago
Daniel Hiltgen | 81d55d3e4d | fix index out of range on zero layer metal load (#7696) | 5 months ago
Daniel Hiltgen | df011054fa | Jetpack support for Go server (#7217) | 5 months ago
Jesse Gross | a909417602 | runner.go: Remove unused arguments | 6 months ago
Jesse Gross | de1557a0dc | runner.go: Better handle return NULL values from llama.cpp | 6 months ago
Patrick Devine | c7cb0f0602 | image processing for llama3.2 (#6963) | 6 months ago
Gabe Goodhart | f2890a4494 | IBM granite/granitemoe architecture support (#6760) | 6 months ago
Daniel Hiltgen | 05cd82ef94 | Rename gpu package discover (#7143) | 6 months ago
Daniel Hiltgen | 24636dfa87 | Discovery CPU details for default thread selection (#6264) | 6 months ago
Jesse Gross | 03408f3437 | server: Don't clear cmd when closing a server | 6 months ago
Jeffrey Morgan | 96efd9052f | Re-introduce the `llama` package (#5034) | 6 months ago
Daniel Hiltgen | cd5c8f6471 | Optimize container images for startup (#6547) | 7 months ago
Daniel Hiltgen | 4a8069f9c4 | Quiet down dockers new lint warnings (#6716) | 7 months ago
Daniel Hiltgen | 6719097649 | llm: make load time stall duration configurable via OLLAMA_LOAD_TIMEOUT | 8 months ago
Daniel Hiltgen | 037a4d103e | Log system memory at info (#6617) | 8 months ago
Sean Khatiri | 397cae7962 | llm: fix typo in comment (#6530) | 8 months ago
Daniel Hiltgen | 0f92b19bec | Only enable numa on CPUs (#6484) | 8 months ago
Daniel Hiltgen | 74d45f0102 | Refactor linux packaging | 9 months ago
Jeffrey Morgan | 15c2d8fe14 | server: parallelize embeddings in API web handler instead of in subprocess runner (#6220) | 8 months ago
Daniel Hiltgen | 25906d72d1 | llm: prevent loading too large models on windows (#5926) | 8 months ago
Jeffrey Morgan | de4fc29773 | llm: reserve required number of slots for embeddings (#6219) | 8 months ago