royjhan
|
a5f23d766e
Merge branch 'main' into royh-batchembed
|
10 months ago |
Roy Han
|
00a4cb26ca
use float32
|
10 months ago |
Josh Yan
|
33a65e3ba3
error
|
10 months ago |
Roy Han
|
aee25acb5b
move normalization to go
|
10 months ago |
Daniel Hiltgen
|
3518aaef33
Merge pull request #4218 from dhiltgen/auto_parallel
|
10 months ago |
Roy Han
|
c111d8bb51
normalization
|
10 months ago |
Roy Han
|
49e341147d
add server function
|
10 months ago |
Roy Han
|
c406fa7a4c
api/embed draft
|
10 months ago |
Roy Han
|
ff191d7cba
Initial Draft
|
10 months ago |
Blake Mizerany
|
cb42e607c5
llm: speed up gguf decoding by a lot (#5246)
|
10 months ago |
Roy Han
|
0f87628b6d
Revert "Initial Batch Embedding"
|
10 months ago |
Daniel Hiltgen
|
17b7186cd7
Enable concurrency by default
|
1 year ago |
Daniel Hiltgen
|
5bf5aeec01
Refine mmap default logic on linux
|
10 months ago |
Daniel Hiltgen
|
96624aa412
Merge pull request #5072 from dhiltgen/windows_path
|
10 months ago |
Roy Han
|
c22d54895a
Initial Batch Embedding
|
10 months ago |
Daniel Hiltgen
|
7784ca33ce
Tighten up memory prediction logging
|
10 months ago |
Daniel Hiltgen
|
171796791f
Adjust mmap logic for cuda windows for faster model load
|
10 months ago |
Daniel Hiltgen
|
b2799f111b
Move libraries out of users path
|
10 months ago |
Daniel Hiltgen
|
da3bf23354
Workaround gfx900 SDMA bugs
|
11 months ago |
Daniel Hiltgen
|
6f351bf586
review comments and coverage
|
11 months ago |
Daniel Hiltgen
|
fc37c192ae
Refine CPU load behavior with system memory visibility
|
11 months ago |
Daniel Hiltgen
|
6fd04ca922
Improve multi-gpu handling at the limit
|
11 months ago |
Craig Hughes
|
b84aea1685
Critical fix from llama.cpp JSON grammar to forbid un-escaped escape characters inside strings, which breaks parsing. (#3782)
|
11 months ago |
Michael Yang
|
e40145a39d
lint
|
11 months ago |
Michael Yang
|
c895a7d13f
some gocritic
|
11 months ago |
Michael Yang
|
829ff87bd1
revert tokenize ffi (#4761)
|
11 months ago |
Jeffrey Morgan
|
a50a87a7b8
partial offloading: allow flash attention and disable mmap (#4734)
|
11 months ago |
Michael Yang
|
26a00a0410
use ffi for tokenizing/detokenizing
|
1 year ago |
Daniel Hiltgen
|
92c81e8117
Give the final model loading more time
|
11 months ago |
Lei Jitang
|
7487229c34
llm/server.go: Fix 2 minor typos (#4661)
|
11 months ago |