royjhan | b7c622dd32 | Merge branch 'main' into royh-batchembed | 10 months ago
Daniel Hiltgen | af28b94533 | Merge pull request #5469 from dhiltgen/prevent_system_oom | 10 months ago
Daniel Hiltgen | 955f2a4e03 | Only set default keep_alive on initial model load | 10 months ago
Daniel Hiltgen | 3c75113e37 | Prevent loading models larger than total memory | 10 months ago
Roy Han | 6caac01494 | clear comments | 10 months ago
Roy Han | 17de2b4405 | Refactoring of legacy and new | 10 months ago
royjhan | a5f23d766e | Merge branch 'main' into royh-batchembed | 10 months ago
Roy Han | 00a4cb26ca | use float32 | 10 months ago
Daniel Hiltgen | 3518aaef33 | Merge pull request #4218 from dhiltgen/auto_parallel | 10 months ago
Roy Han | 49e341147d | add server function | 10 months ago
Roy Han | c406fa7a4c | api/embed draft | 10 months ago
Roy Han | ff191d7cba | Initial Draft | 10 months ago
Blake Mizerany | cb42e607c5 | llm: speed up gguf decoding by a lot (#5246) | 10 months ago
Roy Han | 0f87628b6d | Revert "Initial Batch Embedding" | 10 months ago
Daniel Hiltgen | 17b7186cd7 | Enable concurrency by default | 1 year ago
Roy Han | c22d54895a | Initial Batch Embedding | 10 months ago
Daniel Hiltgen | 45cacbaf05 | Merge pull request #4517 from dhiltgen/gpu_incremental | 10 months ago
Daniel Hiltgen | 6f351bf586 | review comments and coverage | 11 months ago
Daniel Hiltgen | fc37c192ae | Refine CPU load behavior with system memory visibility | 11 months ago
Daniel Hiltgen | 6fd04ca922 | Improve multi-gpu handling at the limit | 11 months ago
Jeffrey Morgan | dd7c9ebeaf | server: longer timeout in `TestRequests` (#5046) | 10 months ago
Michael Yang | e40145a39d | lint | 11 months ago
Patrick Devine | 4cc3be3035 | Move envconfig and consolidate env vars (#4608) | 11 months ago
Jeffrey Morgan | 38255d2af1 | Use flash attention flag for now (#4580) | 11 months ago
Patrick Devine | 6845988807 | Ollama `ps` command for showing currently loaded models (#4327) | 1 year ago
Daniel Hiltgen | 0a954e5066 | Fix stale test logic | 1 year ago
Jeffrey Morgan | dfa2f32ca0 | unload in critical section (#4187) | 1 year ago
Daniel Hiltgen | f56aa20014 | Centralize server config handling | 1 year ago
Daniel Hiltgen | 9a32c514cb | Soften timeouts on sched unit tests | 1 year ago
Daniel Hiltgen | d6e3b64582 | Fix concurrency for CPU mode | 1 year ago