Roy Han | eb7cc2d1ce | image embeddings | 9 months ago
royjhan | b7c622dd32 | Merge branch 'main' into royh-batchembed | 9 months ago
Daniel Hiltgen | af28b94533 | Merge pull request #5469 from dhiltgen/prevent_system_oom | 10 months ago
Daniel Hiltgen | 955f2a4e03 | Only set default keep_alive on initial model load | 10 months ago
Daniel Hiltgen | 3c75113e37 | Prevent loading models larger than total memory | 10 months ago
Roy Han | 6caac01494 | clear comments | 10 months ago
Roy Han | 17de2b4405 | Refactoring of legacy and new | 10 months ago
royjhan | a5f23d766e | Merge branch 'main' into royh-batchembed | 10 months ago
Roy Han | 00a4cb26ca | use float32 | 10 months ago
Daniel Hiltgen | 3518aaef33 | Merge pull request #4218 from dhiltgen/auto_parallel | 10 months ago
Roy Han | 49e341147d | add server function | 10 months ago
Roy Han | c406fa7a4c | api/embed draft | 10 months ago
Roy Han | ff191d7cba | Initial Draft | 10 months ago
Blake Mizerany | cb42e607c5 | llm: speed up gguf decoding by a lot (#5246) | 10 months ago
Roy Han | 0f87628b6d | Revert "Initial Batch Embedding" | 10 months ago
Daniel Hiltgen | 17b7186cd7 | Enable concurrency by default | 1 year ago
Roy Han | c22d54895a | Initial Batch Embedding | 10 months ago
Daniel Hiltgen | 45cacbaf05 | Merge pull request #4517 from dhiltgen/gpu_incremental | 10 months ago
Daniel Hiltgen | 6f351bf586 | review comments and coverage | 11 months ago
Daniel Hiltgen | fc37c192ae | Refine CPU load behavior with system memory visibility | 11 months ago
Daniel Hiltgen | 6fd04ca922 | Improve multi-gpu handling at the limit | 11 months ago
Jeffrey Morgan | dd7c9ebeaf | server: longer timeout in `TestRequests` (#5046) | 10 months ago
Michael Yang | e40145a39d | lint | 11 months ago
Patrick Devine | 4cc3be3035 | Move envconfig and consolidate env vars (#4608) | 11 months ago
Jeffrey Morgan | 38255d2af1 | Use flash attention flag for now (#4580) | 11 months ago
Patrick Devine | 6845988807 | Ollama `ps` command for showing currently loaded models (#4327) | 11 months ago
Daniel Hiltgen | 0a954e5066 | Fix stale test logic | 1 year ago
Jeffrey Morgan | dfa2f32ca0 | unload in critical section (#4187) | 1 year ago
Daniel Hiltgen | f56aa20014 | Centralize server config handling | 1 year ago
Daniel Hiltgen | 9a32c514cb | Soften timeouts on sched unit tests | 1 year ago