Michael Yang
|
b732beba6a
lint
|
9 months ago |
Michael Yang
|
df993fa37b
comments
|
9 months ago |
Michael Yang
|
5e9db9fb0b
refactor convert
|
11 months ago |
Michael Yang
|
5c1912769e
Merge pull request #5473 from ollama/mxyng/environ
|
9 months ago |
royjhan
|
1b44d873e7
Add Metrics to `api\embed` response (#5709)
|
9 months ago |
Daniel Hiltgen
|
345420998e
Prevent partial loading on mixed GPU brands
|
9 months ago |
Michael Yang
|
0f1910129f
int
|
10 months ago |
Jeffrey Morgan
|
80ee9b5e47
Remove out of space test temporarily (#5825)
|
9 months ago |
Daniel Hiltgen
|
06e5d74e34
Merge pull request #5506 from dhiltgen/sched_tests
|
9 months ago |
royjhan
|
b9f5e16c80
Introduce `/api/embed` endpoint supporting batch embedding (#5127)
|
9 months ago |
Daniel Hiltgen
|
f4408219e9
Refine scheduler unit tests for reliability
|
10 months ago |
Daniel Hiltgen
|
af28b94533
Merge pull request #5469 from dhiltgen/prevent_system_oom
|
10 months ago |
Daniel Hiltgen
|
955f2a4e03
Only set default keep_alive on initial model load
|
10 months ago |
Daniel Hiltgen
|
3c75113e37
Prevent loading models larger than total memory
|
10 months ago |
Daniel Hiltgen
|
3518aaef33
Merge pull request #4218 from dhiltgen/auto_parallel
|
10 months ago |
Blake Mizerany
|
cb42e607c5
llm: speed up gguf decoding by a lot (#5246)
|
10 months ago |
Daniel Hiltgen
|
17b7186cd7
Enable concurrency by default
|
1 year ago |
Daniel Hiltgen
|
45cacbaf05
Merge pull request #4517 from dhiltgen/gpu_incremental
|
10 months ago |
Daniel Hiltgen
|
6f351bf586
review comments and coverage
|
11 months ago |
Daniel Hiltgen
|
fc37c192ae
Refine CPU load behavior with system memory visibility
|
11 months ago |
Daniel Hiltgen
|
6fd04ca922
Improve multi-gpu handling at the limit
|
11 months ago |
Jeffrey Morgan
|
dd7c9ebeaf
server: longer timeout in `TestRequests` (#5046)
|
10 months ago |
Michael Yang
|
e40145a39d
lint
|
11 months ago |
Patrick Devine
|
4cc3be3035
Move envconfig and consolidate env vars (#4608)
|
11 months ago |
Jeffrey Morgan
|
38255d2af1
Use flash attention flag for now (#4580)
|
11 months ago |
Patrick Devine
|
6845988807
Ollama `ps` command for showing currently loaded models (#4327)
|
11 months ago |
Daniel Hiltgen
|
0a954e5066
Fix stale test logic
|
1 year ago |
Jeffrey Morgan
|
dfa2f32ca0
unload in critical section (#4187)
|
1 year ago |
Daniel Hiltgen
|
f56aa20014
Centralize server config handling
|
1 year ago |
Daniel Hiltgen
|
9a32c514cb
Soften timeouts on sched unit tests
|
1 year ago |