Jesse Gross
|
854a9195f3
attention: Remove unnecessary contiguous operations
|
2 months ago |
Michael Yang
|
3e8b8a1933
ml: update Context.Forward interface
|
2 months ago |
Jesse Gross
|
f53f4198c3
ml: Abstract attention out of model definitions
|
2 months ago |
Michael Yang
|
2192a28eed
ml/backend/ggml: fix rms norm
|
2 months ago |
Jesse Gross
|
e5bcc51ae1
ggml-backend: Don't recreate the scheduler for each context
|
2 months ago |
Jesse Gross
|
bd6a7d5e64
ollamarunner: Pass runner performance parameters to backends
|
2 months ago |
Daniel Hiltgen
|
df2680b4b9
Wire up system info log for new engine (#9123)
|
2 months ago |
Jesse Gross
|
ed443a0393
Runner for Ollama engine
|
4 months ago |
Jesse Gross
|
d223f3b697
ggml-backend: Close on nil should be a no-op
|
2 months ago |
Jesse Gross
|
60830695c2
ggml-backend: Ensure data is available after async computation
|
2 months ago |
Jesse Gross
|
01d9a46854
ggml-backend: Let GGML allocate context memory
|
3 months ago |
Jesse Gross
|
d773b7d671
backend: API to support full precision matmul
|
2 months ago |
Jesse Gross
|
4d4463b2bd
backend: Support graph computation that does not return an output
|
2 months ago |
Jesse Gross
|
0e38297f87
backend: Consistently use int (vs. int64) for tensor shapes
|
2 months ago |
Jesse Gross
|
7e13f568dc
backend: Don't return an error on Close
|
2 months ago |
Michael Yang
|
58245413f4
next ollama runner (#7913)
|
2 months ago |