Michael Yang
|
05a01fdecb
ml/backend/ggml: consolidate system info logging
|
2 months ago |
Jesse Gross
|
21aa666a1e
ml: Enable support for flash attention
|
2 months ago |
Jesse Gross
|
ee141cc821
ml: Empty tensor constructor for tensors
|
2 months ago |
Jesse Gross
|
854a9195f3
attention: Remove unnecessary contiguous operations
|
2 months ago |
Michael Yang
|
3e8b8a1933
ml: update Context.Forward interface
|
2 months ago |
Michael Yang
|
53d2990d9b
model: add bos token if configured
|
2 months ago |
Jesse Gross
|
f53f4198c3
ml: Abstract attention out of model definitions
|
2 months ago |
Jesse Gross
|
bd6a7d5e64
ollamarunner: Pass runner performance parameters to backends
|
2 months ago |
Daniel Hiltgen
|
df2680b4b9
Wire up system info log for new engine (#9123)
|
2 months ago |
Jesse Gross
|
ed443a0393
Runner for Ollama engine
|
4 months ago |
Jesse Gross
|
d773b7d671
backend: API to support full precision matmul
|
2 months ago |
Jesse Gross
|
4d4463b2bd
backend: Support graph computation that does not return an output
|
2 months ago |
Jesse Gross
|
0e38297f87
backend: Consistently use int (vs. int64) for tensor shapes
|
2 months ago |
Jesse Gross
|
7e13f568dc
backend: Don't return an error on Close
|
2 months ago |
Michael Yang
|
58245413f4
next ollama runner (#7913)
|
2 months ago |