Jesse Gross
|
1feff61977
kvcache: Sliding window cache only needs a single batch total
|
1 month ago |
Jesse Gross
|
2d6eac9084
kvcache: Optimize sliding window attention
|
1 month ago |
Jesse Gross
|
3ed7ad3ab3
kvcache: Pass granular cache size into implementations
|
1 month ago |
Jesse Gross
|
d3e9ca3eda
kvcache: Account for source tensors in defrag operation count
|
1 month ago |
Jesse Gross
|
0c220935bd
input: Rename Options to Batch
|
1 month ago |
Jesse Gross
|
a8e83a7654
Disable causal attention based on batch index
|
1 month ago |
Michael Yang
|
e95278932b
use non-causal mask only for image positions
|
1 month ago |
Jesse Gross
|
a1cda80bcb
model: Update encoder cache to use multimodal input processing handler
|
1 month ago |
Jesse Gross
|
f52b2615ef
kvcache: Set context for shift offsets
|
1 month ago |
Jesse Gross
|
6da8b6a879
kvcache: Support non-causal attention
|
1 month ago |
Michael Yang
|
7bae7fa5ce
ml/backend/ggml: create tensor on specific backend
|
2 months ago |
Michael Yang
|
764e199d67
kvcache: create cache ctx per layer
|
2 months ago |
Jesse Gross
|
21aa666a1e
ml: Enable support for flash attention
|
2 months ago |
Jesse Gross
|
854a9195f3
attention: Remove unnecessary contiguous operations
|
2 months ago |
Michael Yang
|
3e8b8a1933
ml: update Context.Forward interface
|
2 months ago |
Jesse Gross
|
ed443a0393
Runner for Ollama engine
|
4 months ago |