Michael Yang
|
f4f939de28
Merge pull request #1552 from jmorganca/mxyng/lint-test
|
1 year ago |
Daniel Hiltgen
|
39928a42e8
Always dynamically load the llm server library
|
1 year ago |
Daniel Hiltgen
|
d88c527be3
Build multiple CPU variants and pick the best
|
1 year ago |
Jeffrey Morgan
|
ab6be852c7
revisit memory allocation to account for full kv cache on main gpu
|
1 year ago |
Daniel Hiltgen
|
8da7bef05f
Support multiple variants for a given llm lib type
|
1 year ago |
Jeffrey Morgan
|
b24e8d17b2
Increase minimum CUDA memory allocation overhead and fix minimum overhead for multi-gpu (#1896)
|
1 year ago |
Michael Yang
|
f921e2696e
typo
|
1 year ago |
Jeffrey Morgan
|
f387e9631b
use runner if cuda alloc won't fit
|
1 year ago |
Jeffrey Morgan
|
cb534e6ac2
use 10% vram overhead for cuda
|
1 year ago |
Jeffrey Morgan
|
58ce2d8273
better estimate scratch buffer size
|
1 year ago |
Jeffrey Morgan
|
08f1e18965
Offload layers to GPU based on new model size estimates (#1850)
|
1 year ago |
Daniel Hiltgen
|
e9ce91e9a6
Load dynamic cpu lib on windows
|
1 year ago |
Jeffrey Morgan
|
c0285158a9
tweak memory requirements error text
|
1 year ago |
Jeffrey Morgan
|
77a66df72c
add macOS memory check for 47B models
|
1 year ago |
Jeffrey Morgan
|
5b4837f881
remove unused filetype check
|
1 year ago |
Daniel Hiltgen
|
7555ea44f8
Revamp the dynamic library shim
|
1 year ago |
Daniel Hiltgen
|
3269535a4c
Refine handling of shim presence
|
1 year ago |
Daniel Hiltgen
|
35934b2e05
Adapted rocm support to cgo based llama.cpp
|
1 year ago |
Daniel Hiltgen
|
d4cd695759
Add cgo implementation for llama.cpp
|
1 year ago |
Bruce MacDonald
|
811b1f03c8
deprecate ggml
|
1 year ago |
Michael Yang
|
b9495ea162
load projectors
|
1 year ago |
Bruce MacDonald
|
195e3d9dbd
chat api endpoint (#1392)
|
1 year ago |
Jeffrey Morgan
|
00d06619a1
Revert "chat api (#991)" while context variable is fixed
|
1 year ago |
Bruce MacDonald
|
7a0899d62d
chat api (#991)
|
1 year ago |
Michael Yang
|
19b7a4d715
recent llama.cpp update added kernels for fp32, q5_0, and q5_1
|
1 year ago |
Jeffrey Morgan
|
5cba29b9d6
JSON mode: add `"format" as an api parameter (#1051)
|
1 year ago |
Jeffrey Morgan
|
2e53704685
default rope params to 0 for new models (#968)
|
1 year ago |
Jeffrey Morgan
|
7ed5a39bc7
simpler check for model loading compatibility errors
|
1 year ago |
Jeffrey Morgan
|
a7dad24d92
add error for `falcon` and `starcoder` vocab compatibility (#844)
|
1 year ago |
Michael Yang
|
36fe2deebf
only check system memory on macos
|
1 year ago |