Author | Commit | Message | Date
Michael Yang | 19b7a4d715 | recent llama.cpp update added kernels for fp32, q5_0, and q5_1 | 1 year ago
Jeffrey Morgan | 5cba29b9d6 | JSON mode: add "format" as an api parameter (#1051) | 1 year ago
Jeffrey Morgan | 2e53704685 | default rope params to 0 for new models (#968) | 1 year ago
Jeffrey Morgan | 7ed5a39bc7 | simpler check for model loading compatibility errors | 1 year ago
Jeffrey Morgan | a7dad24d92 | add error for `falcon` and `starcoder` vocab compatibility (#844) | 1 year ago
Michael Yang | 36fe2deebf | only check system memory on macos | 1 year ago
Michael Yang | 4a8931f634 | check total (system + video) memory | 1 year ago
Michael Yang | bd6e38fb1a | refactor memory check | 1 year ago
Michael Yang | 92189a5855 | fix memory check | 1 year ago
Michael Yang | b599946b74 | add format bytes | 1 year ago
Bruce MacDonald | d06bc0cb6e | enable q8, q5, 5_1, and f32 for linux gpu (#699) | 1 year ago
Bruce MacDonald | 86279f4ae3 | unbound max num gpu layers (#591) | 1 year ago
Bruce MacDonald | 4cba75efc5 | remove tmp directories created by previous servers (#559) | 1 year ago
Michael Yang | 7dee25a07f | fix falcon decode | 1 year ago
Bruce MacDonald | 09dd2aeff9 | GGUF support (#441) | 1 year ago
Bruce MacDonald | 42998d797d | subprocess llama.cpp server (#401) | 1 year ago
Michael Yang | b25dd1795d | allow F16 to use metal | 1 year ago
Michael Yang | 304f2b6c96 | add 34b to mem check | 1 year ago
Michael Yang | a894cc792d | model and file type as strings | 1 year ago
Michael Yang | e26085b921 | close open files | 1 year ago
Michael Yang | 6de5d032e1 | implement loading ggml lora adapters through the modelfile | 1 year ago
Michael Yang | d791df75dd | check memory requirements before loading | 1 year ago
Michael Yang | 020a3b3530 | disable gpu for q5_0, q5_1, q8_0 quants | 1 year ago
Michael Yang | fccf8d179f | partial decode ggml bin for more info | 1 year ago