Bruce MacDonald
|
9771b1ec51
windows runner fixes (#637)
|
1 year ago |
Michael Yang
|
f40b3de758
use int64 consistently
|
1 year ago |
Bruce MacDonald
|
86279f4ae3
unbound max num gpu layers (#591)
|
1 year ago |
Bruce MacDonald
|
4cba75efc5
remove tmp directories created by previous servers (#559)
|
1 year ago |
Bruce MacDonald
|
1255bc9b45
only package 11.8 runner
|
1 year ago |
Bruce MacDonald
|
4e8be787c7
pack in cuda libs
|
1 year ago |
Bruce MacDonald
|
66003e1d05
subprocess improvements (#524)
|
1 year ago |
Bruce MacDonald
|
2540c9181c
support for packaging in multiple cuda runners (#509)
|
1 year ago |
Michael Yang
|
7dee25a07f
fix falcon decode
|
1 year ago |
Bruce MacDonald
|
f221637053
first pass at linux gpu support (#454)
|
1 year ago |
Bruce MacDonald
|
09dd2aeff9
GGUF support (#441)
|
1 year ago |
Bruce MacDonald
|
42998d797d
subprocess llama.cpp server (#401)
|
1 year ago |
Quinn Slack
|
f4432e1dba
treat stop as stop sequences, not exact tokens (#442)
|
1 year ago |
Michael Yang
|
5ca05c2e88
fix ModelType()
|
1 year ago |
Michael Yang
|
a894cc792d
model and file type as strings
|
1 year ago |
Bruce MacDonald
|
4b2d366c37
Update llama.go
|
1 year ago |
Bruce MacDonald
|
56fd4e4ef2
log embedding eval timing
|
1 year ago |
Jeffrey Morgan
|
22885aeaee
update `llama.cpp` to `f64d44a`
|
1 year ago |
Michael Yang
|
6de5d032e1
implement loading ggml lora adapters through the modelfile
|
1 year ago |
Michael Yang
|
fccf8d179f
partial decode ggml bin for more info
|
1 year ago |