Commit History

Author SHA1 Message Date
  frob 63269668c0 Prevent underflow when FreeMemory < overhead (#8014) 4 months ago
  Sam 539be43640 llm: normalise kvct parameter handling (#7926) 5 months ago
  Sam 1bdab9fdb1 llm: introduce k/v context quantization (vRAM improvements) (#6279) 5 months ago
  Michael Yang d07cf41a97 refactor kv estimation 6 months ago
  Patrick Devine c7cb0f0602 image processing for llama3.2 (#6963) 6 months ago
  Daniel Hiltgen 05cd82ef94 Rename gpu package discover (#7143) 6 months ago
  Daniel Hiltgen 56318fb365 Improve logging on GPU too small (#6666) 7 months ago
  Daniel Hiltgen b05c9e83d9 Introduce GPU Overhead env var (#5922) 7 months ago
  Michael Yang 8e0641a9bf handle asymmetric embedding KVs 10 months ago
  Daniel Hiltgen 359b15a597 Handle models with divergent layer sizes 10 months ago
  Daniel Hiltgen 7784ca33ce Tighten up memory prediction logging 10 months ago
  Daniel Hiltgen 17df6520c8 Remove mmap related output calc logic 10 months ago
  Daniel Hiltgen 6f351bf586 review comments and coverage 11 months ago
  Daniel Hiltgen 6fd04ca922 Improve multi-gpu handling at the limit 11 months ago
  Michael Yang 6297f85606 gofmt, goimports 11 months ago
  Michael Yang e40145a39d lint 11 months ago
  Patrick Devine 4cc3be3035 Move envconfig and consolidate env vars (#4608) 11 months ago
  Michael Yang 1d359e737e typo 11 months ago
  Michael Yang 50b9056e09 count memory up to NumGPU 11 months ago
  Jeffrey Morgan bb6fd02298 Don't clamp ctx size in `PredictServerFit` (#4317) 11 months ago
  Daniel Hiltgen bee2f4a3b0 Record GPU usage information 1 year ago
  Michael Yang 4736391bfb llm: add minimum based on layer size 1 year ago
  Daniel Hiltgen f56aa20014 Centralize server config handling 1 year ago
  Jeffrey Morgan f0c454ab57 gpu: add 512MiB to darwin minimum, metal doesn't have partial offloading overhead (#4068) 1 year ago
  Michael Yang f81f308118 fix gemma, command-r layer weights 1 year ago
  Michael Yang 7bb7cb8a60 only count output tensors 1 year ago
  Daniel Hiltgen 5445aaa94e Add back memory escape valve 1 year ago
  Daniel Hiltgen 34b9db5afc Request and model concurrency 1 year ago