Commit history

Author SHA1 Message Date
  Blake Mizerany cb42e607c5 llm: speed up gguf decoding by a lot (#5246) 10 months ago
  Daniel Hiltgen 5bf5aeec01 Refine mmap default logic on linux 10 months ago
  Daniel Hiltgen 96624aa412 Merge pull request #5072 from dhiltgen/windows_path 10 months ago
  Daniel Hiltgen 7784ca33ce Tighten up memory prediction logging 10 months ago
  Daniel Hiltgen 171796791f Adjust mmap logic for cuda windows for faster model load 10 months ago
  Daniel Hiltgen b2799f111b Move libraries out of users path 10 months ago
  Daniel Hiltgen da3bf23354 Workaround gfx900 SDMA bugs 11 months ago
  Daniel Hiltgen 6f351bf586 review comments and coverage 11 months ago
  Daniel Hiltgen fc37c192ae Refine CPU load behavior with system memory visibility 11 months ago
  Daniel Hiltgen 6fd04ca922 Improve multi-gpu handling at the limit 11 months ago
  Craig Hughes b84aea1685 Critical fix from llama.cpp JSON grammar to forbid un-escaped escape characters inside strings, which breaks parsing. (#3782) 11 months ago
  Michael Yang e40145a39d lint 11 months ago
  Michael Yang c895a7d13f some gocritic 11 months ago
  Michael Yang 829ff87bd1 revert tokenize ffi (#4761) 11 months ago
  Jeffrey Morgan a50a87a7b8 partial offloading: allow flash attention and disable mmap (#4734) 11 months ago
  Michael Yang 26a00a0410 use ffi for tokenizing/detokenizing 11 months ago
  Daniel Hiltgen 92c81e8117 Give the final model loading more time 11 months ago
  Lei Jitang 7487229c34 llm/server.go: Fix 2 minor typos (#4661) 11 months ago
  Daniel Hiltgen 0165ba1651 Merge pull request #4638 from dhiltgen/better_error 11 months ago
  Daniel Hiltgen c4209d6d21 Report better warning on client closed abort of load 11 months ago
  Patrick Devine 4cc3be3035 Move envconfig and consolidate env vars (#4608) 11 months ago
  Daniel Hiltgen b37b496a12 Wire up load progress 11 months ago
  Jeffrey Morgan 38255d2af1 Use flash attention flag for now (#4580) 11 months ago
  Sam e15307fdf4 feat: add support for flash_attn (#4120) 11 months ago
  Patrick Devine d1692fd3e0 fix the cpu estimatedTotal memory + get the expiry time for loading models (#4461) 11 months ago
  Daniel Hiltgen 853ae490e1 Sanitize the env var debug log 11 months ago
  Patrick Devine 6845988807 Ollama `ps` command for showing currently loaded models (#4327) 11 months ago
  jmorganca 92ca2cca95 Revert "only forward some env vars" 11 months ago
  Daniel Hiltgen c4014e73a2 Fall back to CPU runner with zero layers 11 months ago
  Jeffrey Morgan bb6fd02298 Don't clamp ctx size in `PredictServerFit` (#4317) 1 year ago