Commit History

Author SHA1 Message Date
  Blake Mizerany cb42e607c5 llm: speed up gguf decoding by a lot (#5246) 10 months ago
  Daniel Hiltgen 5bf5aeec01 Refine mmap default logic on linux 10 months ago
  Daniel Hiltgen 96624aa412 Merge pull request #5072 from dhiltgen/windows_path 10 months ago
  Daniel Hiltgen 7784ca33ce Tighten up memory prediction logging 10 months ago
  Daniel Hiltgen 171796791f Adjust mmap logic for cuda windows for faster model load 10 months ago
  Daniel Hiltgen b2799f111b Move libraries out of users path 10 months ago
  Daniel Hiltgen da3bf23354 Workaround gfx900 SDMA bugs 11 months ago
  Daniel Hiltgen 6f351bf586 review comments and coverage 11 months ago
  Daniel Hiltgen fc37c192ae Refine CPU load behavior with system memory visibility 11 months ago
  Daniel Hiltgen 6fd04ca922 Improve multi-gpu handling at the limit 11 months ago
  Craig Hughes b84aea1685 Critical fix from llama.cpp JSON grammar to forbid un-escaped escape characters inside strings, which breaks parsing. (#3782) 10 months ago
  Michael Yang e40145a39d lint 11 months ago
  Michael Yang c895a7d13f some gocritic 11 months ago
  Michael Yang 829ff87bd1 revert tokenize ffi (#4761) 11 months ago
  Jeffrey Morgan a50a87a7b8 partial offloading: allow flash attention and disable mmap (#4734) 11 months ago
  Michael Yang 26a00a0410 use ffi for tokenizing/detokenizing 11 months ago
  Daniel Hiltgen 92c81e8117 Give the final model loading more time 11 months ago
  Lei Jitang 7487229c34 llm/server.go: Fix 2 minor typos (#4661) 11 months ago
  Daniel Hiltgen 0165ba1651 Merge pull request #4638 from dhiltgen/better_error 11 months ago
  Daniel Hiltgen c4209d6d21 Report better warning on client closed abort of load 11 months ago
  Patrick Devine 4cc3be3035 Move envconfig and consolidate env vars (#4608) 11 months ago
  Daniel Hiltgen b37b496a12 Wire up load progress 11 months ago
  Jeffrey Morgan 38255d2af1 Use flash attention flag for now (#4580) 11 months ago
  Sam e15307fdf4 feat: add support for flash_attn (#4120) 11 months ago
  Patrick Devine d1692fd3e0 fix the cpu estimatedTotal memory + get the expiry time for loading models (#4461) 11 months ago
  Daniel Hiltgen 853ae490e1 Sanitize the env var debug log 11 months ago
  Patrick Devine 6845988807 Ollama `ps` command for showing currently loaded models (#4327) 11 months ago
  jmorganca 92ca2cca95 Revert "only forward some env vars" 11 months ago
  Daniel Hiltgen c4014e73a2 Fall back to CPU runner with zero layers 11 months ago
  Jeffrey Morgan bb6fd02298 Don't clamp ctx size in `PredictServerFit` (#4317) 11 months ago