Commit history

Author SHA1 Message Date
  Jeffrey Morgan c4cf8ad559 llm: avoid loading model if system memory is too small (#5637) 9 months ago
  Jeffrey Morgan 791650ddef sched: only error when over-allocating system memory (#5626) 9 months ago
  Daniel Hiltgen 22c81f62ec Remove duplicate merge glitch 9 months ago
  Michael Yang 9bbddc37a7 Merge pull request #5126 from ollama/mxyng/messages 9 months ago
  Jeffrey Morgan 53da2c6965 llm: remove ambiguous comment when putting upper limit on predictions to avoid infinite generation (#5535) 10 months ago
  Michael Yang ac7a842e55 fix model reloading 10 months ago
  Daniel Hiltgen ccd7785859 Merge pull request #5243 from dhiltgen/modelfile_use_mmap 10 months ago
  Daniel Hiltgen 0e982bc1f4 Fix corner cases on tmp cleaner on mac 10 months ago
  Josh Yan 33a65e3ba3 error 10 months ago
  Daniel Hiltgen 97c9e11768 Switch use_mmap to a pointer type 10 months ago
  Daniel Hiltgen 3518aaef33 Merge pull request #4218 from dhiltgen/auto_parallel 10 months ago
  Blake Mizerany cb42e607c5 llm: speed up gguf decoding by a lot (#5246) 10 months ago
  Daniel Hiltgen 17b7186cd7 Enable concurrency by default 1 year ago
  Daniel Hiltgen 5bf5aeec01 Refine mmap default logic on linux 10 months ago
  Daniel Hiltgen 96624aa412 Merge pull request #5072 from dhiltgen/windows_path 10 months ago
  Daniel Hiltgen 7784ca33ce Tighten up memory prediction logging 10 months ago
  Daniel Hiltgen 171796791f Adjust mmap logic for cuda windows for faster model load 10 months ago
  Daniel Hiltgen b2799f111b Move libraries out of users path 10 months ago
  Daniel Hiltgen da3bf23354 Workaround gfx900 SDMA bugs 11 months ago
  Daniel Hiltgen 6f351bf586 review comments and coverage 11 months ago
  Daniel Hiltgen fc37c192ae Refine CPU load behavior with system memory visibility 11 months ago
  Daniel Hiltgen 6fd04ca922 Improve multi-gpu handling at the limit 11 months ago
  Craig Hughes b84aea1685 Critical fix from llama.cpp JSON grammar to forbid un-escaped escape characters inside strings, which breaks parsing. (#3782) 10 months ago
  Michael Yang e40145a39d lint 11 months ago
  Michael Yang c895a7d13f some gocritic 11 months ago
  Michael Yang 829ff87bd1 revert tokenize ffi (#4761) 11 months ago
  Jeffrey Morgan a50a87a7b8 partial offloading: allow flash attention and disable mmap (#4734) 11 months ago
  Michael Yang 26a00a0410 use ffi for tokenizing/detokenizing 11 months ago
  Daniel Hiltgen 92c81e8117 Give the final model loading more time 11 months ago
  Lei Jitang 7487229c34 llm/server.go: Fix 2 minor typos (#4661) 11 months ago