
sched: Lift parallel restriction for multimodal models except mllama

The Go runner has no problem supporting parallel requests for most
multimodal models. Now that we will no longer potentially fall back
to server.cpp, this restriction can be lifted.

However, the new mllama model can't support parallel requests, so we
will need to keep a restriction for that.
Jesse Gross, 6 months ago
Commit 6cd566872b
1 changed file with 3 additions and 3 deletions

server/sched.go (+3 -3)

@@ -130,11 +130,11 @@ func (s *Scheduler) processPending(ctx context.Context) {
 				continue
 			}
 			numParallel := int(envconfig.NumParallel())
-			// TODO (jmorganca): multimodal models don't support parallel yet
+			// TODO (jmorganca): mllama doesn't support parallel yet
 			// see https://github.com/ollama/ollama/issues/4165
-			if len(pending.model.ProjectorPaths) > 0 && numParallel != 1 {
+			if checkMllamaModelFamily(pending.model) && numParallel != 1 {
 				numParallel = 1
-				slog.Warn("multimodal models don't support parallel requests yet")
+				slog.Warn("mllama doesn't support parallel requests yet")
 			}
 
 			for {
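The hunk narrows the condition that clamps numParallel to 1. Before the change, any model with a vision projector was clamped. After it, only the mllama family is. The following sketch contrasts the two rules side by side, under stated assumptions. The model type, isMllama, oldParallel, and newParallel are hypothetical stand-ins and not the scheduler's real types. The assumption that checkMllamaModelFamily works by looking for "mllama" among the model's family names is also not confirmed by this diff.

```go
package main

import (
	"fmt"
	"log/slog"
)

// model is a hypothetical stand-in for the scheduler's model type;
// only the fields this sketch needs are included.
type model struct {
	ProjectorPaths []string // vision projector files (non-empty for multimodal models)
	ModelFamilies  []string // architecture family names (illustrative values)
}

// isMllama reports whether the model lists "mllama" among its families.
// Assumption: this approximates what checkMllamaModelFamily checks.
func isMllama(m model) bool {
	for _, family := range m.ModelFamilies {
		if family == "mllama" {
			return true
		}
	}
	return false
}

// oldParallel mirrors the rule before the commit: any model with a
// projector is forced down to a single parallel request.
func oldParallel(m model, requested int) int {
	if len(m.ProjectorPaths) > 0 && requested != 1 {
		slog.Warn("multimodal models don't support parallel requests yet")
		return 1
	}
	return requested
}

// newParallel mirrors the rule after the commit: only mllama is
// forced down to a single parallel request.
func newParallel(m model, requested int) int {
	if isMllama(m) && requested != 1 {
		slog.Warn("mllama doesn't support parallel requests yet")
		return 1
	}
	return requested
}

func main() {
	// Hypothetical models; file and family names are illustrative only.
	otherMultimodal := model{ProjectorPaths: []string{"projector.gguf"}, ModelFamilies: []string{"llama"}}
	mllama := model{ProjectorPaths: []string{"projector.gguf"}, ModelFamilies: []string{"mllama"}}

	// Print old vs. new effective parallelism for a requested value of 4.
	fmt.Println(oldParallel(otherMultimodal, 4), newParallel(otherMultimodal, 4))
	fmt.Println(oldParallel(mllama, 4), newParallel(mllama, 4))
}
```

Running main prints `1 4` and then `1 1` on stdout, with the slog warnings going to stderr. Other multimodal models keep the requested parallelism under the new rule, while mllama is still limited to one request at a time.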