@@ -131,3 +131,37 @@ This can impact both installing Ollama, as well as downloading models.
Open `Control Panel > Networking and Internet > View network status and tasks` and click on `Change adapter settings` on the left panel. Find the `vEthernet (WSL)` adapter, right click and select `Properties`.
Click on `Configure` and open the `Advanced` tab. Search through each of the properties until you find `Large Send Offload Version 2 (IPv4)` and `Large Send Offload Version 2 (IPv6)`. *Disable* both of these
properties.
+
+## How can I pre-load a model to get faster response times?
+
+If you are using the API, you can preload a model by sending the Ollama server an empty request. This works with both the `/api/generate` and `/api/chat` API endpoints.
+
+To preload the mistral model using the generate endpoint, use:
+```shell
+curl http://localhost:11434/api/generate -d '{"model": "mistral"}'
+```
+
+To use the chat completions endpoint, use:
+```shell
+curl http://localhost:11434/api/chat -d '{"model": "mistral"}'
+```
+
+## How do I keep a model loaded in memory or make it unload immediately?
+
+By default, models are kept in memory for 5 minutes before being unloaded. This allows for quicker response times if you are making numerous requests to the LLM. You may, however, want to free up the memory before the 5 minutes have elapsed or keep the model loaded indefinitely. Use the `keep_alive` parameter with either the `/api/generate` or `/api/chat` API endpoint to control how long the model is left in memory.
+
+The `keep_alive` parameter can be set to:
+* a duration string (such as "10m" or "24h")
+* a number in seconds (such as 3600)
+* any negative number, which will keep the model loaded in memory (e.g. -1 or "-1m")
+* '0', which will unload the model immediately after generating a response
+
+For example, to preload a model and leave it in memory, use:
+```shell
+curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": -1}'
+```
+
+To unload the model and free up memory, use:
+```shell
+curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": 0}'
+```
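+
+The duration-string form works the same way. As a sketch combining the generate endpoint with the "24h" value listed above, the following would keep the model loaded for 24 hours after the request:
+```shell
+curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": "24h"}'
+```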