@@ -131,3 +131,37 @@ This can impact both installing Ollama, as well as downloading models.
Open `Control Panel > Networking and Internet > View network status and tasks` and click on `Change adapter settings` on the left panel. Find the `vEthernet (WSL)` adapter, right click and select `Properties`.
Click on `Configure` and open the `Advanced` tab. Search through each of the properties until you find `Large Send Offload Version 2 (IPv4)` and `Large Send Offload Version 2 (IPv6)`. *Disable* both of these
properties.
+
+## How can I pre-load a model to get faster response times?
+
+If you are using the API, you can preload a model by sending the Ollama server an empty request. This works with both the `/api/generate` and `/api/chat` API endpoints.
+
+To preload the mistral model using the generate endpoint, use:
+```shell
+curl http://localhost:11434/api/generate -d '{"model": "mistral"}'
+```
+
+To use the chat completions endpoint, use:
+```shell
+curl http://localhost:11434/api/chat -d '{"model": "mistral"}'
+```
+
+## How do I keep a model loaded in memory or make it unload immediately?
+
+By default, models are kept in memory for 5 minutes before being unloaded. This allows for quicker response times if you are making numerous requests to the LLM. You may, however, want to free up the memory before the 5 minutes have elapsed or keep the model loaded indefinitely. Use the `keep_alive` parameter with either the `/api/generate` or `/api/chat` API endpoint to control how long the model is left in memory.
+
+The `keep_alive` parameter can be set to:
+* a duration string (such as "10m" or "24h")
+* a number in seconds (such as 3600)
+* any negative number, which will keep the model loaded in memory (e.g. -1 or "-1m")
+* '0', which will unload the model immediately after generating a response
+
+For example, to preload a model and leave it in memory, use:
+```shell
+curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": -1}'
+```
+
+To unload the model and free up memory, use:
+```shell
+curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": 0}'
+```
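+
+The duration-string form works the same way. As a sketch combining the generate endpoint with the "24h" value listed above, the following would keep the model loaded for 24 hours after the request:
+```shell
+curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": "24h"}'
+```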