@@ -6,7 +6,7 @@ Ollama on macOS and Windows will automatically download updates. Click on the ta

On Linux, re-run the install script:

-```
+```shell
curl -fsSL https://ollama.com/install.sh | sh
```

@@ -30,7 +30,7 @@ To change this when using `ollama run`, use `/set parameter`:

When using the API, specify the `num_ctx` parameter:

-```
+```shell
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
@@ -40,6 +40,21 @@ curl http://localhost:11434/api/generate -d '{
}'
```

+## How can I tell if my model was loaded onto the GPU?
+
+Use the `ollama ps` command to see what models are currently loaded into memory.
+
+```shell
+ollama ps
+NAME         ID              SIZE    PROCESSOR    UNTIL
+llama3:70b   bcfb190ca3a7    42 GB   100% GPU     4 minutes from now
+```
+
+The `Processor` column shows which memory the model was loaded into:
+* `100% GPU` means the model was loaded entirely into the GPU
+* `100% CPU` means the model was loaded entirely in system memory
+* `48%/52% CPU/GPU` means the model was loaded partially onto both the GPU and into system memory
+
## How do I configure Ollama server?

Ollama server can be configured with environment variables.
@@ -94,6 +109,34 @@ On Windows, Ollama inherits your user and system environment variables.

6. Start the Ollama application from the Windows Start menu.

+## How do I use Ollama behind a proxy?
+
+Ollama is compatible with proxy servers if `HTTP_PROXY` or `HTTPS_PROXY` are configured. When using either variable, ensure it is set where `ollama serve` can access the values. When using `HTTPS_PROXY`, ensure the proxy certificate is installed as a system certificate. Refer to the section above for how to use environment variables on your platform.
+
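+For example, assuming a proxy reachable at `https://proxy.example.com` and a server started manually from a shell, the variable can be exported where `ollama serve` will see it:
+
+```shell
+# Example proxy URL; replace with your own proxy address
+export HTTPS_PROXY=https://proxy.example.com
+ollama serve
+```
+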
+### How do I use Ollama behind a proxy in Docker?
+
+The Ollama Docker container image can be configured to use a proxy by passing `-e HTTPS_PROXY=https://proxy.example.com` when starting the container.
+
+Alternatively, the Docker daemon can be configured to use a proxy. Instructions are available for Docker Desktop on [macOS](https://docs.docker.com/desktop/settings/mac/#proxies), [Windows](https://docs.docker.com/desktop/settings/windows/#proxies), and [Linux](https://docs.docker.com/desktop/settings/linux/#proxies), and for the Docker [daemon with systemd](https://docs.docker.com/config/daemon/systemd/#httphttps-proxy).
+
+Ensure the certificate is installed as a system certificate when using HTTPS. This may require building a new Docker image when using a self-signed certificate.
+
+```dockerfile
+FROM ollama/ollama
+COPY my-ca.pem /usr/local/share/ca-certificates/my-ca.crt
+RUN update-ca-certificates
+```
+
+Build and run this image:
+
+```shell
+docker build -t ollama-with-ca .
+docker run -d -e HTTPS_PROXY=https://my.proxy.example.com -p 11434:11434 ollama-with-ca
+```
+
+## Does Ollama send my prompts and answers back to ollama.com?
+
+No. Ollama runs locally, and conversation data does not leave your machine.

## How can I expose Ollama on my network?

@@ -120,7 +163,7 @@ server {

-Ollama can be accessed using a range of tools for tunneling tools. For example with Ngrok:
+Ollama can be accessed using a range of tunneling tools. For example, with Ngrok:

-```
+```shell
ngrok http 11434 --host-header="localhost:11434"
```

@@ -128,7 +171,7 @@ ngrok http 11434 --host-header="localhost:11434"

To use Ollama with Cloudflare Tunnel, use the `--url` and `--http-host-header` flags:

-```
+```shell
cloudflared tunnel --url http://localhost:11434 --http-host-header="localhost:11434"
```

@@ -150,39 +193,10 @@ If a different directory needs to be used, set the environment variable `OLLAMA_

Refer to the section [above](#how-do-i-configure-ollama-server) for how to set environment variables on your platform.

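+For instance, on Linux you might point Ollama at a different models directory before starting the server manually (the path below is only an example):
+
+```shell
+# Example path; use any directory the user running ollama can access
+export OLLAMA_MODELS=/data/ollama/models
+ollama serve
+```
+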
-## Does Ollama send my prompts and answers back to ollama.com?
-
-No. Ollama runs locally, and conversation data does not leave your machine.
-
## How can I use Ollama in Visual Studio Code?

There is already a large collection of plugins available for VSCode as well as other editors that leverage Ollama. See the list of [extensions & plugins](https://github.com/ollama/ollama#extensions--plugins) at the bottom of the main repository readme.

-## How do I use Ollama behind a proxy?
-
-Ollama is compatible with proxy servers if `HTTP_PROXY` or `HTTPS_PROXY` are configured. When using either variables, ensure it is set where `ollama serve` can access the values. When using `HTTPS_PROXY`, ensure the proxy certificate is installed as a system certificate. Refer to the section above for how to use environment variables on your platform.
-
-### How do I use Ollama behind a proxy in Docker?
-
-The Ollama Docker container image can be configured to use a proxy by passing `-e HTTPS_PROXY=https://proxy.example.com` when starting the container.
-
-Alternatively, the Docker daemon can be configured to use a proxy. Instructions are available for Docker Desktop on [macOS](https://docs.docker.com/desktop/settings/mac/#proxies), [Windows](https://docs.docker.com/desktop/settings/windows/#proxies), and [Linux](https://docs.docker.com/desktop/settings/linux/#proxies), and Docker [daemon with systemd](https://docs.docker.com/config/daemon/systemd/#httphttps-proxy).
-
-Ensure the certificate is installed as a system certificate when using HTTPS. This may require a new Docker image when using a self-signed certificate.
-
-```dockerfile
-FROM ollama/ollama
-COPY my-ca.pem /usr/local/share/ca-certificates/my-ca.crt
-RUN update-ca-certificates
-```
-
-Build and run this image:
-
-```shell
-docker build -t ollama-with-ca .
-docker run -d -e HTTPS_PROXY=https://my.proxy.example.com -p 11434:11434 ollama-with-ca
-```
-
## How do I use Ollama with GPU acceleration in Docker?

The Ollama Docker container can be configured with GPU acceleration in Linux or Windows (with WSL2). This requires the [nvidia-container-toolkit](https://github.com/NVIDIA/nvidia-container-toolkit). See [ollama/ollama](https://hub.docker.com/r/ollama/ollama) for more details.
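+
+For reference, a minimal sketch of starting a GPU-enabled container, assuming the NVIDIA Container Toolkit is already installed, might look like:
+
+```shell
+# Example container and volume names; adjust to your setup
+docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
+```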
@@ -197,7 +211,7 @@ Open `Control Panel > Networking and Internet > View network status and tasks` a

Click on `Configure` and open the `Advanced` tab. Search through each of the properties until you find `Large Send Offload Version 2 (IPv4)` and `Large Send Offload Version 2 (IPv6)`. *Disable* both of these
properties.

-## How can I pre-load a model to get faster response times?
+## How can I preload a model into Ollama to get faster response times?

If you are using the API you can preload a model by sending the Ollama server an empty request. This works with both the `/api/generate` and `/api/chat` API endpoints.

@@ -211,6 +225,11 @@ To use the chat completions endpoint, use:
curl http://localhost:11434/api/chat -d '{"model": "mistral"}'
```

+To preload a model using the CLI, use the command:
+```shell
+ollama run llama3 ""
+```
+
## How do I keep a model loaded in memory or make it unload immediately?

-By default models are kept in memory for 5 minutes before being unloaded. This allows for quicker response times if you are making numerous requests to the LLM. You may, however, want to free up the memory before the 5 minutes have elapsed or keep the model loaded indefinitely. Use the `keep_alive` parameter with either the `/api/generate` and `/api/chat` API endpoints to control how long the model is left in memory.
+By default models are kept in memory for 5 minutes before being unloaded. This allows for quicker response times if you are making numerous requests to the LLM. You may, however, want to free up the memory before the 5 minutes have elapsed or keep the model loaded indefinitely. Use the `keep_alive` parameter with either the `/api/generate` or `/api/chat` API endpoint to control how long the model is left in memory.
@@ -235,8 +254,6 @@ Alternatively, you can change the amount of time all models are loaded into memo

If you wish to override the `OLLAMA_KEEP_ALIVE` setting, use the `keep_alive` API parameter with the `/api/generate` or `/api/chat` API endpoints.

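+For example, assuming the `llama3` model is in use, a per-request override that keeps it loaded for ten minutes regardless of the server-wide setting could look like:
+
+```shell
+# keep_alive accepts durations such as "10m", seconds, 0, or -1
+curl http://localhost:11434/api/chat -d '{"model": "llama3", "keep_alive": "10m"}'
+```
+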
-## How do I manage the maximum number of requests the server can queue
+## How do I manage the maximum number of requests the Ollama server can queue?

-If too many requests are sent to the server, it will respond with a 503 error
-indicating the server is overloaded. You can adjust how many requests may be
-queue by setting `OLLAMA_MAX_QUEUE`
+If too many requests are sent to the server, it will respond with a 503 error indicating the server is overloaded. You can adjust how many requests may be queued by setting `OLLAMA_MAX_QUEUE`.
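+
+As a small illustration, assuming `ollama serve` is started manually on the host, the queue limit could be raised for a single run (the value shown is only an example):
+
+```shell
+# Example value; choose a limit that suits your workload
+OLLAMA_MAX_QUEUE=1024 ollama serve
+```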