@@ -1,41 +1,67 @@
+
+
# Ollama

-An easy, fast runtime for large language models, powered by `llama.cpp`.
+Run large language models with `llama.cpp`.
+
+> Note: certain models that can be run with this project are intended for research and/or non-commercial use only.
+
+## Features

-> _Note: this project is a work in progress. Certain models that can be run with `ollama` are intended for research and/or non-commercial use only._
+- Download and run popular large language models
+- Switch between multiple models on the fly
+- Hardware acceleration where available (Metal, CUDA)
+- Fast inference server written in C++, powered by [llama.cpp](https://github.com/ggerganov/llama.cpp)
+- REST API to use with your application (Python and TypeScript SDKs coming soon)

## Install

-Using `pip`:
+- Download for macOS
+- Download for Windows (coming soon)
+- Docker: `docker run -p 8080:8080 ollama/ollama`
+
+You can also build the [binary from source](#building).
+
+## Quickstart
+
+Run the model that started it all.

```
-pip install ollama
+ollama run llama
```

-Using `docker`:
+## Example models
+
+### 💬 Chat
+
+Have a conversation.

```
-docker run ollama/ollama
+ollama run vicuna "Why is the sky blue?"
```

-## Quickstart
+### 🗺️ Instructions

-To run a model, use `ollama run`:
+Ask questions. Get answers.

```
-ollama run orca-mini-3b
+ollama run orca "Write an email to my boss."
```

-You can also run models from hugging face:
+### 👩‍💻 Code completion
+
+Sometimes you just need a little help writing code.

```
-ollama run huggingface.co/TheBloke/orca_mini_3B-GGML
+ollama run replit "Give me React code to render a button"
```

-Or directly via downloaded model files:
+### 📖 Storytelling
+
+Venture into the unknown.

```
-ollama run ~/Downloads/orca-mini-13b.ggmlv3.q4_0.bin
+ollama run storyteller "Once upon a time"
```

## Building
@@ -45,7 +71,24 @@ go generate ./...
go build .
```

-## Documentation
+To run it, start the server:

-- [Development](docs/development.md)
-- [Python SDK](docs/python.md)
+```
+./ollama server &
+```
+
+Finally, run a model!
+
+```
+./ollama run ~/Downloads/vicuna-7b-v1.3.ggmlv3.q4_1.bin
+```
+
+## API Reference
+
+### `POST /completion`
+
+Complete a prompt.
+
+```
+curl -X POST http://localhost:8080/completion \
+  -H 'Content-Type: application/json' \
+  -d '{"model": "/path/to/model", "prompt": "Once upon a time", "stream": true}'
+```
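+
+As a sketch of calling this endpoint from your own application (pending the Python and TypeScript SDKs mentioned above), the minimal Go program below sends the same request. The model path and prompt are placeholders, and it makes no assumption about the response schema: it simply copies the response bytes to stdout as they arrive.
+
+```
+package main
+
+import (
+	"io"
+	"net/http"
+	"os"
+	"strings"
+)
+
+func main() {
+	// Same request body as the curl example above; the model path is a
+	// placeholder for a model file available to the server.
+	body := `{"model": "/path/to/model", "prompt": "Once upon a time", "stream": true}`
+
+	resp, err := http.Post("http://localhost:8080/completion", "application/json", strings.NewReader(body))
+	if err != nil {
+		panic(err)
+	}
+	defer resp.Body.Close()
+
+	// With "stream": true the response is expected to arrive incrementally,
+	// so copy it to stdout as it streams rather than buffering all of it.
+	if _, err := io.Copy(os.Stdout, resp.Body); err != nil {
+		panic(err)
+	}
+}
+```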