@@ -0,0 +1,216 @@
+# Guide: Implementing Models in Ollama's Go Inference Engine
+
+> **Note**: This guide and the Go inference engine are in early development and will be updated as implementation details evolve.
+
+This guide outlines the process of implementing a new model in Ollama's Go GGML inference engine. It covers everything from initial setup to deploying your model to ollama.com.
+
+## Architecture Overview
+
+The diagram below shows the layers of Ollama's inference engine and how they interact:
+
+```mermaid
+graph TB
+    subgraph Models["Model Layer: LLM Implementations"]
+        direction TB
+        llama["model/models/llama/model.go"]
+        mllama["model/models/mllama/model.go"]
+        qwen["model/models/qwen2/model.go"]
+        qwen_vl["model/models/qwen2vl/model.go"]
+
+        note1["Each model implements a specific architecture
+        - Defines model parameters
+        - Implements forward pass"]
+    end
+
+    subgraph ML_Ops["Neural Network Operations"]
+        direction TB
+        nn_ops["nn/
+        linear.go - Matrix operations
+        embedding.go - Token embeddings
+        normalization.go - Layer normalization
+        convolution.go - Conv operations"]
+
+        backend["ml/backend.go
+        Hardware Abstraction Layer
+        - Defines tensor operations
+        - Manages computation graphs
+        - Handles memory allocation"]
+
+        note2["Common neural net operations
+        used across different models
+        - Abstracts hardware details
+        - Provides unified API
+        - Manages computation flow"]
+    end
+
+    subgraph GGML["Hardware Execution Layer"]
+        direction TB
+        ggml["ggml.go
+        CGO Interface
+        - Bridges Go and C++
+        - Handles type conversion
+        - Manages memory between languages"]
+
+        subgraph Hardware_Specific["Hardware-Specific Implementations"]
+            direction LR
+            cpu["ggml-cpu.h
+            CPU optimized ops"]
+            cuda["ggml-cuda.h
+            NVIDIA GPU ops"]
+            metal["ggml-metal.h
+            Apple GPU ops"]
+            vulkan["ggml-vulkan.h
+            Cross-platform GPU"]
+            opencl["ggml-opencl.h
+            OpenCL acceleration"]
+        end
+
+        note3["GGML provides optimized
+        implementations for each hardware:
+        - Automatic dispatch
+        - Hardware-specific optimizations
+        - Memory management
+        - Parallel execution"]
+    end
+
+    %% Connections with explanations
+    Models --> |"Makes high-level calls
+    (e.g., self-attention)"| ML_Ops
+    ML_Ops --> |"Translates to tensor operations
+    (e.g., matmul, softmax)"| GGML
+    GGML --> |"Executes optimized code
+    on target hardware"| Hardware_Specific
+
+    %% Styling
+    classDef model fill:#fff,stroke:#01579b,stroke-width:2px
+    classDef ml fill:#fff,stroke:#e65100,stroke-width:2px
+    classDef hw fill:#fff,stroke:#b71c1c,stroke-width:2px
+    classDef note fill:#fff,stroke:#666,stroke-dasharray: 5 5
+
+    class llama,mllama,qwen,qwen_vl model
+    class nn_ops,backend ml
+    class ggml,cpu,cuda,metal,vulkan,opencl hw
+    class note1,note2,note3 note
+
+    %% Style subgraphs
+    style Models fill:#fff,stroke:#01579b,stroke-width:2px
+    style ML_Ops fill:#fff,stroke:#e65100,stroke-width:2px
+    style GGML fill:#fff,stroke:#b71c1c,stroke-width:2px
+    style Hardware_Specific fill:#fff,stroke:#b71c1c,stroke-width:1px
+```
+
+When implementing a new model, you'll primarily work in the model layer, interfacing with the neural network operations layer.
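+
+As a very rough sketch of what that looks like (the engine is early in development, so every identifier and import path below is an illustrative assumption, not the final API), a model implementation pairs a struct of weights with a forward pass built from the nn operations:
+
+```go
+package yourmodel
+
+import (
+	"github.com/ollama/ollama/ml"
+	"github.com/ollama/ollama/ml/nn"
+)
+
+// Model holds the architecture's weights as modules from the nn package.
+// Field and type names here are illustrative only.
+type Model struct {
+	TokenEmbedding *nn.Embedding
+	OutputNorm     *nn.LayerNorm
+	Output         *nn.Linear
+}
+
+// Forward chains operations from the neural network layer; a real model
+// inserts its attention and feed-forward blocks between these calls.
+func (m *Model) Forward(ctx ml.Context, tokens ml.Tensor) (ml.Tensor, error) {
+	hidden := m.TokenEmbedding.Forward(ctx, tokens)
+	// ... transformer blocks go here ...
+	hidden = m.OutputNorm.Forward(ctx, hidden)
+	return m.Output.Forward(ctx, hidden), nil
+}
+```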
+
+## Implementation Steps
+
+### 1. Environment Setup
+
+First, clone the Ollama repository and get it running locally. Follow the development setup guide at:
+https://github.com/ollama/ollama/blob/main/docs/development.md
+
+### 2. Research Implementation
+
+Get the original model implementation running. This typically involves:
+- Cloning the research code repository (usually Python-based)
+- Setting up the required environment
+- Running inference with sample inputs
+- Understanding the model architecture and forward pass
+
+### 3. Project Structure Setup
+
+Create the necessary file structure by referencing previous model implementations. You'll need:
+
+```
+model/models/
+└── your-model/
+    ├── model.go         # Architecture and forward pass implementation
+    ├── convert.go       # Weight conversion logic (PyTorch/SafeTensors to GGML)
+    └── convert_test.go  # Conversion logic tests
+```
+
+Register your model in [model/models/models.go](https://github.com/ollama/ollama/blob/main/model/models/models.go):
+
+```go
+package models
+
+import (
+	_ "github.com/ollama/ollama/model/models/llama"
+	_ "github.com/ollama/ollama/model/models/mllama"
+	_ "github.com/ollama/ollama/model/models/your-model" // Add your model here
+)
+```
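+
+The blank imports matter: importing each model package for its side effects runs that package's `init` function, which is where the model registers itself with the engine.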
+
+### 4. Development Process
+
+1. **Open a Draft PR**
+   - Create a draft pull request in the `ollama/ollama` repository
+   - Use this as a communication channel with Ollama maintainers
+
+2. **Implement Weight Conversion**
+   - Work on `convert.go`
+   - Reference existing conversion implementations
+   - Create a basic Modelfile:
+     ```
+     FROM /path/to/model
+     ```
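+     Here `/path/to/model` points at the original checkpoint (for example, a directory of safetensors weights); the create command below runs your conversion logic against it.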
+   - Test conversion:
+     ```bash
+     go run . create <my-model> -f /path/to/Modelfile
+     ```
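+     Much of `convert.go` typically boils down to mapping the research checkpoint's tensor names onto the names the GGML file should use. A minimal sketch, with illustrative names only:
+     ```go
+     // tensorMap is an illustrative example of renaming source tensors
+     // to their GGML equivalents during conversion.
+     var tensorMap = map[string]string{
+         "model.embed_tokens.weight": "token_embd.weight",
+         "model.norm.weight":         "output_norm.weight",
+         "lm_head.weight":            "output.weight",
+     }
+     ```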
+
+3. **Implement Model Logic**
+   - Implement `New()` and `Forward()` functions in `model.go`
+   - Reference existing model implementations
+   - Debug the forward pass:
+     ```bash
+     OLLAMA_DEBUG=1 go run . run <my-model>
+     ```
+   - Compare the output with the research implementation
+
+4. **Tokenizer Implementation**
+   - Implement a new tokenizer if required (a sketch of the shape follows below)
+   - Ensure compatibility with the model architecture
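+
+   At minimum, a tokenizer has to round-trip text and token IDs. As an illustrative shape only (the engine's actual interface may differ):
+   ```go
+   // TextProcessor is an illustrative tokenizer interface: it encodes
+   // prompts into token IDs and decodes generated IDs back into text.
+   type TextProcessor interface {
+       Encode(s string) ([]int32, error)
+       Decode(ids []int32) (string, error)
+   }
+   ```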
+
+5. **Text Generation Testing**
+   - Implement proper prompt formatting
+   - Test basic generation:
+     ```bash
+     go run . run <my-model> "hello"
+     ```
+
+### 5. Testing
+
+1. Add comprehensive tests to:
+   - `model_test.go`
+   - `convert_test.go`
+
+2. Ensure tests cover:
+   - Weight conversion
+   - Model initialization
+   - Text generation
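+
+For example, a conversion test might look like the sketch below (`mapTensorName` is a hypothetical helper; follow the patterns in the existing `convert_test.go` files for real tests):
+
+```go
+package yourmodel
+
+import "testing"
+
+// TestTensorMapping checks a single source-to-GGML tensor rename.
+func TestTensorMapping(t *testing.T) {
+	got := mapTensorName("model.embed_tokens.weight") // hypothetical helper
+	if want := "token_embd.weight"; got != want {
+		t.Fatalf("mapTensorName() = %q, want %q", got, want)
+	}
+}
+```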
+
+### 6. Model Deployment
+
+1. **Finalize Pull Request**
+   - Move PR out of draft state
+   - Address reviewer feedback
+
+2. **Deploy to ollama.com**
+   - Determine model prompt format
+   - Convert prompt format to Go template
+   - Create final Modelfile:
+     ```
+     FROM <converted-gguf>
+     TEMPLATE <prompt-template>
+     LICENSE <license-info>
+     # Add additional parameters if needed
+     ```
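+     For a chat model, the TEMPLATE entry might look like the sketch below: `.System` and `.Prompt` are standard Modelfile template variables, while the `<|...|>` markers stand in for whatever special tokens your model was trained with:
+     ```
+     TEMPLATE """{{ if .System }}<|system|>
+     {{ .System }}{{ end }}<|user|>
+     {{ .Prompt }}<|assistant|>
+     """
+     ```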
+   - Push to ollama.com:
+     ```bash
+     ollama create <your-namespace>/<your-model> -f /path/to/Modelfile
+     ollama push <your-namespace>/<your-model>
+     ```
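+     Here `<your-namespace>` is your ollama.com username; pushing requires an ollama.com account with your local Ollama public key added to it.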
+
+3. **Integration Testing**
+   - Run end-to-end tests
+   - Verify model behavior in a production environment