From dde362bd4736956d719056fe369acd74f40a478d Mon Sep 17 00:00:00 2001 From: Slava Eliseev Date: Sat, 22 Mar 2025 00:48:25 +0300 Subject: [PATCH] doc: Add info about ollama context length --- docs/docs/usage-guide/changing_a_model.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/docs/docs/usage-guide/changing_a_model.md b/docs/docs/usage-guide/changing_a_model.md index 844cd6a5..cc4bd253 100644 --- a/docs/docs/usage-guide/changing_a_model.md +++ b/docs/docs/usage-guide/changing_a_model.md @@ -54,6 +54,10 @@ duplicate_examples=true # will duplicate the examples in the prompt, to help the api_base = "http://localhost:11434" # or whatever port you're running Ollama on ``` +By default, Ollama uses a context window size of 2048 tokens. In most cases this is not enough to cover the pr-agent prompt and the pull-request diff. The context window size can be overridden with the `OLLAMA_CONTEXT_LENGTH` environment variable. For example, to set the default context length to 8K, use: `OLLAMA_CONTEXT_LENGTH=8192 ollama serve`. More information can be found in the [official Ollama FAQ](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-can-i-specify-the-context-window-size). + +Please note that the `custom_model_max_tokens` setting should be configured in accordance with the `OLLAMA_CONTEXT_LENGTH` value. Failure to do so may result in unexpected model output. + !!! note "Local models vs commercial models" Qodo Merge is compatible with almost any AI model, but analyzing complex code repositories and pull requests requires a model specifically optimized for code analysis.