diff --git a/Usage.md b/Usage.md
index e5e5d638..96ffc8c0 100644
--- a/Usage.md
+++ b/Usage.md
@@ -7,6 +7,7 @@
 - [Working with GitHub App](#working-with-github-app)
 - [Working with GitHub Action](#working-with-github-action)
 - [Changing a model](#changing-a-model)
+- [Working with large PRs](#working-with-large-prs)
 - [Appendix - additional configurations walkthrough](#appendix---additional-configurations-walkthrough)
 
 ### Introduction
@@ -240,6 +241,28 @@
 key = ...
 
 Also review the [AiHandler](pr_agent/algo/ai_handler.py) file for instruction how to set keys for other models.
 
+### Working with large PRs
+
+The default mode of CodiumAI is a single model call per tool, using GPT-4, which has a token limit of 8000 tokens.
+This mode provides a very good speed-quality-cost tradeoff and can handle most PRs successfully.
+When a PR exceeds the token limit, the tool employs a [PR Compression strategy](./PR_COMPRESSION.md).
+
+However, for very large PRs, or when you want to emphasize quality over speed and cost, there are two possible solutions:
+1) [Use a model](#changing-a-model) with a larger context window, such as GPT-4-32k or Claude-100k. This solution applies to all the tools (see the configuration sketch below).
+2) For the `/improve` tool, there is an ['extended' mode](./docs/IMPROVE.md) (`/improve --extended`),
+which divides the PR into chunks and processes each chunk separately, so no compression is performed regardless of the model (though for large PRs, multiple model calls may occur).
+
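+For option 1, here is a minimal sketch, assuming the TOML configuration format used for the key settings above and that the `model` setting under `[config]` selects the model for all tools; the exact model identifier depends on your provider and account access:
+
+```toml
+[config]
+# Hypothetical example: point all tools at a larger-context model.
+# Replace the value with a model identifier your API account can access.
+model = "gpt-4-32k"
+```
+
+For option 2, no configuration change is needed; `/improve --extended` is invoked like any other tool command.
+
 ### Appendix - additional configurations walkthrough
 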