initial pr compression documentation

2025-07-21 04:50:39 +08:00 · 2023-07-06 15:26:56 +03:00
parent 53e7ff62bf
commit 3e445c7e03
1 changed files with 19 additions and 0 deletions
--- a/PR_COMPRESSION.md
+++ b/PR_COMPRESSION.md
@@ -0,0 +1,19 @@
+## PR Compression Strategy
+
+### Motivation
+Pull Requests can be very long and contain a lot of information with varying degrees of relevance to the pr-agent.
+We want to pack as much information as possible into a single LLM prompt, while keeping that information relevant to the pr-agent.
+
+### Our Strategy
+#### Repo language prioritization strategy
+We prioritize the PR files by language, as follows:
+1. Determine the main languages used in the repo.
+2. Group the PR files by language and sort the groups by how common each language is in the repo (in descending order):
+   * ```[[file.py, file2.py],[file3.js, file4.jsx],[readme.md]]```
+3. Within each language, sort the files by the number of tokens in the file (in descending order):
+   * ```[[file2.py, file.py],[file4.jsx, file3.js],[readme.md]]```
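
The ordering described above can be sketched roughly as follows. This is a minimal illustration and not the pr-agent implementation: the function name `group_and_sort_files`, the `EXTENSION_TO_LANGUAGE` map, and the token counts are all assumptions made for the example, and the repo's language ranking is assumed to be supplied by the caller.

```python
from pathlib import PurePosixPath

# Hypothetical extension-to-language map (assumption); a real one would cover far more.
EXTENSION_TO_LANGUAGE = {
    ".py": "Python",
    ".js": "JavaScript",
    ".jsx": "JavaScript",
    ".md": "Markdown",
}


def group_and_sort_files(token_counts, repo_languages):
    """Group PR files by language and order them for prompt packing.

    token_counts: dict mapping filename -> number of tokens (assumed precomputed).
    repo_languages: languages ordered from most to least common in the repo.
    """
    # Step 2a: bucket files by the language inferred from their extension.
    groups = {}
    for filename, tokens in token_counts.items():
        ext = PurePosixPath(filename).suffix.lower()
        language = EXTENSION_TO_LANGUAGE.get(ext, "Other")
        groups.setdefault(language, []).append((filename, tokens))

    # Step 2b: order the buckets by repo language rank; unknown languages go last.
    rank = {lang: i for i, lang in enumerate(repo_languages)}
    ordered = sorted(groups, key=lambda lang: rank.get(lang, len(rank)))

    # Step 3: within each bucket, largest files (by token count) come first.
    return [
        [name for name, _ in sorted(groups[lang], key=lambda item: item[1], reverse=True)]
        for lang in ordered
    ]


# Illustrative token counts (made up for this example).
token_counts = {
    "file.py": 120,
    "file2.py": 450,
    "file3.js": 80,
    "file4.jsx": 300,
    "readme.md": 40,
}
repo_languages = ["Python", "JavaScript", "Markdown"]
print(group_and_sort_files(token_counts, repo_languages))
# → [['file2.py', 'file.py'], ['file4.jsx', 'file3.js'], ['readme.md']]
```

Keeping the result as a list of per-language lists, rather than one flat list, mirrors the nested examples in the steps above and leaves a later packing step free to decide how many files from each group fit in the prompt.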
+
+#### PR compression strategy
+
+#### Adaptive and token-aware file patch fitting:
+