Merge branch 'main' into of/compliance-tool

2025-07-21 04:50:39 +08:00 · 2025-07-17 15:44:13 +03:00
parent e87fdd0ab5 c0d7fd8c36
commit 2a37225574
25 changed files with 357 additions and 71 deletions
--- a/docs/docs/ai_search/index.md
+++ b/docs/docs/ai_search/index.md
@@ -19,7 +19,6 @@
 </div>

 <style>
-Untitled
 .search-section {
    max-width: 800px;
    margin: 0 auto;
@@ -305,9 +304,8 @@ window.addEventListener('load', function() {
            spinner.style.display = 'none';
            const errorDiv = document.createElement('div');
            errorDiv.className = 'error-message';
-            errorDiv.textContent = `${error}`;
-            resultsContainer.value = "";
-            resultsContainer.appendChild(errorDiv);
+            errorDiv.textContent = error instanceof Error ? error.message : String(error);
+            resultsContainer.replaceChildren(errorDiv);
        }
    }

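Reviewer note on the hunk above: the new `textContent` expression shows only the message for `Error` objects and stringifies anything else, and `replaceChildren` swaps out whatever the container held instead of appending to it. A minimal sketch of the normalization follows, assuming a hypothetical helper name `toErrorMessage` that does not exist in the page's script:

```javascript
// Hypothetical helper (not part of the original page): normalizes any thrown
// value into a display string, mirroring the expression added in the hunk.
function toErrorMessage(error) {
  return error instanceof Error ? error.message : String(error);
}

// An Error contributes only its message, without the "Error: " prefix
// that template-literal interpolation would include.
console.log(toErrorMessage(new Error("Network timeout"))); // Network timeout
console.log(`${new Error("Network timeout")}`);            // Error: Network timeout

// Non-Error values (strings, numbers, null) are stringified as-is.
console.log(toErrorMessage("bad request")); // bad request
console.log(toErrorMessage(null));          // null
```

In the page itself the result would be assigned to `errorDiv.textContent`, as the added line does inline.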
--- a/docs/docs/core-abilities/index.md
+++ b/docs/docs/core-abilities/index.md
@@ -6,8 +6,7 @@ Qodo Merge utilizes a variety of core abilities to provide a comprehensive and e
 - [Auto best practices](https://qodo-merge-docs.qodo.ai/core-abilities/auto_best_practices/)
 - [Chat on code suggestions](https://qodo-merge-docs.qodo.ai/core-abilities/chat_on_code_suggestions/)
 - [Chrome extension](https://qodo-merge-docs.qodo.ai/chrome-extension/)
-- [Code validation](https://qodo-merge-docs.qodo.ai/core-abilities/code_validation/)
-- [Compression strategy](https://qodo-merge-docs.qodo.ai/core-abilities/compression_strategy/)
+- [Code validation](https://qodo-merge-docs.qodo.ai/core-abilities/code_validation/) <!-- - [Compression strategy](https://qodo-merge-docs.qodo.ai/core-abilities/compression_strategy/) -->
 - [Dynamic context](https://qodo-merge-docs.qodo.ai/core-abilities/dynamic_context/)
 - [Fetching ticket context](https://qodo-merge-docs.qodo.ai/core-abilities/fetching_ticket_context/)
 - [Impact evaluation](https://qodo-merge-docs.qodo.ai/core-abilities/impact_evaluation/)
--- a/docs/docs/faq/index.md
+++ b/docs/docs/faq/index.md
@@ -66,7 +66,7 @@ ___
 ___

 ??? note "Q: Can Qodo Merge review draft/offline PRs?"
-    #### Answer:<span style="display:none;">5</span>
+    #### Answer:<span style="display:none;">6</span>

    Yes. While Qodo Merge won't automatically review draft PRs, you can still get feedback by manually requesting it through [online commenting](https://qodo-merge-docs.qodo.ai/usage-guide/automations_and_usage/#online-usage).

@@ -74,7 +74,7 @@ ___
 ___

 ??? note "Q: Can the 'Review effort' feedback be calibrated or customized?"
-    #### Answer:<span style="display:none;">5</span>
+    #### Answer:<span style="display:none;">7</span>

    Yes, you can customize review effort estimates using the `extra_instructions` configuration option (see [documentation](https://qodo-merge-docs.qodo.ai/tools/review/#configuration-options)).
    
--- a/docs/docs/installation/azure.md
+++ b/docs/docs/installation/azure.md
@@ -1,7 +1,7 @@
 ## Azure DevOps Pipeline

 You can use a pre-built Action Docker image to run PR-Agent as an Azure DevOps pipeline.
-add the following file to your repository under `azure-pipelines.yml`:
+Add the following file to your repository under `azure-pipelines.yml`:

 ```yaml
 # Opt out of CI triggers
@@ -71,7 +71,7 @@ git_provider="azure"
 ```

 Azure DevOps provider supports [PAT token](https://learn.microsoft.com/en-us/azure/devops/organizations/accounts/use-personal-access-tokens-to-authenticate?view=azure-devops&tabs=Windows) or [DefaultAzureCredential](https://learn.microsoft.com/en-us/azure/developer/python/sdk/authentication-overview#authentication-in-server-environments) authentication.
-PAT is faster to create, but has build in expiration date, and will use the user identity for API calls.
+PAT is faster to create, but has a built-in expiration date and will use the user identity for API calls.
 Using DefaultAzureCredential, you can use a managed identity or service principal, which are more secure and will create a separate ADO user identity (via AAD) for the agent.

 If PAT was chosen, you can assign the value in .secrets.toml.
--- a/docs/docs/installation/bitbucket.md
+++ b/docs/docs/installation/bitbucket.md
@@ -50,7 +50,7 @@ git_provider="bitbucket_server"
 and pass the Pull request URL:

 ```shell
-python cli.py --pr_url https://git.onpreminstanceofbitbucket.com/projects/PROJECT/repos/REPO/pull-requests/1 review
+python cli.py --pr_url https://git.on-prem-instance-of-bitbucket.com/projects/PROJECT/repos/REPO/pull-requests/1 review
 ```

 ### Run it as service
@@ -63,6 +63,6 @@ docker push codiumai/pr-agent:bitbucket_server_webhook  # Push to your Docker re
 ```

 Navigate to `Projects` or `Repositories`, `Settings`, `Webhooks`, `Create Webhook`.
-Fill the name and URL, Authentication None select the Pull Request Opened checkbox to receive that event as webhook.
+Fill in the name and URL. For Authentication, select 'None'. Select the 'Pull Request Opened' checkbox to receive that event as a webhook.

 The URL should end with `/webhook`, for example: https://domain.com/webhook
--- a/docs/docs/installation/gitea.md
+++ b/docs/docs/installation/gitea.md
@@ -17,12 +17,11 @@ git clone https://github.com/qodo-ai/pr-agent.git
 ```

 5. Prepare variables and secrets. Skip this step if you plan on setting these as environment variables when running the agent:
-1. In the configuration file/variables:
-    - Set `config.git_provider` to "gitea"
-
-2. In the secrets file/variables:
-    - Set your AI model key in the respective section
-    - In the [Gitea] section, set `personal_access_token` (with token from step 2) and `webhook_secret` (with secret from step 3)
+    - In the configuration file/variables:
+        - Set `config.git_provider` to "gitea"
+    - In the secrets file/variables:
+        - Set your AI model key in the respective section
+        - In the [Gitea] section, set `personal_access_token` (with token from step 2) and `webhook_secret` (with secret from step 3)

 6. Build a Docker image for the app and optionally push it to a Docker repository. We'll use Dockerhub as an example:

--- a/docs/docs/installation/gitlab.md
+++ b/docs/docs/installation/gitlab.md
@@ -46,7 +46,7 @@ Note that if your base branches are not protected, don't set the variables as `p

 1. In GitLab create a new user and give it "Reporter" role ("Developer" if using Pro version of the agent) for the intended group or project.

-2. For the user from step 1. generate a `personal_access_token` with `api` access.
+2. For the user from step 1, generate a `personal_access_token` with `api` access.

 3. Generate a random secret for your app, and save it for later (`shared_secret`). For example, you can use:

@@ -111,7 +111,7 @@ For example: `GITLAB.PERSONAL_ACCESS_TOKEN` --> `GITLAB__PERSONAL_ACCESS_TOKEN`
 4. Create a lambda function that uses the uploaded image. Set the lambda timeout to be at least 3m.
 5. Configure the lambda function to have a Function URL.
 6. In the environment variables of the Lambda function, specify `AZURE_DEVOPS_CACHE_DIR` to a writable location such as /tmp. (see [link](https://github.com/Codium-ai/pr-agent/pull/450#issuecomment-1840242269))
-7. Go back to steps 8-9 of [Run a GitLab webhook server](#run-a-gitlab-webhook-server) with the function url as your Webhook URL.
+7. Go back to steps 8-9 of [Run a GitLab webhook server](#run-a-gitlab-webhook-server) with the function URL as your Webhook URL.
    The Webhook URL would look like `https://<LAMBDA_FUNCTION_URL>/webhook`

 ### Using AWS Secrets Manager
--- a/docs/docs/installation/locally.md
+++ b/docs/docs/installation/locally.md
@@ -12,7 +12,7 @@ To invoke a tool (for example `review`), you can run PR-Agent directly from the
 - For GitHub:

    ```bash
-    docker run --rm -it -e OPENAI.KEY=<your key> -e GITHUB.USER_TOKEN=<your token> codiumai/pr-agent:latest --pr_url <pr_url> review
+    docker run --rm -it -e OPENAI.KEY=<your_openai_key> -e GITHUB.USER_TOKEN=<your_github_token> codiumai/pr-agent:latest --pr_url <pr_url> review
    ```

     If you are using GitHub Enterprise Server, you need to specify the custom URL as a variable.
--- a/docs/docs/pr_benchmark/index.md
+++ b/docs/docs/pr_benchmark/index.md
@@ -58,6 +58,12 @@ A list of the models used for generating the baseline suggestions, and example r
      <td style="text-align:left;">1024</td>
      <td style="text-align:center;"><b>44.3</b></td>
    </tr>
+    <tr>
+      <td style="text-align:left;">Grok-4</td>
+      <td style="text-align:left;">2025-07-09</td>
+      <td style="text-align:left;">unknown</td>
+      <td style="text-align:center;"><b>41.7</b></td>
+    </tr>
    <tr>
      <td style="text-align:left;">Claude-4-sonnet</td>
      <td style="text-align:left;">2025-05-14</td>
@@ -82,6 +88,12 @@ A list of the models used for generating the baseline suggestions, and example r
      <td style="text-align:left;"></td>
      <td style="text-align:center;"><b>33.5</b></td>
    </tr>
+    <tr>
+      <td style="text-align:left;">Claude-4-opus-20250514</td>
+      <td style="text-align:left;">2025-05-14</td>
+      <td style="text-align:left;"></td>
+      <td style="text-align:center;"><b>32.8</b></td>
+    </tr>
    <tr>
      <td style="text-align:left;">Claude-3.7-sonnet</td>
      <td style="text-align:left;">2025-02-19</td>
@@ -240,6 +252,39 @@ weaknesses:
 - **Introduces new problems:** Several suggestions add unsupported APIs, undeclared variables, wrong types, or break compilation, hurting trust in the recommendations.
 - **Rule violations:** It often edits lines outside the diff, exceeds the 3-suggestion cap, or labels cosmetic tweaks as “critical”, showing inconsistent guideline compliance.

+### Claude-4 Opus
+
+final score: **32.8**
+
+strengths:
+
+- **Format & rule adherence:** Almost always returns valid YAML, stays within the ≤3-suggestion limit, and usually restricts edits to newly-added lines, so its output is easy to apply automatically.
+- **Concise, focused patches:** When it does find a real bug it gives short, well-scoped explanations plus minimal diff snippets, often outperforming verbose baselines in clarity.
+- **Able to catch subtle edge-cases:** In several examples it detected overflow, race-condition or enum-mismatch issues that many other models missed, showing solid code-analysis capability.
+
+weaknesses:
+
+- **Low recall / narrow coverage:** In a large share of the 399 examples the model produced an empty list or only one minor tip while more serious defects were present, causing it to be rated inferior to most baselines.
+- **Frequent incorrect or no-op fixes:** It sometimes supplies identical “before/after” code, flags non-issues, or suggests changes that would break compilation or logic, reducing reviewer trust.
+- **Shaky guideline consistency:** Although generally compliant, it still occasionally violates rules (touches unchanged lines, offers stylistic advice, adds imports) and duplicates suggestions, indicating unstable internal checks.
+
+### Grok-4
+
+final score: **41.7**
+
+strengths:
+
+- **Focused and concise fixes:** When the model does detect a problem it usually proposes a minimal, well-scoped patch that compiles and directly addresses the defect without unnecessary noise.  
+- **Good critical-bug instinct:** It often prioritises show-stoppers (compile failures, crashes, security issues) over cosmetic matters and occasionally spots subtle issues that all other reviewers miss.  
+- **Clear explanations & snippets:** Explanations are short, readable and paired with ready-to-paste code, making the advice easy to apply.  
+
+weaknesses:
+
+- **High miss rate:** In a large fraction of examples the model returned an empty list or covered only one minor issue while overlooking more serious newly-introduced bugs.  
+- **Inconsistent accuracy:** A noticeable subset of answers contain wrong or even harmful fixes (e.g., removing valid flags, creating compile errors, re-introducing bugs).  
+- **Limited breadth:** Even when it finds a real defect it rarely reports additional related problems that peers catch, leading to partial reviews.  
+- **Occasional guideline slips:** A few replies modify unchanged lines, suggest new imports, or duplicate suggestions, showing imperfect compliance with instructions.
+
 ## Appendix - Example Results

 Some examples of benchmarked PRs and their results:
--- a/docs/docs/recent_updates/index.md
+++ b/docs/docs/recent_updates/index.md
@@ -13,7 +13,7 @@ It also outlines our development roadmap for the upcoming three months. Please n
    - **Simplified Free Tier**: Qodo Merge now offers a simplified free tier with a monthly limit of 75 PR reviews per organization, replacing the previous two-week trial. ([Learn more](https://qodo-merge-docs.qodo.ai/installation/qodo_merge/#cloud-users))
     - **CLI Endpoint**: A new Qodo Merge endpoint that accepts a list of before/after code changes, executes Qodo Merge commands, and returns the results. Currently available for enterprise customers. Contact [Qodo](https://www.qodo.ai/contact/) for more information.
    - **Linear tickets support**: Qodo Merge now supports Linear tickets. ([Learn more](https://qodo-merge-docs.qodo.ai/core-abilities/fetching_ticket_context/#linear-integration))
-    - **Smart Update**: Upon PR updates, Qodo Merge will offer tailored code suggestions, addressing both the entire PR and the specific incremental changes since the last feedback  ([Learn more](https://qodo-merge-docs.qodo.ai/core-abilities/incremental_update//))
+    - **Smart Update**: Upon PR updates, Qodo Merge will offer tailored code suggestions, addressing both the entire PR and the specific incremental changes since the last feedback. ([Learn more](https://qodo-merge-docs.qodo.ai/core-abilities/incremental_update/))

 === "Future Roadmap"
     - **Enhanced `review` tool**: Enhancing the `review` tool to validate compliance across multiple categories, including security, tickets, and custom best practices.
--- a/docs/docs/tools/describe.md
+++ b/docs/docs/tools/describe.md
@@ -47,11 +47,16 @@ publish_labels = true

 ## Preserving the original user description

-By default, Qodo Merge preserves your original PR description by placing it above the generated content.
+By default, Qodo Merge tries to preserve your original PR description by placing it above the generated content.
 This requires including your description during the initial PR creation.
-Be aware that if you edit the description while the automated tool is running, a race condition may occur, potentially causing your original description to be lost.

-When updating PR descriptions, the `/describe` tool considers everything above the "PR Type" field as user content and will preserve it.
+"Qodo removed the original description from the PR. Why?"
+
+From our experience, there are two possible reasons:
+
+- If you edit the description _while_ the automated tool is running, a race condition may occur, potentially causing your original description to be lost. Hence, create a description before launching the PR.
+
+- When _updating_ PR descriptions, the `/describe` tool considers everything above the "PR Type" field as user content and will preserve it.
 Everything below this marker is treated as previously auto-generated content and will be replaced.

 ![Describe comment](https://codium.ai/images/pr_agent/pr_description_user_description.png){width=512}
@@ -177,9 +182,12 @@ pr_agent:summary

 ## PR Walkthrough:
 pr_agent:walkthrough
+
+## PR Diagram:
+pr_agent:diagram
 ```

-The marker `pr_agent:type` will be replaced with the PR type, `pr_agent:summary` will be replaced with the PR summary, and `pr_agent:walkthrough` will be replaced with the PR walkthrough.
+The marker `pr_agent:type` will be replaced with the PR type, `pr_agent:summary` will be replaced with the PR summary, `pr_agent:walkthrough` will be replaced with the PR walkthrough, and `pr_agent:diagram` will be replaced with the sequence diagram (if enabled).

 ![Describe markers before](https://codium.ai/images/pr_agent/describe_markers_before.png){width=512}

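Reviewer note on the hunk above: the marker substitution it documents can be pictured as a plain find-and-replace over the description template. The sketch below is illustrative only and is not PR-Agent's implementation. The function name `fillMarkers`, the sample content, and the choice to leave a marker untouched when no content is available for it are all assumptions.

```javascript
// Illustrative only: NOT PR-Agent's implementation. `fillMarkers` and the
// sample values are hypothetical.
function fillMarkers(template, content) {
  // Replace each `pr_agent:<name>` marker with its generated content.
  // Assumption of this sketch: markers without content are left as-is.
  return template.replace(/pr_agent:(\w+)/g, (marker, name) =>
    Object.prototype.hasOwnProperty.call(content, name) ? content[name] : marker
  );
}

const template = [
  "## PR Type:",
  "pr_agent:type",
  "",
  "## PR Diagram:",
  "pr_agent:diagram",
].join("\n");

// Only `type` is supplied here, so the `pr_agent:diagram` line is unchanged.
console.log(fillMarkers(template, { type: "Bug fix" }));
```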
@@ -191,6 +199,7 @@ becomes

 - `use_description_markers`: if set to true, the tool will use markers template. It replaces every marker of the form `pr_agent:marker_name` with the relevant content. Default is false.
 - `include_generated_by_header`: if set to true, the tool will add a dedicated header: 'Generated by PR Agent at ...' to any automatic content. Default is true.
+- `diagram`: if present as a marker, will be replaced by the PR sequence diagram (if enabled).

 ## Custom labels

--- a/docs/docs/usage-guide/automations_and_usage.md
+++ b/docs/docs/usage-guide/automations_and_usage.md
@@ -30,7 +30,7 @@ verbosity_level=2
 This is useful for debugging or experimenting with different tools.

 3. **git provider**: The [git_provider](https://github.com/Codium-ai/pr-agent/blob/main/pr_agent/settings/configuration.toml#L5) field in a configuration file determines the GIT provider that will be used by Qodo Merge. Currently, the following providers are supported:
-`github` **(default)**, `gitlab`, `bitbucket`, `azure`, `codecommit`, `local`,`gitea`, and `gerrit`.
+`github` **(default)**, `gitlab`, `bitbucket`, `azure`, `codecommit`, `local`, and `gitea`.

 ### CLI Health Check

--- a/docs/docs/usage-guide/changing_a_model.md
+++ b/docs/docs/usage-guide/changing_a_model.md
@@ -32,6 +32,16 @@ OPENAI__API_BASE=https://api.openai.com/v1
 OPENAI__KEY=sk-...
 ```

+### OpenAI Flex Processing
+
+To reduce costs for non-urgent/background tasks, enable Flex Processing:
+
+```toml
+[litellm]
+extra_body='{"processing_mode": "flex"}'
+```
+
+See [OpenAI Flex Processing docs](https://platform.openai.com/docs/guides/flex-processing) for details.

 ### Azure