Compare commits

...

8 Commits

5 changed files with 66 additions and 38 deletions

View File

@ -31,12 +31,12 @@ PR-Agent aims to help efficiently review and handle pull requests, by providing
- [Getting Started](#getting-started)
- [News and Updates](#news-and-updates)
- [Overview](#overview)
- [Why Use PR-Agent?](#why-use-pr-agent)
- [Features](#features)
- [See It in Action](#see-it-in-action)
- [Try It Now](#try-it-now)
- [Qodo Merge 💎](#qodo-merge-)
- [How It Works](#how-it-works)
- [Why Use PR-Agent?](#why-use-pr-agent)
- [Data Privacy](#data-privacy)
- [Contributing](#contributing)
- [Links](#links)
@ -60,8 +60,9 @@ Run PR-Agent locally on your repository via command line: [Local CLI setup guide
### Qodo Merge as post-commit in your local IDE
See [here](https://github.com/qodo-ai/agents/tree/main/agents/qodo-merge-post-commit)
### Discover Qodo Merge 💎
### Discover Qodo Merge 💎
Zero-setup hosted solution with advanced features and priority support
- **[FREE for Open Source](https://github.com/marketplace/qodo-merge-pro-for-open-source)**: Full features, zero cost for public repos
- [Intro and Installation guide](https://qodo-merge-docs.qodo.ai/installation/qodo_merge/)
- [Plans & Pricing](https://www.qodo.ai/pricing/)
@ -101,11 +102,22 @@ New tool for Qodo Merge 💎 - `/scan_repo_discussions`.
Read more about it [here](https://qodo-merge-docs.qodo.ai/tools/scan_repo_discussions/).
## Overview
## Why Use PR-Agent?
A reasonable question that can be asked is: `"Why use PR-Agent? What makes it stand out from existing tools?"`
Here are some advantages of PR-Agent:
- We emphasize **real-life practical usage**. Each tool (review, improve, ask, ...) has a single LLM call, no more. We feel that this is critical for realistic team usage - obtaining an answer quickly (~30 seconds) and affordably.
- Our [PR Compression strategy](https://qodo-merge-docs.qodo.ai/core-abilities/#pr-compression-strategy) is a core ability that enables to effectively tackle both short and long PRs.
- Our JSON prompting strategy enables us to have **modular, customizable tools**. For example, the '/review' tool categories can be controlled via the [configuration](pr_agent/settings/configuration.toml) file. Adding additional categories is easy and accessible.
- We support **multiple git providers** (GitHub, GitLab, BitBucket), **multiple ways** to use the tool (CLI, GitHub Action, GitHub App, Docker, ...), and **multiple models** (GPT, Claude, Deepseek, ...)
## Features
<div style="text-align:left;">
Supported commands per platform:
PR-Agent and Qodo Merge offer comprehensive pull request functionalities integrated with various git providers:
| | | GitHub | GitLab | Bitbucket | Azure DevOps | Gitea |
|---------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------|:------:|:------:|:---------:|:------------:|:-----:|
@ -227,17 +239,6 @@ The following diagram illustrates PR-Agent tools and their flow:
Check out the [PR Compression strategy](https://qodo-merge-docs.qodo.ai/core-abilities/#pr-compression-strategy) page for more details on how we convert a code diff to a manageable LLM prompt
## Why Use PR-Agent?
A reasonable question that can be asked is: `"Why use PR-Agent? What makes it stand out from existing tools?"`
Here are some advantages of PR-Agent:
- We emphasize **real-life practical usage**. Each tool (review, improve, ask, ...) has a single LLM call, no more. We feel that this is critical for realistic team usage - obtaining an answer quickly (~30 seconds) and affordably.
- Our [PR Compression strategy](https://qodo-merge-docs.qodo.ai/core-abilities/#pr-compression-strategy) is a core ability that enables to effectively tackle both short and long PRs.
- Our JSON prompting strategy enables us to have **modular, customizable tools**. For example, the '/review' tool categories can be controlled via the [configuration](pr_agent/settings/configuration.toml) file. Adding additional categories is easy and accessible.
- We support **multiple git providers** (GitHub, GitLab, BitBucket), **multiple ways** to use the tool (CLI, GitHub Action, GitHub App, Docker, ...), and **multiple models** (GPT, Claude, Deepseek, ...)
## Data Privacy
### Self-hosted PR-Agent

View File

@ -5,6 +5,7 @@ Qodo Merge utilizes a variety of core abilities to provide a comprehensive and e
- [Auto approval](https://qodo-merge-docs.qodo.ai/core-abilities/auto_approval/)
- [Auto best practices](https://qodo-merge-docs.qodo.ai/core-abilities/auto_best_practices/)
- [Chat on code suggestions](https://qodo-merge-docs.qodo.ai/core-abilities/chat_on_code_suggestions/)
- [Chrome extension](https://qodo-merge-docs.qodo.ai/chrome-extension/)
- [Code validation](https://qodo-merge-docs.qodo.ai/core-abilities/code_validation/)
- [Compression strategy](https://qodo-merge-docs.qodo.ai/core-abilities/compression_strategy/)
- [Dynamic context](https://qodo-merge-docs.qodo.ai/core-abilities/dynamic_context/)

View File

@ -24,7 +24,7 @@ To search the documentation site using natural language:
## Features
PR-Agent and Qodo Merge offers extensive pull request functionalities across various git providers:
PR-Agent and Qodo Merge offer comprehensive pull request functionalities integrated with various git providers:
| | | GitHub | GitLab | Bitbucket | Azure DevOps | Gitea |
| ----- |---------------------------------------------------------------------------------------------------------------------|:------:|:------:|:---------:|:------------:|:-----:|

View File

@ -3,15 +3,18 @@
## Methodology
Qodo Merge PR Benchmark evaluates and compares the performance of Large Language Models (LLMs) in analyzing pull request code and providing meaningful code suggestions.
Our diverse dataset comprises of 400 pull requests from over 100 repositories, spanning various programming languages and frameworks to reflect real-world scenarios.
Our diverse dataset contains 400 pull requests from over 100 repositories, spanning various programming languages and frameworks to reflect real-world scenarios.
- For each pull request, we have pre-generated suggestions from [11](https://qodo-merge-docs.qodo.ai/pr_benchmark/#models-used-for-generating-the-benchmark-baseline) different top-performing models using the Qodo Merge `improve` tool. The prompt for response generation can be found [here](https://github.com/qodo-ai/pr-agent/blob/main/pr_agent/settings/code_suggestions/pr_code_suggestions_prompts_not_decoupled.toml).
- For each pull request, we have pre-generated suggestions from eleven different top-performing models using the Qodo Merge `improve` tool. The prompt for response generation can be found [here](https://github.com/qodo-ai/pr-agent/blob/main/pr_agent/settings/code_suggestions/pr_code_suggestions_prompts_not_decoupled.toml).
- To benchmark a model, we generate its suggestions for the same pull requests and ask a high-performing judge model to **rank** the new model's output against the 11 pre-generated baseline suggestions. We utilize OpenAI's `o3` model as the judge, though other models have yielded consistent results. The prompt for this ranking judgment is available [here](https://github.com/Codium-ai/pr-agent-settings/tree/main/benchmark).
- To benchmark a model, we generate its suggestions for the same pull requests and ask a high-performing judge model to **rank** the new model's output against the pre-generated baseline suggestions. We utilize OpenAI's `o3` model as the judge, though other models have yielded consistent results. The prompt for this ranking judgment is available [here](https://github.com/Codium-ai/pr-agent-settings/tree/main/benchmark).
- We aggregate ranking outcomes across all pull requests, calculating performance metrics for the evaluated model. We also analyze the qualitative feedback from the judge to identify the model's comparative strengths and weaknesses against the established baselines.
- We aggregate ranking outcomes across all pull requests, calculating performance metrics for the evaluated model.
- We also analyze the qualitative feedback from the judge to identify the model's comparative strengths and weaknesses against the established baselines.
This approach provides not just a quantitative score but also a detailed analysis of each model's strengths and weaknesses.
A list of the models used for generating the baseline suggestions, and example results, can be found in the [Appendix](#appendix-example-results).
[//]: # (Note that this benchmark focuses on quality: the ability of an LLM to process complex pull request with multiple files and nuanced task to produce high-quality code suggestions.)
@ -237,18 +240,40 @@ weaknesses:
- **Introduces new problems:** Several suggestions add unsupported APIs, undeclared variables, wrong types, or break compilation, hurting trust in the recommendations.
- **Rule violations:** It often edits lines outside the diff, exceeds the 3-suggestion cap, or labels cosmetic tweaks as “critical”, showing inconsistent guideline compliance.
## Appendix - models used for generating the benchmark baseline
## Appendix - Example Results
- anthropic_sonnet_3.7_v1:0
- claude-4-opus-20250514
- claude-4-sonnet-20250514
- claude-4-sonnet-20250514_thinking_2048
- gemini-2.5-flash-preview-04-17
- gemini-2.5-pro-preview-05-06
- gemini-2.5-pro-preview-06-05_1024
- gemini-2.5-pro-preview-06-05_4096
- gpt-4.1
- o3
- o4-mini_medium
Some examples of benchmarked PRs and their results:
- [Example 1](https://www.qodo.ai/images/qodo_merge_benchmark/example_results1.html)
- [Example 2](https://www.qodo.ai/images/qodo_merge_benchmark/example_results2.html)
- [Example 3](https://www.qodo.ai/images/qodo_merge_benchmark/example_results3.html)
- [Example 4](https://www.qodo.ai/images/qodo_merge_benchmark/example_results4.html)
### Models Used for Benchmarking
The following models were used for generating the benchmark baseline:
```markdown
(1) anthropic_sonnet_3.7_v1:0
(2) claude-4-opus-20250514
(3) claude-4-sonnet-20250514
(4) claude-4-sonnet-20250514_thinking_2048
(5) gemini-2.5-flash-preview-04-17
(6) gemini-2.5-pro-preview-05-06
(7) gemini-2.5-pro-preview-06-05_1024
(8) gemini-2.5-pro-preview-06-05_4096
(9) gpt-4.1
(10) o3
(11) o4-mini_medium
```

View File

@ -46,6 +46,7 @@ nav:
- Auto approval: 'core-abilities/auto_approval.md'
- Auto best practices: 'core-abilities/auto_best_practices.md'
- Chat on code suggestions: 'core-abilities/chat_on_code_suggestions.md'
- Chrome extension: 'chrome-extension/index.md'
- Code validation: 'core-abilities/code_validation.md'
# - Compression strategy: 'core-abilities/compression_strategy.md'
- Dynamic context: 'core-abilities/dynamic_context.md'
@ -57,11 +58,11 @@ nav:
- RAG context enrichment: 'core-abilities/rag_context_enrichment.md'
- Self-reflection: 'core-abilities/self_reflection.md'
- Static code analysis: 'core-abilities/static_code_analysis.md'
- Chrome Extension:
- Qodo Merge Chrome Extension: 'chrome-extension/index.md'
- Features: 'chrome-extension/features.md'
- Data Privacy: 'chrome-extension/data_privacy.md'
- Options: 'chrome-extension/options.md'
# - Chrome Extension:
# - Qodo Merge Chrome Extension: 'chrome-extension/index.md'
# - Features: 'chrome-extension/features.md'
# - Data Privacy: 'chrome-extension/data_privacy.md'
# - Options: 'chrome-extension/options.md'
- PR Benchmark:
- PR Benchmark: 'pr_benchmark/index.md'
- Recent Updates: