Merge branch 'main' into case-update

This commit is contained in:
Ori Kotek
2023-07-16 16:49:47 +03:00
committed by GitHub
51 changed files with 1446 additions and 364 deletions

16
.github/workflows/review.yaml vendored Normal file

@ -0,0 +1,16 @@
on:
  pull_request:
  issue_comment:
jobs:
  pr_agent_job:
    runs-on: ubuntu-latest
    name: Run pr agent on every pull request
    steps:
      - name: PR Agent action step
        id: pragent
        uses: Codium-ai/pr-agent@main
        env:
          OPENAI_KEY: ${{ secrets.OPENAI_KEY }}
          OPENAI_ORG: ${{ secrets.OPENAI_ORG }} # optional
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

18
CONFIGURATION.md Normal file

@ -0,0 +1,18 @@
## Configuration
The different tools and sub-tools used by CodiumAI pr-agent are easily configurable via the configuration file: `/pr-agent/settings/configuration.toml`.
##### Git Provider:
You can select your git_provider with the flag `git_provider` in the `config` section
##### PR Reviewer:
You can enable/disable the different PR Reviewer abilities with the following flags (`pr_reviewer` section):
```
require_focused_review=true
require_tests_review=true
require_security_review=true
```
You can control the number of suggestions returned by the PR Reviewer with the following flag:
```num_code_suggestions=3```
And enable/disable the inline code suggestions with the following flag:
```inline_code_comments=true```


10
Dockerfile.github_action Normal file

@ -0,0 +1,10 @@
FROM python:3.10 as base
WORKDIR /app
ADD requirements.txt .
RUN pip install -r requirements.txt && rm requirements.txt
ENV PYTHONPATH=/app
ADD pr_agent pr_agent
ADD github_action/entrypoint.sh /
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]


@ -0,0 +1 @@
FROM codiumai/pr-agent:github_action

206
README.md

@ -9,18 +9,40 @@
[![GitHub license](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://github.com/Codium-ai/pr-agent/blob/main/LICENSE) [![GitHub license](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://github.com/Codium-ai/pr-agent/blob/main/LICENSE)
[![Discord](https://badgen.net/badge/icon/discord?icon=discord&label&color=purple)](https://discord.com/channels/1057273017547378788/1126104260430528613) [![Discord](https://badgen.net/badge/icon/discord?icon=discord&label&color=purple)](https://discord.com/channels/1057273017547378788/1126104260430528613)
CodiumAI `PR-Agent` is an open-source tool aiming to help developers review PRs faster and more efficiently. It automatically analyzes the PR, provides feedback and suggestions, and can answer free-text questions.
</div> </div>
<div align="left">
CodiumAI `pr-agent` is an open-source tool aiming to help developers review PRs faster and more efficiently. It automatically analyzes the PR and can provide several types of feedback:
**Auto-Description**: Automatically generating PR description - name, type, summary, and code walkthrough.
\
**PR Review**: Feedback about the PR main theme, type, relevant tests, security issues, focus, and various suggestions for the PR content.
\
**Question Answering**: Answering free-text questions about the PR.
\
**Code Suggestion**: Committable code suggestions for improving the PR.
Example results:
</div>
<div align="center">
<p float="center">
<img src="./pics/pr_reviewer_1.png" width="800">
</p>
<p float="center">
<img src="./pics/pr_code_suggestions.png" width="800">
</p>
</div>
<div align="left">
- [Live demo](#live-demo) - [Live demo](#live-demo)
- [Quickstart](#Quickstart) - [Overview](#overview)
- [Quickstart](#quickstart)
- [Usage and tools](#usage-and-tools) - [Usage and tools](#usage-and-tools)
- [Configuration](#Configuration) - [Configuration](./CONFIGURATION.md)
- [How it works](#how-it-works) - [How it works](#how-it-works)
- [Roadmap](#roadmap) - [Roadmap](#roadmap)
- [Similar projects](#similar-projects) - [Similar projects](#similar-projects)
</div>
## Live demo ## Live demo
@ -31,6 +53,33 @@ Experience GPT-4 powered PR review on your public GitHub repository with our hos
To set up your own PR-Agent, see the [Quickstart](#Quickstart) section To set up your own PR-Agent, see the [Quickstart](#Quickstart) section
--- ---
## Overview
`pr-agent` offers extensive pull request functionalities across various git providers:
| | | Github | Gitlab | Bitbucket |
|-------|---------------------------------------------|--------|--------|-----------|
| TOOLS | Review | ✓ | ✓ | ✓ |
| | ⮑ Inline review | ✓ | ✓ | |
| | Ask | ✓ | ✓ | |
| | Auto-Description | ✓ | | |
| | Improve Code | ✓ | | |
| | | | | |
| USAGE | CLI | ✓ | ✓ | ✓ |
| | Tagging bot | ✓ | ✓ | |
| | Actions | ✓ | | |
| | | | | |
| CORE | PR compression | ✓ | ✓ | ✓ |
| | Repo language prioritization | ✓ | ✓ | ✓ |
| | Adaptive and token-aware<br />file patch fitting | ✓ | ✓ | ✓ |
Examples for invoking the different tools via the [CLI](#quickstart):
- **Review**: python cli.py --pr-url=<pr_url> review
- **Describe**: python cli.py --pr-url=<pr_url> describe
- **Improve**: python cli.py --pr-url=<pr_url> improve
- **Ask**: python cli.py --pr-url=<pr_url> ask "Write me a poem about this PR"
"<pr_url>" is the url of the relevant PR (for example: https://github.com/Codium-ai/pr-agent/pull/50).
In the [configuration](./CONFIGURATION.md) file you can select your git provider (Github, Gitlab, Bitbucket), and further configure the different tools.
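The `<command> [<args>]` shape of the CLI examples above can be sketched with `argparse`. This is a minimal illustration, not the project's actual `cli.py`: the `build_parser` helper and the `rest` argument name are hypothetical, and accepting both `--pr_url` and `--pr-url` is an assumption made because both spellings appear in this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring `cli.py --pr_url <pr_url> <command> [<args>]`."""
    parser = argparse.ArgumentParser(description="AI based pull request analyzer")
    # Assumption: accept both spellings seen in the README examples.
    parser.add_argument("--pr_url", "--pr-url", dest="pr_url", required=True,
                        help="URL of the pull request to analyze")
    parser.add_argument("command", choices=["review", "describe", "improve", "ask"],
                        help="tool to run on the PR")
    # Everything after the command (for example the question text for `ask`).
    parser.add_argument("rest", nargs="*", default=[])
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--pr_url", "https://github.com/Codium-ai/pr-agent/pull/50", "ask", "What", "changed?"]
    )
    print(args.command, "|", " ".join(args.rest))
```

Collecting the trailing words into one list lets a quoted and an unquoted question be handled the same way, by joining `rest` with spaces.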
## Quickstart ## Quickstart
@ -50,13 +99,13 @@ To request a review for a PR, or ask a question about a PR, you can run directly
1. To request a review for a PR, run the following command: 1. To request a review for a PR, run the following command:
``` ```
docker run --rm -it -e OPENAI.KEY=<your key> -e GITHUB.USER_TOKEN=<your token> codiumai/pr-agent --pr_url <pr url> docker run --rm -it -e OPENAI.KEY=<your key> -e GITHUB.USER_TOKEN=<your token> codiumai/pr-agent --pr_url <pr_url> review
``` ```
2. To ask a question about a PR, run the following command: 2. To ask a question about a PR, run the following command:
``` ```
docker run --rm -it -e OPENAI.KEY=<your key> -e GITHUB.USER_TOKEN=<your token> codiumai/pr-agent --pr_url <pr url> --question "<your question>" docker run --rm -it -e OPENAI.KEY=<your key> -e GITHUB.USER_TOKEN=<your token> codiumai/pr-agent --pr_url <pr_url> ask "<your question>"
``` ```
Possible questions you can ask include: Possible questions you can ask include:
@ -86,15 +135,17 @@ pip install -r requirements.txt
3. Copy the secrets template file and fill in your OpenAI key and your GitHub user token: 3. Copy the secrets template file and fill in your OpenAI key and your GitHub user token:
``` ```
cp pr_agent/settings/.secrets_template.toml pr_agent/settings/.secrets cp pr_agent/settings/.secrets_template.toml pr_agent/settings/.secrets.toml
# Edit .secrets file # Edit .secrets.toml file
``` ```
4. Run the appropriate Python scripts from the scripts folder: 4. Run the appropriate Python scripts from the scripts folder:
``` ```
python pr_agent/cli.py --pr_url <pr url> python pr_agent/cli.py --pr_url <pr_url> review
python pr_agent/cli.py --pr_url <pr url> --question "<your question>" python pr_agent/cli.py --pr_url <pr_url> ask <your question>
python pr_agent/cli.py --pr_url <pr_url> describe
python pr_agent/cli.py --pr_url <pr_url> improve
``` ```
--- ---
@ -147,8 +198,8 @@ git clone https://github.com/Codium-ai/pr-agent.git
- Copy your app's webhook secret to the webhook_secret field. - Copy your app's webhook secret to the webhook_secret field.
``` ```
cp pr_agent/settings/.secrets_template.toml pr_agent/settings/.secrets cp pr_agent/settings/.secrets_template.toml pr_agent/settings/.secrets.toml
# Edit .secrets file # Edit .secrets.toml file
``` ```
6. Build a Docker image for the app and optionally push it to a Docker repository. We'll use Dockerhub as an example: 6. Build a Docker image for the app and optionally push it to a Docker repository. We'll use Dockerhub as an example:
@ -172,122 +223,12 @@ docker push codiumai/pr-agent:github_app # Push to your Docker repository
## Usage and Tools ## Usage and Tools
CodiumAI PR-Agent provides two types of interactions ("tools"): `"PR Reviewer"` and `"PR Q&A"`. **PR-Agent** provides four types of interactions ("tools"): `"PR Reviewer"`, `"PR Q&A"`, `"PR Description"` and `"PR Code Suggestions"`.
- The "PR Reviewer" tool automatically analyzes PRs, and provides different types of feedback. - The "PR Reviewer" tool automatically analyzes PRs, and provides different types of feedback.
- The "PR Q&A" tool answers free-text questions about the PR. - The "PR Ask" tool answers free-text questions about the PR.
- The "PR Description" tool automatically sets the PR Title and body.
### PR Reviewer - The "PR Code Suggestion" tool provides inline code suggestions for the PR that can be applied and committed.
Here is a quick overview of the different sub-tools of PR Reviewer:
- PR Analysis
- Summarize main theme
- PR description and title
- PR type classification
- Is the PR covered by relevant tests
- Is the PR minimal and focused
- Are there security concerns
- PR Feedback
- General PR suggestions
- Code suggestions
This is how a typical output of the PR Reviewer looks like:
---
#### PR Analysis
- 🎯 **Main theme:** Adding language extension handler and token handler
- 🔍 **Description and title:** Yes
- 📌 **Type of PR:** Enhancement
- 🧪 **Relevant tests added:** No
-**Minimal and focused:** Yes, the PR is focused on adding two new handlers for language extension and token counting.
- 🔒 **Security concerns:** No, the PR does not introduce possible security concerns or issues.
#### PR Feedback
- 💡 **General PR suggestions:** The PR is generally well-structured and the code is clean. However, it would be beneficial to add some tests to ensure the new handlers work as expected. Also, consider adding docstrings to the new functions and classes to improve code readability and maintainability.
- 🤖 **Code suggestions:**
- **relevant file:** pr_agent/algo/language_handler.py
**suggestion content:** Consider using a set instead of a list for 'bad_extensions' as checking membership in a set is faster than in a list. [medium]
- **relevant file:** pr_agent/algo/language_handler.py
**suggestion content:** In the 'filter_bad_extensions' function, you are splitting the filename on '.' and taking the last element to get the extension. This might not work as expected if the filename contains multiple '.' characters. Consider using 'os.path.splitext' to get the file extension more reliably. [important]
---
### PR Q&A
This tool answers free-text questions about the PR. This is how a typical output of the PR Q&A looks like:
**Question**: summarize for me the PR in 4 bullet points
**Answer**:
- The PR introduces a new feature to sort files by their main languages. It uses a mapping of programming languages to their file extensions to achieve this.
- It also introduces a filter to exclude files with certain extensions, deemed as 'bad extensions', from the sorting process.
- The PR modifies the `get_pr_diff` function in `pr_processing.py` to use the new sorting function. It also refactors the code to move the PR pruning logic into a separate function.
- A new `TokenHandler` class is introduced in `token_handler.py` to handle token counting operations. This class is initialized with a PR, variables, system, and user, and provides methods to get system and user tokens and to count tokens in a patch.
---
## Configuration
The different tools and sub-tools used by CodiumAI PR-Agent are easily configurable via the configuration file: `/settings/configuration.toml`.
#### Enabling/disabling sub-tools:
You can enable/disable the different PR Reviewer sub-sections with the following flags:
```
require_minimal_and_focused_review=true
require_tests_review=true
require_security_review=true
```
#### Code Suggestions configuration:
There are also configuration options to control different aspects of the `code suggestions` feature.
The number of suggestions provided can be controlled by adjusting the following parameter:
```
num_code_suggestions=4
```
You can also enable more verbose and informative mode of code suggestions:
```
extended_code_suggestions=false
```
This is a comparison of the regular and extended code suggestions modes:
- **relevant file:** sql.py
- **suggestion content:** Remove hardcoded sensitive information like username and password. Use environment variables or a secure method to store these values. [important]
Example for extended suggestion:
- **relevant file:** sql.py
- **suggestion content:** Remove hardcoded sensitive information (username and password) [important]
- **why:** Hardcoding sensitive information is a security risk. It's better to use environment variables or a secure way to store these values.
- **code example:**
- **before code:**
```
user = "root",
password = "Mysql@123",
```
- **after code:**
```
user = os.getenv('DB_USER'),
password = os.getenv('DB_PASSWORD'),
```
---
## How it works ## How it works
@ -297,14 +238,15 @@ Check out the [PR Compression strategy](./PR_COMPRESSION.md) page for more detai
## Roadmap ## Roadmap
- [ ] Support open-source models, as a replacement for openai models. Note that a minimal requirement for each open-source model is to have 8k+ context, and good support for generating json as an output - [ ] Support open-source models, as a replacement for openai models. (Note - a minimal requirement for each open-source model is to have 8k+ context, and good support for generating json as an output)
- [ ] Support other Git providers, such as Gitlab and Bitbucket. - [x] Support other Git providers, such as Gitlab and Bitbucket.
- [ ] Develop additional logic for handling large PRs, and compressing git patches - [ ] Develop additional logic for handling large PRs, and compressing git patches
- [ ] Dedicated tools and sub-tools for specific programming languages (Python, Javascript, Java, C++, etc) - [ ] Dedicated tools and sub-tools for specific programming languages (Python, Javascript, Java, C++, etc)
- [ ] Add additional context to the prompt. For example, repo (or relevant files) summarization, with tools such as [ctags](https://github.com/universal-ctags/ctags) - [ ] Add additional context to the prompt. For example, repo (or relevant files) summarization, with tools such as [ctags](https://github.com/universal-ctags/ctags)
- [ ] Adding more tools. Possible directions: - [ ] Adding more tools. Possible directions:
- [ ] Code Quality - [x] PR description
- [ ] Coding Style - [x] Inline code suggestions
- [ ] Enforcing CONTRIBUTING.md guidelines
- [ ] Performance (are there any performance issues) - [ ] Performance (are there any performance issues)
- [ ] Documentation (is the PR properly documented) - [ ] Documentation (is the PR properly documented)
- [ ] Rank the PR importance - [ ] Rank the PR importance
@ -314,6 +256,6 @@ Check out the [PR Compression strategy](./PR_COMPRESSION.md) page for more detai
- [CodiumAI - Meaningful tests for busy devs](https://github.com/Codium-ai/codiumai-vscode-release) - [CodiumAI - Meaningful tests for busy devs](https://github.com/Codium-ai/codiumai-vscode-release)
- [Aider - GPT powered coding in your terminal](https://github.com/paul-gauthier/aider) - [Aider - GPT powered coding in your terminal](https://github.com/paul-gauthier/aider)
- [GPT-Engineer](https://github.com/AntonOsika/gpt-engineer) - [openai-pr-reviewer](https://github.com/coderabbitai/openai-pr-reviewer)
- [CodeReview BOT](https://github.com/anc95/ChatGPT-CodeReview) - [CodeReview BOT](https://github.com/anc95/ChatGPT-CodeReview)
- [AI-Maintainer](https://github.com/merwanehamadi/AI-Maintainer) - [AI-Maintainer](https://github.com/merwanehamadi/AI-Maintainer)

5
action.yaml Normal file

@ -0,0 +1,5 @@
name: 'PR Agent'
description: 'Summarize, review and suggest improvements for pull requests'
runs:
  using: 'docker'
  image: 'Dockerfile.github_action_dockerhub'


@ -0,0 +1,2 @@
#!/bin/bash
python /app/pr_agent/servers/github_action_runner.py

Binary files not shown (image sizes as reported by the diff viewer; file names given only where the page listed them):
- (file name not listed): Before 100 KiB
- (file name not listed): Before 102 KiB
- pics/main_pic_4_tools.gif (Normal file): After 260 KiB
- (file name not listed): Before 413 KiB, After 316 KiB
- (file name not listed): After 335 KiB
- (file name not listed): After 193 KiB
- (file name not listed): Before 137 KiB, After 161 KiB
- (file name not listed): Before 267 KiB
- pics/pr_reviewer_1.png (Normal file): After 185 KiB
- pics/pr_reviewer_2.png (Normal file): After 162 KiB
- (file name not listed): Before 42 KiB


@ -1,25 +1,29 @@
import re import re
from typing import Optional
from pr_agent.tools.pr_code_suggestions import PRCodeSuggestions
from pr_agent.tools.pr_description import PRDescription
from pr_agent.tools.pr_questions import PRQuestions from pr_agent.tools.pr_questions import PRQuestions
from pr_agent.tools.pr_reviewer import PRReviewer from pr_agent.tools.pr_reviewer import PRReviewer
class PRAgent: class PRAgent:
def __init__(self, installation_id: Optional[int] = None): def __init__(self):
self.installation_id = installation_id pass
async def handle_request(self, pr_url, request): async def handle_request(self, pr_url, request) -> bool:
if 'please review' in request.lower() or 'review' == request.lower().strip() or len(request) == 0: if any(cmd in request for cmd in ["/review", "/review_pr"]):
reviewer = PRReviewer(pr_url, self.installation_id) await PRReviewer(pr_url).review()
await reviewer.review() elif any(cmd in request for cmd in ["/describe", "/describe_pr"]):
await PRDescription(pr_url).describe()
elif any(cmd in request for cmd in ["/improve", "/improve_code"]):
await PRCodeSuggestions(pr_url).suggest()
elif any(cmd in request for cmd in ["/ask", "/ask_question"]):
pattern = r'(/ask|/ask_question)\s*(.*)'
matches = re.findall(pattern, request, re.IGNORECASE)
if matches:
question = matches[0][1]
await PRQuestions(pr_url, question).answer()
else:
return False
else: return True
if "please answer" in request.lower():
question = re.split(r'(?i)please answer', request)[1].strip()
elif request.lower().strip().startswith("answer"):
question = re.split(r'(?i)answer', request)[1].strip()
else:
question = request
answerer = PRQuestions(pr_url, question, self.installation_id)
await answerer.answer()
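The slash-command dispatch in the diff above can be sketched in isolation as follows. This is an illustrative sketch, not the project's code: `COMMANDS`, `ASK_PATTERN` and `parse_request` are hypothetical names. One deliberate difference from the pattern in the diff: the longer alias is listed first, because regex alternation is tried left to right and `(/ask|/ask_question)` would match only `/ask`, leaving `_question ...` at the start of the captured question.

```python
import re
from typing import Optional, Tuple

# Command aliases as they appear in the diff above.
COMMANDS = {
    "review": ("/review", "/review_pr"),
    "describe": ("/describe", "/describe_pr"),
    "improve": ("/improve", "/improve_code"),
    "ask": ("/ask", "/ask_question"),
}

# Longer alias first: alternation is ordered, so "/ask" would otherwise win
# against "/ask_question" and pollute the captured question text.
ASK_PATTERN = re.compile(r"(/ask_question|/ask)\s*(.*)", re.IGNORECASE)


def parse_request(request: str) -> Optional[Tuple[str, str]]:
    """Return (tool, argument) for a recognized command, or None otherwise."""
    for tool, aliases in COMMANDS.items():
        if any(alias in request for alias in aliases):
            if tool != "ask":
                return tool, ""
            match = ASK_PATTERN.search(request)
            return (tool, match.group(2).strip()) if match else None
    return None


if __name__ == "__main__":
    print(parse_request("/ask_question what changed?"))
    print(parse_request("/review"))
    print(parse_request("hello"))
```

Returning `None` for unrecognized input plays the same role as the `return False` branch in `handle_request`: the caller can tell that no tool was run.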


@ -14,6 +14,13 @@ class AiHandler:
openai.api_key = settings.openai.key openai.api_key = settings.openai.key
if settings.get("OPENAI.ORG", None): if settings.get("OPENAI.ORG", None):
openai.organization = settings.openai.org openai.organization = settings.openai.org
self.deployment_id = settings.get("OPENAI.DEPLOYMENT_ID", None)
if settings.get("OPENAI.API_TYPE", None):
openai.api_type = settings.openai.api_type
if settings.get("OPENAI.API_VERSION", None):
openai.api_version = settings.openai.api_version
if settings.get("OPENAI.API_BASE", None):
openai.api_base = settings.openai.api_base
except AttributeError as e: except AttributeError as e:
raise ValueError("OpenAI key is required") from e raise ValueError("OpenAI key is required") from e
@ -23,6 +30,7 @@ class AiHandler:
try: try:
response = await openai.ChatCompletion.acreate( response = await openai.ChatCompletion.acreate(
model=model, model=model,
deployment_id=self.deployment_id,
messages=[ messages=[
{"role": "system", "content": system}, {"role": "system", "content": system},
{"role": "user", "content": user} {"role": "user", "content": user}


@ -13,6 +13,9 @@ def extend_patch(original_file_str, patch_str, num_lines) -> str:
if not patch_str or num_lines == 0: if not patch_str or num_lines == 0:
return patch_str return patch_str
if isinstance(original_file_str, bytes):
original_file_str = original_file_str.decode('utf-8')
original_lines = original_file_str.splitlines() original_lines = original_file_str.splitlines()
patch_lines = patch_str.splitlines() patch_lines = patch_str.splitlines()
extended_patch_lines = [] extended_patch_lines = []
@ -105,3 +108,78 @@ def handle_patch_deletions(patch: str, original_file_content_str: str,
logging.info(f"Processing file: {file_name}, hunks were deleted") logging.info(f"Processing file: {file_name}, hunks were deleted")
patch = patch_new patch = patch_new
return patch return patch
def convert_to_hunks_with_lines_numbers(patch: str, file) -> str:
    # toDO: (maybe remove '-' and '+' from the beginning of the line)
    """
    ## src/file.ts
    --new hunk--
    881 line1
    882 line2
    883 line3
    884 line4
    885 line6
    886 line7
    887 + line8
    888 + line9
    889 line10
    890 line11
    ...
    --old hunk--
    line1
    line2
    - line3
    - line4
    line5
    line6
    ...
    """
    patch_with_lines_str = f"## {file.filename}\n"
    import re
    patch_lines = patch.splitlines()
    RE_HUNK_HEADER = re.compile(
        r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@[ ]?(.*)")
    new_content_lines = []
    old_content_lines = []
    match = None
    start1, size1, start2, size2 = -1, -1, -1, -1
    for line in patch_lines:
        if 'no newline at end of file' in line.lower():
            continue
        if line.startswith('@@'):
            match = RE_HUNK_HEADER.match(line)
            if match and new_content_lines:  # found a new hunk, split the previous lines
                if new_content_lines:
                    patch_with_lines_str += '\n--new hunk--\n'
                    for i, line_new in enumerate(new_content_lines):
                        patch_with_lines_str += f"{start2 + i} {line_new}\n"
                if old_content_lines:
                    patch_with_lines_str += '--old hunk--\n'
                    for i, line_old in enumerate(old_content_lines):
                        patch_with_lines_str += f"{line_old}\n"
                new_content_lines = []
                old_content_lines = []
            start1, size1, start2, size2 = map(int, match.groups()[:4])
        elif line.startswith('+'):
            new_content_lines.append(line)
        elif line.startswith('-'):
            old_content_lines.append(line)
        else:
            new_content_lines.append(line)
            old_content_lines.append(line)
    # finishing last hunk
    if match and new_content_lines:
        if new_content_lines:
            patch_with_lines_str += '\n--new hunk--\n'
            for i, line_new in enumerate(new_content_lines):
                patch_with_lines_str += f"{start2 + i} {line_new}\n"
        if old_content_lines:
            patch_with_lines_str += '\n--old hunk--\n'
            for i, line_old in enumerate(old_content_lines):
                patch_with_lines_str += f"{line_old}\n"
    return patch_with_lines_str.strip()
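The core of the function above is parsing the `@@ -a,b +c,d @@` hunk header and numbering the new-side lines from `c`. A minimal sketch of just that step follows; `parse_hunk_header` and `number_new_side` are hypothetical helpers, not part of the commit. It also treats an omitted size as 1, which is how the unified diff format abbreviates single-line hunks (for example `@@ -5 +7,2 @@`); passing the unmatched group straight to `int` would raise a `TypeError` in that case.

```python
import re
from typing import List, Tuple

# Same header pattern as in the diff above.
RE_HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@[ ]?(.*)")


def parse_hunk_header(line: str) -> Tuple[int, int, int, int]:
    """Return (old_start, old_size, new_start, new_size) for a hunk header."""
    match = RE_HUNK_HEADER.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    start1, size1, start2, size2 = match.groups()[:4]
    # An omitted size means the hunk side covers exactly one line.
    return int(start1), int(size1 or 1), int(start2), int(size2 or 1)


def number_new_side(patch: str) -> List[str]:
    """Prefix every context or added line with its line number in the new file."""
    numbered: List[str] = []
    next_line = None
    for line in patch.splitlines():
        if line.startswith("@@"):
            _, _, next_line, _ = parse_hunk_header(line)
        elif line.startswith("\\"):
            continue  # "\ No newline at end of file" marker
        elif next_line is not None and not line.startswith("-"):
            numbered.append(f"{next_line} {line}")
            next_line += 1
    return numbered


if __name__ == "__main__":
    example = "@@ -10,3 +10,4 @@ def f():\n a\n-b\n+c\n+d\n e"
    print("\n".join(number_new_side(example)))
```

Removed lines are skipped because they have no line number in the new file, which is why the full function keeps them in a separate `--old hunk--` section.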


@ -93,7 +93,7 @@ def sort_files_by_main_languages(languages: Dict, files: list):
for ext in main_extensions: for ext in main_extensions:
main_extensions_flat.extend(ext) main_extensions_flat.extend(ext)
for extensions, lang in zip(main_extensions, languages_sorted_list): for extensions, lang in zip(main_extensions, languages_sorted_list): # noqa: B905
tmp = [] tmp = []
for file in files_filtered: for file in files_filtered:
extension_str = f".{file.filename.split('.')[-1]}" extension_str = f".{file.filename.split('.')[-1]}"


@ -2,9 +2,10 @@ from __future__ import annotations
import difflib import difflib
import logging import logging
from typing import Any, Dict, Tuple, Union from typing import Any, Tuple, Union
from pr_agent.algo.git_patch_processing import extend_patch, handle_patch_deletions from pr_agent.algo.git_patch_processing import extend_patch, handle_patch_deletions, \
convert_to_hunks_with_lines_numbers
from pr_agent.algo.language_handler import sort_files_by_main_languages from pr_agent.algo.language_handler import sort_files_by_main_languages
from pr_agent.algo.token_handler import TokenHandler from pr_agent.algo.token_handler import TokenHandler
from pr_agent.config_loader import settings from pr_agent.config_loader import settings
@ -14,29 +15,38 @@ DELETED_FILES_ = "Deleted files:\n"
MORE_MODIFIED_FILES_ = "More modified files:\n" MORE_MODIFIED_FILES_ = "More modified files:\n"
OUTPUT_BUFFER_TOKENS = 800 OUTPUT_BUFFER_TOKENS_SOFT_THRESHOLD = 1000
OUTPUT_BUFFER_TOKENS_HARD_THRESHOLD = 600
PATCH_EXTRA_LINES = 3 PATCH_EXTRA_LINES = 3
def get_pr_diff(git_provider: Union[GithubProvider, Any], token_handler: TokenHandler) -> str: def get_pr_diff(git_provider: Union[GithubProvider, Any], token_handler: TokenHandler,
add_line_numbers_to_hunks: bool = False, disable_extra_lines: bool = False) -> str:
""" """
Returns a string with the diff of the PR. Returns a string with the diff of the PR.
If needed, apply diff minimization techniques to reduce the number of tokens If needed, apply diff minimization techniques to reduce the number of tokens
""" """
files = list(git_provider.get_diff_files()) if disable_extra_lines:
global PATCH_EXTRA_LINES
PATCH_EXTRA_LINES = 0
git_provider.pr.diff_files = list(git_provider.get_diff_files())
# get pr languages # get pr languages
pr_languages = sort_files_by_main_languages(git_provider.get_languages(), files) pr_languages = sort_files_by_main_languages(git_provider.get_languages(), git_provider.pr.diff_files)
# generate a standard diff string, with patch extension # generate a standard diff string, with patch extension
patches_extended, total_tokens = pr_generate_extended_diff(pr_languages, token_handler) patches_extended, total_tokens = pr_generate_extended_diff(pr_languages, token_handler,
add_line_numbers_to_hunks)
# if we are under the limit, return the full diff # if we are under the limit, return the full diff
if total_tokens + OUTPUT_BUFFER_TOKENS < token_handler.limit: if total_tokens + OUTPUT_BUFFER_TOKENS_SOFT_THRESHOLD < token_handler.limit:
return "\n".join(patches_extended) return "\n".join(patches_extended)
# if we are over the limit, start pruning # if we are over the limit, start pruning
patches_compressed, modified_file_names, deleted_file_names = pr_generate_compressed_diff(pr_languages, token_handler) patches_compressed, modified_file_names, deleted_file_names = \
pr_generate_compressed_diff(pr_languages, token_handler, add_line_numbers_to_hunks)
final_diff = "\n".join(patches_compressed) final_diff = "\n".join(patches_compressed)
if modified_file_names: if modified_file_names:
modified_list_str = MORE_MODIFIED_FILES_ + "\n".join(modified_file_names) modified_list_str = MORE_MODIFIED_FILES_ + "\n".join(modified_file_names)
@ -47,7 +57,8 @@ def get_pr_diff(git_provider: Union[GithubProvider, Any], token_handler: TokenHa
return final_diff return final_diff
def pr_generate_extended_diff(pr_languages: list, token_handler: TokenHandler) -> \ def pr_generate_extended_diff(pr_languages: list, token_handler: TokenHandler,
add_line_numbers_to_hunks: bool) -> \
Tuple[list, int]: Tuple[list, int]:
""" """
Generate a standard diff string, with patch extension Generate a standard diff string, with patch extension
@ -70,6 +81,9 @@ def pr_generate_extended_diff(pr_languages: list, token_handler: TokenHandler) -
extended_patch = extend_patch(original_file_content_str, patch, num_lines=PATCH_EXTRA_LINES) extended_patch = extend_patch(original_file_content_str, patch, num_lines=PATCH_EXTRA_LINES)
full_extended_patch = f"## {file.filename}\n\n{extended_patch}\n" full_extended_patch = f"## {file.filename}\n\n{extended_patch}\n"
if add_line_numbers_to_hunks:
full_extended_patch = convert_to_hunks_with_lines_numbers(extended_patch, file)
patch_tokens = token_handler.count_tokens(full_extended_patch) patch_tokens = token_handler.count_tokens(full_extended_patch)
file.tokens = patch_tokens file.tokens = patch_tokens
total_tokens += patch_tokens total_tokens += patch_tokens
@ -78,7 +92,8 @@ def pr_generate_extended_diff(pr_languages: list, token_handler: TokenHandler) -
return patches_extended, total_tokens return patches_extended, total_tokens
def pr_generate_compressed_diff(top_langs: list, token_handler: TokenHandler) -> Tuple[list, list, list]: def pr_generate_compressed_diff(top_langs: list, token_handler: TokenHandler,
convert_hunks_to_line_numbers: bool) -> Tuple[list, list, list]:
# Apply Diff Minimization techniques to reduce the number of tokens: # Apply Diff Minimization techniques to reduce the number of tokens:
# 0. Start from the largest diff patch to smaller ones # 0. Start from the largest diff patch to smaller ones
# 1. Don't use extend context lines around diff # 1. Don't use extend context lines around diff
@ -112,15 +127,19 @@ def pr_generate_compressed_diff(top_langs: list, token_handler: TokenHandler) ->
deleted_files_list.append(file.filename) deleted_files_list.append(file.filename)
total_tokens += token_handler.count_tokens(file.filename) + 1 total_tokens += token_handler.count_tokens(file.filename) + 1
continue continue
if convert_hunks_to_line_numbers:
patch = convert_to_hunks_with_lines_numbers(patch, file)
new_patch_tokens = token_handler.count_tokens(patch) new_patch_tokens = token_handler.count_tokens(patch)
# Hard Stop, no more tokens # Hard Stop, no more tokens
if total_tokens > token_handler.limit - OUTPUT_BUFFER_TOKENS // 2: if total_tokens > token_handler.limit - OUTPUT_BUFFER_TOKENS_HARD_THRESHOLD:
logging.warning(f"File was fully skipped, no more tokens: {file.filename}.") logging.warning(f"File was fully skipped, no more tokens: {file.filename}.")
continue continue
# If the patch is too large, just show the file name # If the patch is too large, just show the file name
if total_tokens + new_patch_tokens > token_handler.limit - OUTPUT_BUFFER_TOKENS: if total_tokens + new_patch_tokens > token_handler.limit - OUTPUT_BUFFER_TOKENS_SOFT_THRESHOLD:
# Current logic is to skip the patch if it's too large # Current logic is to skip the patch if it's too large
# TODO: Option for alternative logic to remove hunks from the patch to reduce the number of tokens # TODO: Option for alternative logic to remove hunks from the patch to reduce the number of tokens
# until we meet the requirements # until we meet the requirements
@ -133,7 +152,10 @@ def pr_generate_compressed_diff(top_langs: list, token_handler: TokenHandler) ->
continue continue
if patch: if patch:
if not convert_hunks_to_line_numbers:
patch_final = f"## {file.filename}\n\n{patch}\n" patch_final = f"## {file.filename}\n\n{patch}\n"
else:
patch_final = patch
patches.append(patch_final) patches.append(patch_final)
total_tokens += token_handler.count_tokens(patch_final) total_tokens += token_handler.count_tokens(patch_final)
if settings.config.verbosity_level >= 2: if settings.config.verbosity_level >= 2:
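The two thresholds introduced in this file split each patch into one of three outcomes. The sketch below restates that decision as a pure function, using the constants from the diff; the `classify_patch` name and its string return values are hypothetical, chosen only for illustration.

```python
OUTPUT_BUFFER_TOKENS_SOFT_THRESHOLD = 1000
OUTPUT_BUFFER_TOKENS_HARD_THRESHOLD = 600


def classify_patch(total_tokens: int, new_patch_tokens: int, limit: int) -> str:
    """Decide what to do with the next patch given the tokens used so far."""
    # Hard stop: so little room is left that the file is skipped entirely.
    if total_tokens > limit - OUTPUT_BUFFER_TOKENS_HARD_THRESHOLD:
        return "skip"
    # Soft stop: the patch would eat into the output buffer, so only the
    # file name is reported.
    if total_tokens + new_patch_tokens > limit - OUTPUT_BUFFER_TOKENS_SOFT_THRESHOLD:
        return "name_only"
    return "include"


if __name__ == "__main__":
    for used, patch in [(1000, 500), (6000, 1500), (7500, 10)]:
        print(used, patch, classify_patch(used, patch, limit=8000))
```

With a limit of 8000 tokens, the hard stop applies once more than 7400 tokens are used, while a patch is reduced to its file name when adding it would exceed 7000.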


@ -1,5 +1,8 @@
from __future__ import annotations from __future__ import annotations
import json
import logging
import re
import textwrap import textwrap
@ -8,11 +11,10 @@ def convert_to_markdown(output_data: dict) -> str:
emojis = { emojis = {
"Main theme": "🎯", "Main theme": "🎯",
"Description and title": "🔍",
"Type of PR": "📌", "Type of PR": "📌",
"Relevant tests added": "🧪", "Relevant tests added": "🧪",
"Unrelated changes": "⚠️", "Unrelated changes": "⚠️",
"Minimal and focused": "", "Focused PR": "",
"Security concerns": "🔒", "Security concerns": "🔒",
"General PR suggestions": "💡", "General PR suggestions": "💡",
"Code suggestions": "🤖" "Code suggestions": "🤖"
@ -50,10 +52,7 @@ def parse_code_suggestion(code_suggestions: dict) -> str:
code_str_indented = textwrap.indent(code_str, ' ') code_str_indented = textwrap.indent(code_str, ' ')
markdown_text += f" - **{code_key}:**\n{code_str_indented}\n" markdown_text += f" - **{code_key}:**\n{code_str_indented}\n"
else: else:
if "suggestion number" in sub_key.lower(): if "relevant file" in sub_key.lower():
# markdown_text += f"- **suggestion {sub_value}:**\n" # prettier formatting
pass
elif "relevant file" in sub_key.lower():
markdown_text += f"\n - **{sub_key}:** {sub_value}\n" markdown_text += f"\n - **{sub_key}:** {sub_value}\n"
else: else:
markdown_text += f" **{sub_key}:** {sub_value}\n" markdown_text += f" **{sub_key}:** {sub_value}\n"
@ -61,3 +60,25 @@ def parse_code_suggestion(code_suggestions: dict) -> str:
markdown_text += "\n" markdown_text += "\n"
return markdown_text return markdown_text
def try_fix_json(review, max_iter=10):
# Try to fix JSON if it is broken/incomplete: parse until the last valid code suggestion
data = {}
if review.rfind("'Code suggestions': [") > 0 or review.rfind('"Code suggestions": [') > 0:
last_code_suggestion_ind = [m.end() for m in re.finditer(r"\}\s*,", review)][-1] - 1
valid_json = False
iter_count = 0
while last_code_suggestion_ind > 0 and not valid_json and iter_count < max_iter:
try:
data = json.loads(review[:last_code_suggestion_ind] + "]}}")
valid_json = True
review = review[:last_code_suggestion_ind].strip() + "]}}"
except json.decoder.JSONDecodeError:
review = review[:last_code_suggestion_ind]
# Use regular expression to find the last occurrence of "}," with any number of whitespaces or newlines
last_code_suggestion_ind = [m.end() for m in re.finditer(r"\}\s*,", review)][-1] - 1
iter_count += 1
if not valid_json:
logging.error("Unable to decode JSON response from AI")
data = {}
return data
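
A minimal, self-contained sketch of the repair idea used by `try_fix_json` above: cut the response back to the last `},` boundary, append the closing brackets, and retry. The function name `repair_truncated_json`, the `closing` parameter, and the sample payload are assumptions made for illustration; they are not part of the commit.

```python
import json
import re


def repair_truncated_json(text: str, closing: str = "]}}", max_iter: int = 10) -> dict:
    """Cut `text` back to the last '},' boundary and append `closing` until it parses."""
    for _ in range(max_iter):
        # End offsets of every "}," (whitespace allowed between the brace and the comma)
        ends = [m.end() for m in re.finditer(r"\}\s*,", text)]
        if not ends:
            break  # no boundary left to fall back to
        text = text[:ends[-1] - 1]  # keep everything before the comma
        try:
            return json.loads(text + closing)
        except json.JSONDecodeError:
            continue  # still broken: retry from the previous "}," boundary
    return {}


# Hypothetical truncated model output: the third suggestion is cut off mid-string
truncated = '{"PR Feedback": {"Code suggestions": [{"file": "a.py"}, {"file": "b.py"}, {"file": "c'
result = repair_truncated_json(truncated)
print(result)
```

Unlike the version in the diff, this sketch returns an empty dict when no `},` boundary exists instead of indexing into an empty list.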

View File

@ -3,24 +3,60 @@ import asyncio
import logging import logging
import os import os
from pr_agent.tools.pr_code_suggestions import PRCodeSuggestions
from pr_agent.tools.pr_description import PRDescription
from pr_agent.tools.pr_questions import PRQuestions from pr_agent.tools.pr_questions import PRQuestions
from pr_agent.tools.pr_reviewer import PRReviewer from pr_agent.tools.pr_reviewer import PRReviewer
def run(): def run():
parser = argparse.ArgumentParser(description='AI based pull request analyzer') parser = argparse.ArgumentParser(description='AI based pull request analyzer', usage="""\
Usage: cli.py --pr_url <URL on supported git hosting service> <command> [<args>].
For example:
- cli.py --pr_url xxx review
- cli.py --pr_url xxx describe
- cli.py --pr_url xxx improve
- cli.py --pr_url xxx ask "write me a poem about this PR"
Supported commands:
review / review_pr - Add a review that includes a summary of the PR and specific suggestions for improvement.
ask / ask_question [question] - Ask a question about the PR.
describe / describe_pr - Modify the PR title and description based on the PR's contents.
improve / improve_code - Suggest improvements to the code in the PR as pull request comments ready to commit.
""")
parser.add_argument('--pr_url', type=str, help='The URL of the PR to review', required=True) parser.add_argument('--pr_url', type=str, help='The URL of the PR to review', required=True)
parser.add_argument('--question', type=str, help='Optional question to ask', required=False) parser.add_argument('command', type=str, help='The command to run', choices=['review', 'review_pr',
'ask', 'ask_question',
'describe', 'describe_pr',
'improve', 'improve_code'], default='review')
parser.add_argument('rest', nargs=argparse.REMAINDER, default=[])
args = parser.parse_args() args = parser.parse_args()
logging.basicConfig(level=os.environ.get("LOGLEVEL", "INFO")) logging.basicConfig(level=os.environ.get("LOGLEVEL", "INFO"))
if args.question: command = args.command.lower()
print(f"Question: {args.question} about PR {args.pr_url}") if command in ['ask', 'ask_question']:
reviewer = PRQuestions(args.pr_url, args.question, installation_id=None) question = ' '.join(args.rest).strip()
if len(question) == 0:
print("Please specify a question")
parser.print_help()
return
print(f"Question: {question} about PR {args.pr_url}")
reviewer = PRQuestions(args.pr_url, question)
asyncio.run(reviewer.answer()) asyncio.run(reviewer.answer())
else: elif command in ['describe', 'describe_pr']:
print(f"PR description: {args.pr_url}")
reviewer = PRDescription(args.pr_url)
asyncio.run(reviewer.describe())
elif command in ['improve', 'improve_code']:
print(f"PR code suggestions: {args.pr_url}")
reviewer = PRCodeSuggestions(args.pr_url)
asyncio.run(reviewer.suggest())
elif command in ['review', 'review_pr']:
print(f"Reviewing PR: {args.pr_url}") print(f"Reviewing PR: {args.pr_url}")
reviewer = PRReviewer(args.pr_url, installation_id=None, cli_mode=True) reviewer = PRReviewer(args.pr_url, cli_mode=True)
asyncio.run(reviewer.review()) asyncio.run(reviewer.review())
else:
print(f"Unknown command: {command}")
parser.print_help()
if __name__ == '__main__': if __name__ == '__main__':

View File

@ -5,11 +5,14 @@ from dynaconf import Dynaconf
current_dir = dirname(abspath(__file__)) current_dir = dirname(abspath(__file__))
settings = Dynaconf( settings = Dynaconf(
envvar_prefix=False, envvar_prefix=False,
merge_enabled=True,
settings_files=[join(current_dir, f) for f in [ settings_files=[join(current_dir, f) for f in [
"settings/.secrets.toml", "settings/.secrets.toml",
"settings/configuration.toml", "settings/configuration.toml",
"settings/pr_reviewer_prompts.toml", "settings/pr_reviewer_prompts.toml",
"settings/pr_questions_prompts.toml", "settings/pr_questions_prompts.toml",
"settings/pr_description_prompts.toml",
"settings/pr_code_suggestions_prompts.toml",
"settings_prod/.secrets.toml" "settings_prod/.secrets.toml"
]] ]]
) )

View File

@ -1,15 +1,17 @@
from pr_agent.config_loader import settings from pr_agent.config_loader import settings
from pr_agent.git_providers.github_provider import GithubProvider from pr_agent.git_providers.github_provider import GithubProvider
from pr_agent.git_providers.gitlab_provider import GitLabProvider
_GIT_PROVIDERS = { _GIT_PROVIDERS = {
'github': GithubProvider 'github': GithubProvider,
'gitlab': GitLabProvider,
} }
def get_git_provider(): def get_git_provider():
try: try:
provider_id = settings.config.git_provider provider_id = settings.config.git_provider
except AttributeError as e: except AttributeError as e:
raise ValueError("github_provider is a required attribute in the configuration file") from e raise ValueError("git_provider is a required attribute in the configuration file") from e
if provider_id not in _GIT_PROVIDERS: if provider_id not in _GIT_PROVIDERS:
raise ValueError(f"Unknown git provider: {provider_id}") raise ValueError(f"Unknown git provider: {provider_id}")
return _GIT_PROVIDERS[provider_id] return _GIT_PROVIDERS[provider_id]
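
The lookup above returns a provider class, not an instance, which is why callers elsewhere in this commit write `get_git_provider()()`. A small sketch of that registry pattern follows; the `Fake*` classes and `resolve_provider` are placeholders invented for the example and stand in for the real providers and the settings lookup.

```python
class FakeGithubProvider:
    """Placeholder standing in for GithubProvider."""


class FakeGitLabProvider:
    """Placeholder standing in for GitLabProvider."""


# Registry mapping a configuration value to a provider class
_PROVIDERS = {
    "github": FakeGithubProvider,
    "gitlab": FakeGitLabProvider,
}


def resolve_provider(provider_id: str):
    """Return the provider class registered under `provider_id`."""
    if provider_id not in _PROVIDERS:
        raise ValueError(f"Unknown git provider: {provider_id}")
    return _PROVIDERS[provider_id]


# The first call selects the class, the second call instantiates it
provider = resolve_provider("gitlab")()
print(type(provider).__name__)
```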

View File

@ -0,0 +1,104 @@
from abc import ABC, abstractmethod
from dataclasses import dataclass
# enum EDIT_TYPE (ADDED, DELETED, MODIFIED, RENAMED)
from enum import Enum
class EDIT_TYPE(Enum):
ADDED = 1
DELETED = 2
MODIFIED = 3
RENAMED = 4
@dataclass
class FilePatchInfo:
base_file: str
head_file: str
patch: str
filename: str
tokens: int = -1
edit_type: EDIT_TYPE = EDIT_TYPE.MODIFIED
old_filename: str = None
class GitProvider(ABC):
@abstractmethod
def get_diff_files(self) -> list[FilePatchInfo]:
pass
@abstractmethod
def publish_description(self, pr_title: str, pr_body: str):
pass
@abstractmethod
def publish_comment(self, pr_comment: str, is_temporary: bool = False):
pass
@abstractmethod
def publish_inline_comment(self, body: str, relevant_file: str, relevant_line_in_file: str):
pass
@abstractmethod
def publish_code_suggestion(self, body: str, relevant_file: str,
relevant_lines_start: int, relevant_lines_end: int):
pass
@abstractmethod
def remove_initial_comment(self):
pass
@abstractmethod
def get_languages(self):
pass
@abstractmethod
def get_pr_branch(self):
pass
@abstractmethod
def get_user_id(self):
pass
@abstractmethod
def get_pr_description(self):
pass
def get_main_pr_language(languages, files) -> str:
"""
Get the main language of the commit. Return an empty string if cannot determine.
"""
main_language_str = ""
try:
top_language = max(languages, key=languages.get).lower()
# validate that the specific commit uses the main language
extension_list = []
for file in files:
extension_list.append(file.filename.rsplit('.')[-1])
# get the most common extension
most_common_extension = max(set(extension_list), key=extension_list.count)
# look for a match. TBD: add more languages, do this systematically
if most_common_extension == 'py' and top_language == 'python' or \
most_common_extension == 'js' and top_language == 'javascript' or \
most_common_extension == 'ts' and top_language == 'typescript' or \
most_common_extension == 'go' and top_language == 'go' or \
most_common_extension == 'java' and top_language == 'java' or \
most_common_extension == 'c' and top_language == 'c' or \
most_common_extension == 'cpp' and top_language == 'c++' or \
most_common_extension == 'cs' and top_language == 'c#' or \
most_common_extension == 'swift' and top_language == 'swift' or \
most_common_extension == 'php' and top_language == 'php' or \
most_common_extension == 'rb' and top_language == 'ruby' or \
most_common_extension == 'rs' and top_language == 'rust' or \
most_common_extension == 'scala' and top_language == 'scala' or \
most_common_extension == 'kt' and top_language == 'kotlin' or \
most_common_extension == 'pl' and top_language == 'perl':
main_language_str = top_language
except Exception:
pass
return main_language_str
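
The "TBD: do this systematically" note above could be addressed with a lookup table. The following is only a sketch of one possible approach, not the project's implementation: `EXTENSION_TO_LANGUAGE` and `main_language` are hypothetical names, and `files` is assumed to be a plain list of objects with a `filename` attribute.

```python
from collections import Counter
from types import SimpleNamespace

# Hypothetical table replacing the chained comparisons (extension -> lowercase language name)
EXTENSION_TO_LANGUAGE = {
    "py": "python", "js": "javascript", "ts": "typescript", "go": "go",
    "java": "java", "c": "c", "cpp": "c++", "cs": "c#", "swift": "swift",
    "php": "php", "rb": "ruby", "rs": "rust", "scala": "scala",
    "kt": "kotlin", "pl": "perl",
}


def main_language(languages: dict, files: list) -> str:
    """Return the repo's top language if the most common file extension agrees with it."""
    if not languages or not files:
        return ""
    top_language = max(languages, key=languages.get).lower()
    extensions = [f.filename.rsplit(".", 1)[-1] for f in files]
    most_common_extension = Counter(extensions).most_common(1)[0][0]
    if EXTENSION_TO_LANGUAGE.get(most_common_extension) == top_language:
        return top_language
    return ""


# Made-up inputs: language byte counts and the files touched by a PR
languages = {"Python": 1000, "Shell": 10}
files = [SimpleNamespace(filename=name) for name in ("a.py", "b.py", "README.md")]
print(main_language(languages, files))
```

Adding a language then means adding one dictionary entry rather than another `or` clause.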

View File

@ -1,37 +1,35 @@
import logging import logging
from collections import namedtuple
from dataclasses import dataclass
from datetime import datetime from datetime import datetime
from typing import Optional, Tuple from typing import Optional, Tuple
from urllib.parse import urlparse from urllib.parse import urlparse
from github import AppAuthentication, File, Github from github import AppAuthentication, Github
from pr_agent.config_loader import settings from pr_agent.config_loader import settings
@dataclass from .git_provider import FilePatchInfo, GitProvider
class FilePatchInfo:
base_file: str
head_file: str
patch: str
filename: str
tokens: int = -1
class GithubProvider:
def __init__(self, pr_url: Optional[str] = None, installation_id: Optional[int] = None): class GithubProvider(GitProvider):
self.installation_id = installation_id def __init__(self, pr_url: Optional[str] = None):
self.installation_id = settings.get("GITHUB.INSTALLATION_ID")
self.github_client = self._get_github_client() self.github_client = self._get_github_client()
self.repo = None self.repo = None
self.pr_num = None self.pr_num = None
self.pr = None self.pr = None
self.github_user_id = None self.github_user_id = None
self.diff_files = None
if pr_url: if pr_url:
self.set_pr(pr_url) self.set_pr(pr_url)
self.last_commit_id = list(self.pr.get_commits())[-1]
def set_pr(self, pr_url: str): def set_pr(self, pr_url: str):
self.repo, self.pr_num = self._parse_pr_url(pr_url) self.repo, self.pr_num = self._parse_pr_url(pr_url)
self.pr = self._get_pr() self.pr = self._get_pr()
def get_files(self):
return self.pr.get_files()
def get_diff_files(self) -> list[FilePatchInfo]: def get_diff_files(self) -> list[FilePatchInfo]:
files = self.pr.get_files() files = self.pr.get_files()
diff_files = [] diff_files = []
@ -39,8 +37,13 @@ class GithubProvider:
original_file_content_str = self._get_pr_file_content(file, self.pr.base.sha) original_file_content_str = self._get_pr_file_content(file, self.pr.base.sha)
new_file_content_str = self._get_pr_file_content(file, self.pr.head.sha) new_file_content_str = self._get_pr_file_content(file, self.pr.head.sha)
diff_files.append(FilePatchInfo(original_file_content_str, new_file_content_str, file.patch, file.filename)) diff_files.append(FilePatchInfo(original_file_content_str, new_file_content_str, file.patch, file.filename))
self.diff_files = diff_files
return diff_files return diff_files
def publish_description(self, pr_title: str, pr_body: str):
self.pr.edit(title=pr_title, body=pr_body)
# self.pr.create_issue_comment(pr_comment)
def publish_comment(self, pr_comment: str, is_temporary: bool = False): def publish_comment(self, pr_comment: str, is_temporary: bool = False):
response = self.pr.create_issue_comment(pr_comment) response = self.pr.create_issue_comment(pr_comment)
if hasattr(response, "user") and hasattr(response.user, "login"): if hasattr(response, "user") and hasattr(response.user, "login"):
@ -50,6 +53,76 @@ class GithubProvider:
self.pr.comments_list = [] self.pr.comments_list = []
self.pr.comments_list.append(response) self.pr.comments_list.append(response)
def publish_inline_comment(self, body: str, relevant_file: str, relevant_line_in_file: str):
self.diff_files = self.diff_files if self.diff_files else self.get_diff_files()
position = -1
for file in self.diff_files:
if file.filename.strip() == relevant_file:
patch = file.patch
patch_lines = patch.splitlines()
for i, line in enumerate(patch_lines):
if relevant_line_in_file in line:
position = i
break
elif relevant_line_in_file[0] == '+' and relevant_line_in_file[1:] in line:
# The model often adds a '+' to the beginning of the relevant_line_in_file even if originally
# it's a context line
position = i
break
if position == -1:
if settings.config.verbosity_level >= 2:
logging.info(f"Could not find position for {relevant_file} {relevant_line_in_file}")
else:
path = relevant_file.strip()
self.pr.create_review_comment(body=body, commit_id=self.last_commit_id, path=path, position=position)
def publish_code_suggestion(self, body: str,
relevant_file: str,
relevant_lines_start: int,
relevant_lines_end: int):
if not relevant_lines_start or relevant_lines_start == -1:
if settings.config.verbosity_level >= 2:
logging.error(f"Failed to publish code suggestion, relevant_lines_start is {relevant_lines_start}")
return False
if relevant_lines_end<relevant_lines_start:
if settings.config.verbosity_level >= 2:
logging.error(f"Failed to publish code suggestion, "
f"relevant_lines_end is {relevant_lines_end} and "
f"relevant_lines_start is {relevant_lines_start}")
return False
try:
import github.PullRequestComment
if relevant_lines_end > relevant_lines_start:
post_parameters = {
"body": body,
"commit_id": self.last_commit_id._identity,
"path": relevant_file,
"line": relevant_lines_end,
"start_line": relevant_lines_start,
"start_side": "RIGHT",
}
else: # API is different for single line comments
post_parameters = {
"body": body,
"commit_id": self.last_commit_id._identity,
"path": relevant_file,
"line": relevant_lines_start,
"side": "RIGHT",
}
headers, data = self.pr._requester.requestJsonAndCheck(
"POST", f"{self.pr.url}/comments", input=post_parameters
)
github.PullRequestComment.PullRequestComment(
self.pr._requester, headers, data, completed=True
)
return True
except Exception as e:
if settings.config.verbosity_level >= 2:
logging.error(f"Failed to publish code suggestion, error: {e}")
return False
def remove_initial_comment(self): def remove_initial_comment(self):
try: try:
for comment in self.pr.comments_list: for comment in self.pr.comments_list:
@ -65,53 +138,15 @@ class GithubProvider:
return self.pr.body return self.pr.body
def get_languages(self): def get_languages(self):
return self._get_repo().get_languages() languages = self._get_repo().get_languages()
return languages
def get_main_pr_language(self) -> str:
"""
Get the main language of the commit. Return an empty string if cannot determine.
"""
main_language_str = ""
try:
languages = self.get_languages()
top_language = max(languages, key=languages.get).lower()
# validate that the specific commit uses the main language
extension_list = []
files = self.pr.get_files()
for file in files:
extension_list.append(file.filename.rsplit('.')[-1])
# get the most common extension
most_common_extension = max(set(extension_list), key=extension_list.count)
# look for a match. TBD: add more languages, do this systematically
if most_common_extension == 'py' and top_language == 'python' or \
most_common_extension == 'js' and top_language == 'javascript' or \
most_common_extension == 'ts' and top_language == 'typescript' or \
most_common_extension == 'go' and top_language == 'go' or \
most_common_extension == 'java' and top_language == 'java' or \
most_common_extension == 'c' and top_language == 'c' or \
most_common_extension == 'cpp' and top_language == 'c++' or \
most_common_extension == 'cs' and top_language == 'c#' or \
most_common_extension == 'swift' and top_language == 'swift' or \
most_common_extension == 'php' and top_language == 'php' or \
most_common_extension == 'rb' and top_language == 'ruby' or \
most_common_extension == 'rs' and top_language == 'rust' or \
most_common_extension == 'scala' and top_language == 'scala' or \
most_common_extension == 'kt' and top_language == 'kotlin' or \
most_common_extension == 'pl' and top_language == 'perl' or \
most_common_extension == 'swift' and top_language == 'swift':
main_language_str = top_language
except Exception:
pass
return main_language_str
def get_pr_branch(self): def get_pr_branch(self):
return self.pr.head.ref return self.pr.head.ref
def get_pr_description(self):
return self.pr.body
def get_user_id(self): def get_user_id(self):
if not self.github_user_id: if not self.github_user_id:
try: try:
@ -188,9 +223,9 @@ class GithubProvider:
def _get_pr(self): def _get_pr(self):
return self._get_repo().get_pull(self.pr_num) return self._get_repo().get_pull(self.pr_num)
def _get_pr_file_content(self, file: FilePatchInfo, sha: str): def _get_pr_file_content(self, file: FilePatchInfo, sha: str) -> str:
try: try:
file_content_str = self._get_repo().get_contents(file.filename, ref=sha).decoded_content.decode() file_content_str = str(self._get_repo().get_contents(file.filename, ref=sha).decoded_content.decode())
except Exception: except Exception:
file_content_str = "" file_content_str = ""
return file_content_str return file_content_str

View File

@ -0,0 +1,207 @@
import logging
import re
from typing import Optional, Tuple
from urllib.parse import urlparse
import gitlab
from pr_agent.config_loader import settings
from .git_provider import FilePatchInfo, GitProvider, EDIT_TYPE
class GitLabProvider(GitProvider):
def __init__(self, merge_request_url: Optional[str] = None):
gitlab_url = settings.get("GITLAB.URL", None)
if not gitlab_url:
raise ValueError("GitLab URL is not set in the config file")
gitlab_access_token = settings.get("GITLAB.PERSONAL_ACCESS_TOKEN", None)
if not gitlab_access_token:
raise ValueError("GitLab personal access token is not set in the config file")
self.gl = gitlab.Gitlab(
gitlab_url,
gitlab_access_token
)
self.id_project = None
self.id_mr = None
self.mr = None
self.diff_files = None
self.temp_comments = []
self._set_merge_request(merge_request_url)
@property
def pr(self):
'''The GitLab terminology is merge request (MR) instead of pull request (PR)'''
return self.mr
def _set_merge_request(self, merge_request_url: str):
self.id_project, self.id_mr = self._parse_merge_request_url(merge_request_url)
self.mr = self._get_merge_request()
self.last_diff = self.mr.diffs.list()[-1]
def _get_pr_file_content(self, file_path: str, branch: str) -> str:
return self.gl.projects.get(self.id_project).files.get(file_path, branch).decode()
def get_diff_files(self) -> list[FilePatchInfo]:
diffs = self.mr.changes()['changes']
diff_files = []
for diff in diffs:
original_file_content_str = self._get_pr_file_content(diff['old_path'], self.mr.target_branch)
new_file_content_str = self._get_pr_file_content(diff['new_path'], self.mr.source_branch)
edit_type = EDIT_TYPE.MODIFIED
if diff['new_file']:
edit_type = EDIT_TYPE.ADDED
elif diff['deleted_file']:
edit_type = EDIT_TYPE.DELETED
elif diff['renamed_file']:
edit_type = EDIT_TYPE.RENAMED
try:
original_file_content_str = bytes.decode(original_file_content_str, 'utf-8')
new_file_content_str = bytes.decode(new_file_content_str, 'utf-8')
except UnicodeDecodeError:
logging.warning(
f"Cannot decode file {diff['old_path']} or {diff['new_path']} in merge request {self.id_mr}")
diff_files.append(
FilePatchInfo(original_file_content_str, new_file_content_str, diff['diff'], diff['new_path'],
edit_type=edit_type,
old_filename=None if diff['old_path'] == diff['new_path'] else diff['old_path']))
self.diff_files = diff_files
return diff_files
def get_files(self):
return [change['new_path'] for change in self.mr.changes()['changes']]
def publish_description(self, pr_title: str, pr_body: str):
logging.error("Not implemented yet")
pass
def publish_comment(self, mr_comment: str, is_temporary: bool = False):
comment = self.mr.notes.create({'body': mr_comment})
if is_temporary:
self.temp_comments.append(comment)
def publish_inline_comment(self, body: str, relevant_file: str, relevant_line_in_file: str):
self.diff_files = self.diff_files if self.diff_files else self.get_diff_files()
edit_type, found, source_line_no, target_file, target_line_no = self.search_line(relevant_file,
relevant_line_in_file)
if not found:
logging.info(f"Could not find position for {relevant_file} {relevant_line_in_file}")
else:
if edit_type == 'addition':
position = target_line_no - 1
else:
position = source_line_no - 1
d = self.last_diff
pos_obj = {'position_type': 'text',
'new_path': target_file.filename,
'old_path': target_file.old_filename if target_file.old_filename else target_file.filename,
'base_sha': d.base_commit_sha, 'start_sha': d.start_commit_sha, 'head_sha': d.head_commit_sha}
if edit_type == 'deletion':
pos_obj['old_line'] = position
elif edit_type == 'addition':
pos_obj['new_line'] = position
else:
pos_obj['new_line'] = position
pos_obj['old_line'] = position
self.mr.discussions.create({'body': body,
'position': pos_obj})
def publish_code_suggestion(self, body: str,
relevant_file: str,
relevant_lines_start: int,
relevant_lines_end: int):
raise NotImplementedError("not implemented yet for gitlab")
def search_line(self, relevant_file, relevant_line_in_file):
RE_HUNK_HEADER = re.compile(
r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@[ ]?(.*)")
target_file = None
source_line_no = 0
target_line_no = 0
found = False
edit_type = self.get_edit_type(relevant_line_in_file)
for file in self.diff_files:
if file.filename == relevant_file:
target_file = file
patch = file.patch
patch_lines = patch.splitlines()
for i, line in enumerate(patch_lines):
if line.startswith('@@'):
match = RE_HUNK_HEADER.match(line)
if not match:
continue
start_old, size_old, start_new, size_new, _ = match.groups()
source_line_no = int(start_old)
target_line_no = int(start_new)
continue
if line.startswith('-'):
source_line_no += 1
elif line.startswith('+'):
target_line_no += 1
elif line.startswith(' '):
source_line_no += 1
target_line_no += 1
if relevant_line_in_file in line:
found = True
edit_type = self.get_edit_type(line)
break
elif relevant_line_in_file[0] == '+' and relevant_line_in_file[1:] in line:
# The model often adds a '+' to the beginning of the relevant_line_in_file even if originally
# it's a context line
found = True
edit_type = self.get_edit_type(line)
break
return edit_type, found, source_line_no, target_file, target_line_no
def get_edit_type(self, relevant_line_in_file):
edit_type = 'context'
if relevant_line_in_file[0] == '-':
edit_type = 'deletion'
elif relevant_line_in_file[0] == '+':
edit_type = 'addition'
return edit_type
def remove_initial_comment(self):
try:
for comment in self.temp_comments:
comment.delete()
except Exception as e:
logging.exception(f"Failed to remove temp comments, error: {e}")
def get_title(self):
return self.mr.title
def get_description(self):
return self.mr.description
def get_languages(self):
languages = self.gl.projects.get(self.id_project).languages()
return languages
def get_pr_branch(self):
return self.mr.source_branch
def get_pr_description(self):
return self.mr.description
def _parse_merge_request_url(self, merge_request_url: str) -> Tuple[str, int]:
parsed_url = urlparse(merge_request_url)
path_parts = parsed_url.path.strip('/').split('/')
if path_parts[-2] != 'merge_requests':
raise ValueError("The provided URL does not appear to be a GitLab merge request URL")
try:
mr_id = int(path_parts[-1])
except ValueError as e:
raise ValueError("Unable to convert merge request ID to integer") from e
# Gitlab supports access by both project numeric ID as well as 'namespace/project_name'
return "/".join(path_parts[:2]), mr_id
def _get_merge_request(self):
mr = self.gl.projects.get(self.id_project).mergerequests.get(self.id_mr)
return mr
def get_user_id(self):
return None
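
`search_line` in this file walks a unified-diff patch while tracking old and new line numbers from each hunk header. A standalone sketch of that bookkeeping is shown below under simplifying assumptions: `locate_line` and the sample patch are invented for the example, and special lines such as `\ No newline at end of file` are not handled.

```python
import re

HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def locate_line(patch: str, needle: str):
    """Return (edit_type, old_line_no, new_line_no) for the first patch line containing `needle`."""
    old_no = new_no = 0
    for line in patch.splitlines():
        header = HUNK_HEADER.match(line)
        if header:
            # A hunk header resets both counters to the hunk's starting lines
            old_no, new_no = int(header.group(1)), int(header.group(3))
            continue
        if line.startswith("-"):
            kind, cur_old, cur_new = "deletion", old_no, None
            old_no += 1
        elif line.startswith("+"):
            kind, cur_old, cur_new = "addition", None, new_no
            new_no += 1
        else:
            kind, cur_old, cur_new = "context", old_no, new_no
            old_no += 1
            new_no += 1
        if needle in line:
            return kind, cur_old, cur_new
    return None


# Made-up patch: one context line, one deletion, two additions, one context line
patch = "@@ -10,3 +10,4 @@ def f():\n a = 1\n-b = 2\n+b = 3\n+c = 4\n d = 5"
print(locate_line(patch, "c = 4"))
```

A deleted line only has a position in the old file and an added line only in the new file, which matches how the provider above sets `old_line` or `new_line` depending on the edit type.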

View File

@ -0,0 +1,73 @@
import asyncio
import json
import os
import re
from pr_agent.config_loader import settings
from pr_agent.tools.pr_code_suggestions import PRCodeSuggestions
from pr_agent.tools.pr_description import PRDescription
from pr_agent.tools.pr_questions import PRQuestions
from pr_agent.tools.pr_reviewer import PRReviewer
async def run_action():
GITHUB_EVENT_NAME = os.environ.get('GITHUB_EVENT_NAME', None)
if not GITHUB_EVENT_NAME:
print("GITHUB_EVENT_NAME not set")
return
GITHUB_EVENT_PATH = os.environ.get('GITHUB_EVENT_PATH', None)
if not GITHUB_EVENT_PATH:
print("GITHUB_EVENT_PATH not set")
return
try:
event_payload = json.load(open(GITHUB_EVENT_PATH, 'r'))
except json.decoder.JSONDecodeError as e:
print(f"Failed to parse JSON: {e}")
return
OPENAI_KEY = os.environ.get('OPENAI_KEY', None)
if not OPENAI_KEY:
print("OPENAI_KEY not set")
return
OPENAI_ORG = os.environ.get('OPENAI_ORG', None)
GITHUB_TOKEN = os.environ.get('GITHUB_TOKEN', None)
if not GITHUB_TOKEN:
print("GITHUB_TOKEN not set")
return
settings.set("OPENAI.KEY", OPENAI_KEY)
if OPENAI_ORG:
settings.set("OPENAI.ORG", OPENAI_ORG)
settings.set("GITHUB.USER_TOKEN", GITHUB_TOKEN)
settings.set("GITHUB.DEPLOYMENT_TYPE", "user")
if GITHUB_EVENT_NAME == "pull_request":
action = event_payload.get("action", None)
if action in ["opened", "reopened"]:
pr_url = event_payload.get("pull_request", {}).get("url", None)
if pr_url:
await PRReviewer(pr_url).review()
elif GITHUB_EVENT_NAME == "issue_comment":
action = event_payload.get("action", None)
if action in ["created", "edited"]:
comment_body = event_payload.get("comment", {}).get("body", None)
if comment_body:
pr_url = event_payload.get("issue", {}).get("pull_request", {}).get("url", None)
if pr_url:
body = comment_body.strip().lower()
if any(cmd in body for cmd in ["/review", "/review_pr"]):
await PRReviewer(pr_url).review()
elif any(cmd in body for cmd in ["/describe", "/describe_pr"]):
await PRDescription(pr_url).describe()
elif any(cmd in body for cmd in ["/improve", "/improve_code"]):
await PRCodeSuggestions(pr_url).suggest()
elif any(cmd in body for cmd in ["/ask", "/ask_question"]):
pattern = r'(/ask_question|/ask)\s*(.*)'  # longer alternative first, otherwise "/ask" matches inside "/ask_question"
matches = re.findall(pattern, comment_body, re.IGNORECASE)
if matches:
question = matches[0][1]
await PRQuestions(pr_url, question).answer()
else:
print(f"Unknown command: {body}")
if __name__ == '__main__':
asyncio.run(run_action())

View File

@ -35,7 +35,8 @@ async def handle_github_webhooks(request: Request, response: Response):
async def handle_request(body): async def handle_request(body):
action = body.get("action", None) action = body.get("action", None)
installation_id = body.get("installation", {}).get("id", None) installation_id = body.get("installation", {}).get("id", None)
agent = PRAgent(installation_id) settings.set("GITHUB.INSTALLATION_ID", installation_id)
agent = PRAgent()
if action == 'created': if action == 'created':
if "comment" not in body: if "comment" not in body:
return {} return {}
@ -55,7 +56,7 @@ async def handle_request(body):
api_url = pull_request.get("url", None) api_url = pull_request.get("url", None)
if api_url is None: if api_url is None:
return {} return {}
await agent.handle_request(api_url, "please review") await agent.handle_request(api_url, "/review")
else: else:
return {} return {}
@ -66,8 +67,8 @@ async def root():
def start(): def start():
if settings.get("GITHUB.DEPLOYMENT_TYPE", "user") != "app": # Override the deployment type to app
raise Exception("Please set deployment type to app in .secrets.toml file") settings.set("GITHUB.DEPLOYMENT_TYPE", "app")
app = FastAPI() app = FastAPI()
app.include_router(router) app.include_router(router)

View File

@ -1,5 +1,6 @@
import asyncio import asyncio
import logging import logging
import re
import sys import sys
from datetime import datetime, timezone from datetime import datetime, timezone
@ -8,6 +9,11 @@ import aiohttp
from pr_agent.agent.pr_agent import PRAgent from pr_agent.agent.pr_agent import PRAgent
from pr_agent.config_loader import settings from pr_agent.config_loader import settings
from pr_agent.git_providers import get_git_provider from pr_agent.git_providers import get_git_provider
from pr_agent.servers.help import bot_help_text
from pr_agent.tools.pr_code_suggestions import PRCodeSuggestions
from pr_agent.tools.pr_description import PRDescription
from pr_agent.tools.pr_questions import PRQuestions
from pr_agent.tools.pr_reviewer import PRReviewer
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG) logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
NOTIFICATION_URL = "https://api.github.com/notifications" NOTIFICATION_URL = "https://api.github.com/notifications"
@ -25,6 +31,7 @@ async def polling_loop():
last_modified = [None] last_modified = [None]
git_provider = get_git_provider()() git_provider = get_git_provider()()
user_id = git_provider.get_user_id() user_id = git_provider.get_user_id()
agent = PRAgent()
try: try:
deployment_type = settings.github.deployment_type deployment_type = settings.github.deployment_type
token = settings.github.user_token token = settings.github.user_token
@ -38,6 +45,7 @@ async def polling_loop():
async with aiohttp.ClientSession() as session: async with aiohttp.ClientSession() as session:
while True: while True:
try: try:
await asyncio.sleep(5)
headers = { headers = {
"Accept": "application/vnd.github.v3+json", "Accept": "application/vnd.github.v3+json",
"Authorization": f"Bearer {token}" "Authorization": f"Bearer {token}"
@ -75,21 +83,25 @@ async def polling_loop():
if comment['user']['login'] == user_id: if comment['user']['login'] == user_id:
continue continue
comment_body = comment['body'] if 'body' in comment else '' comment_body = comment['body'] if 'body' in comment else ''
commenter_github_user = comment['user']['login'] if 'user' in comment else '' commenter_github_user = comment['user']['login'] \
if 'user' in comment else ''
logging.info(f"Commenter: {commenter_github_user}\nComment: {comment_body}") logging.info(f"Commenter: {commenter_github_user}\nComment: {comment_body}")
user_tag = "@" + user_id user_tag = "@" + user_id
if user_tag not in comment_body: if user_tag not in comment_body:
continue continue
rest_of_comment = comment_body.split(user_tag)[1].strip() rest_of_comment = comment_body.split(user_tag)[1].strip()
agent = PRAgent()
await agent.handle_request(pr_url, rest_of_comment) success = await agent.handle_request(pr_url, rest_of_comment)
if not success:
git_provider.set_pr(pr_url)
git_provider.publish_comment("### How to use PR-Agent\n" +
bot_help_text(user_id))
elif response.status != 304: elif response.status != 304:
print(f"Failed to fetch notifications. Status code: {response.status}") print(f"Failed to fetch notifications. Status code: {response.status}")
await asyncio.sleep(5)
except Exception as e: except Exception as e:
logging.error(f"Exception during processing of a notification: {e}") logging.error(f"Exception during processing of a notification: {e}")
await asyncio.sleep(5)
if __name__ == '__main__': if __name__ == '__main__':
asyncio.run(polling_loop()) asyncio.run(polling_loop())

View File

@ -0,0 +1,64 @@
import asyncio
import time

import gitlab

from pr_agent.agent.pr_agent import PRAgent
from pr_agent.config_loader import settings

gl = gitlab.Gitlab(
    settings.get("GITLAB.URL"),
    private_token=settings.get("GITLAB.PERSONAL_ACCESS_TOKEN")
)

# Set the list of projects to monitor
projects_to_monitor = settings.get("GITLAB.PROJECTS_TO_MONITOR")
magic_word = settings.get("GITLAB.MAGIC_WORD")

# Hold the previously seen comments
previous_comments = set()


def check_comments():
    print('Polling')
    new_comments = {}
    for project in projects_to_monitor:
        project = gl.projects.get(project)
        merge_requests = project.mergerequests.list(state='opened')
        for mr in merge_requests:
            notes = mr.notes.list(get_all=True)
            for note in notes:
                if note.id not in previous_comments and note.body.startswith(magic_word):
                    new_comments[note.id] = dict(
                        body=note.body[len(magic_word):],
                        project=project.name,
                        mr=mr
                    )
                    previous_comments.add(note.id)
                    print(f"New comment in project {project.name}, merge request {mr.title}: {note.body}")
    return new_comments


def handle_new_comments(new_comments):
    print('Handling new comments')
    agent = PRAgent()
    for _, comment in new_comments.items():
        print(f"Handling comment: {comment['body']}")
        asyncio.run(agent.handle_request(comment['mr'].web_url, comment['body']))


def run():
    assert settings.get('CONFIG.GIT_PROVIDER') == 'gitlab', 'This script is only for GitLab'
    # Initial run to populate previous_comments
    check_comments()
    # Run the check at the configured polling interval
    while True:
        time.sleep(settings.get("GITLAB.POLLING_INTERVAL_SECONDS"))
        new_comments = check_comments()
        if new_comments:
            handle_new_comments(new_comments)


if __name__ == '__main__':
    run()
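Note on the polling logic above: a note is picked up only once, and only if its body starts with the magic word. The filtering can be sketched without a GitLab connection as below. `Note` and `select_new_commands` are hypothetical names introduced for illustration; the committed script works on python-gitlab note objects directly.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass
class Note:
    """Hypothetical stand-in for a GitLab merge request note (only id and body are used)."""
    id: int
    body: str


def select_new_commands(notes: Iterable[Note], seen: Set[int], magic_word: str) -> Dict[int, str]:
    """Return {note id: text after the magic word} for notes not handled before.

    Mutates `seen` so that a later poll skips the same notes.
    """
    new_commands = {}
    for note in notes:
        if note.id in seen or not note.body.startswith(magic_word):
            continue
        seen.add(note.id)
        # As in the committed code, the remainder is not stripped of whitespace.
        new_commands[note.id] = note.body[len(magic_word):]
    return new_commands
```

Calling the function twice with the same notes and the same `seen` set returns an empty dict the second time, which is the behavior the initial `check_comments()` call in `run()` relies on.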

14
pr_agent/servers/help.py Normal file

@ -0,0 +1,14 @@
commands_text = "> /review - Ask for a new review after you update the PR\n" \
                "> /describe - Modify the PR title and description based " \
                "on the PR's contents.\n" \
                "> /improve - Suggest improvements to the code in the PR as pull " \
                "request comments ready to commit.\n" \
                "> /ask <QUESTION> - Ask a question about the PR.\n"


def bot_help_text(user: str):
    return f"> Tag me in a comment '@{user}' and add one of the following commands:\n" + commands_text


actions_help_text = "> To invoke PR-Agent, add a comment with one of the following commands:\n" + \
                    commands_text


@ -1,5 +1,5 @@
# QUICKSTART: # QUICKSTART:
# Copy this file to .secrets in the same folder. # Copy this file to .secrets.toml in the same folder.
# The minimum workable settings - set openai.key to your API key. # The minimum workable settings - set openai.key to your API key.
# Set github.deployment_type to "user" and github.user_token to your GitHub personal access token. # Set github.deployment_type to "user" and github.user_token to your GitHub personal access token.
# This will allow you to run the CLI scripts in the scripts/ folder and the github_polling server. # This will allow you to run the CLI scripts in the scripts/ folder and the github_polling server.
@ -9,11 +9,13 @@
[openai] [openai]
key = "<API_KEY>" # Acquire through https://platform.openai.com key = "<API_KEY>" # Acquire through https://platform.openai.com
org = "<ORGANIZATION>" # Optional, may be commented out. org = "<ORGANIZATION>" # Optional, may be commented out.
# Uncomment the following for Azure OpenAI
#api_type = "azure"
#api_version = '2023-05-15' # Check Azure documentation for the current API version
#api_base = "<API_BASE>" # The base URL for your Azure OpenAI resource. e.g. "https://<your resource name>.openai.azure.com"
#deployment_id = "<DEPLOYMENT_ID>" # The deployment name you chose when you deployed the engine
[github] [github]
# The type of deployment to create. Valid values are 'app' or 'user'.
deployment_type = "user"
# ---- Set the following only for deployment type == "user" # ---- Set the following only for deployment type == "user"
user_token = "<TOKEN>" # A GitHub personal access token with 'repo' scope. user_token = "<TOKEN>" # A GitHub personal access token with 'repo' scope.
@ -25,3 +27,8 @@ private_key = """\
""" """
app_id = 123456 # The GitHub App ID, replace with your own. app_id = 123456 # The GitHub App ID, replace with your own.
webhook_secret = "<WEBHOOK SECRET>" # Optional, may be commented out. webhook_secret = "<WEBHOOK SECRET>" # Optional, may be commented out.
[gitlab]
# Gitlab personal access token
personal_access_token = ""


@ -5,11 +5,30 @@ publish_review=true
verbosity_level=0 # 0,1,2 verbosity_level=0 # 0,1,2
[pr_reviewer] [pr_reviewer]
require_minimal_and_focused_review=true require_focused_review=true
require_tests_review=true require_tests_review=true
require_security_review=true require_security_review=true
extended_code_suggestions=false num_code_suggestions=3
num_code_suggestions=4 inline_code_comments = true
[pr_questions] [pr_questions]
[pr_code_suggestions]
num_code_suggestions=4
[github]
# The type of deployment to create. Valid values are 'app' or 'user'.
deployment_type = "user"
[gitlab]
# URL to the gitlab service
url = "https://gitlab.com"
# Polling (either project id or namespace/project_name) syntax can be used
projects_to_monitor = ['org_name/repo_name']
# Polling trigger
magic_word = "AutoReview"
# Polling interval
polling_interval_seconds = 30


@ -0,0 +1,79 @@
[pr_code_suggestions_prompt]
system="""You are a language model called CodiumAI-PR-Code-Reviewer.
Your task is to provide meaningful, non-trivial code suggestions to improve the new code in a PR (the '+' lines).
- Try to give important suggestions like fixing code problems, issues and bugs. As a second priority, provide suggestions for meaningful code improvements, like performance, vulnerability, modularity, and best practices.
- Suggestions should refer only to the 'new hunk' code, and focus on improving the new added code lines, with '+'.
- Provide the exact line number range (inclusive) for each issue.
- Assume there is additional code in the relevant file that is not included in the diff.
- Provide up to {{ num_code_suggestions }} code suggestions.
- Make sure not to provide suggestions that repeat modifications already implemented in the new PR code (the '+' lines).
- Don't output line numbers in the 'improved code' snippets.
You must use the following JSON schema to format your answer:
```json
{
"Code suggestions": {
"type": "array",
"minItems": 1,
"maxItems": {{ num_code_suggestions }},
"uniqueItems": "true",
"items": {
"relevant file": {
"type": "string",
"description": "the relevant file full path"
},
"suggestion content": {
"type": "string",
"description": "a concrete suggestion for meaningfully improving the new PR code."
},
"existing code": {
"type": "string",
"description": "a code snippet showing authentic relevant code lines from a 'new hunk' section. It must be continuous, correctly formatted and indented, and without line numbers."
},
"relevant lines": {
"type": "string",
"description": "the relevant lines in the 'new hunk' sections, in the format of 'start_line-end_line'. For example: '10-15'. They should be derived from the hunk line numbers, and correspond to the 'existing code' snippet above."
},
"improved code": {
"type": "string",
"description": "a new code snippet that can be used to replace the relevant lines in 'new hunk' code. Replacement suggestions should be complete, correctly formatted and indented, and without line numbers."
}
}
}
}
```
Example input:
'
## src/file1.py
---new_hunk---
```
[new hunk code, annotated with line numbers]
```
---old_hunk---
```
[old hunk code]
```
...
'
Don't repeat the prompt in the answer, and avoid outputting the 'type' and 'description' fields.
"""
user="""PR Info:
Title: '{{title}}'
Branch: '{{branch}}'
Description: '{{description}}'
{%- if language %}
Main language: {{language}}
{%- endif %}
The PR Diff:
```
{{diff}}
```
Response (should be a valid JSON, and nothing else):
```json
"""


@ -0,0 +1,45 @@
[pr_description_prompt]
system="""You are CodiumAI-PR-Reviewer, a language model designed to review git pull requests.
Your task is to provide a full description of the PR content.
- Make sure to focus on the new PR code (the '+' lines).
You must use the following JSON schema to format your answer:
```json
{
"PR Title": {
"type": "string",
"description": "an informative title for the PR, describing its main theme"
},
"Type of PR": {
"type": "string",
"enum": ["Bug fix", "Tests", "Bug fix with tests", "Refactoring", "Enhancement", "Documentation", "Other"]
},
"PR Description": {
"type": "string",
"description": "an informative and concise description of the PR"
},
"PR Main Files Walkthrough": {
"type": "string",
"description": "a walkthrough of the PR changes. Review main files, in bullet points, and shortly describe the changes in each file (up to 10 most important files). Format: -`filename`: description of changes\n..."
}
}
Don't repeat the prompt in the answer, and avoid outputting the 'type' and 'description' fields.
"""
user="""PR Info:
Branch: '{{branch}}'
{%- if language %}
Main language: {{language}}
{%- endif %}
The PR Git Diff:
```
{{diff}}
```
Note that lines in the diff body are prefixed with a symbol that represents the type of change: '-' for deletions, '+' for additions, and ' ' (a space) for unchanged lines.
Response (should be a valid JSON, and nothing else):
```json
"""


@ -1,9 +1,9 @@
[pr_questions_prompt] [pr_questions_prompt]
system="""You are CodiumAI-PR-Reviewer, a language model designed to review git pull requests. system="""You are CodiumAI-PR-Reviewer, a language model designed to review git pull requests.
Your task is to answer questions about the new PR code (the '+' lines), and provide feedback. Your task is to answer questions about the new PR code (the '+' lines), and provide feedback.
Be informative, constructive, and give examples. Try to be as specific as possible, and don't avoid answering the questions. Be informative, constructive, and give examples. Try to be as specific as possible.
Don't avoid answering the questions. You must answer the questions, as best as you can, without adding unrelated content.
Make sure not to repeat modifications already implemented in the new PR code (the '+' lines). Make sure not to repeat modifications already implemented in the new PR code (the '+' lines).
Answer only the questions, and don't add unrelated content.
""" """
user="""PR Info: user="""PR Info:


@ -3,9 +3,6 @@ system="""You are CodiumAI-PR-Reviewer, a language model designed to review git
Your task is to provide constructive and concise feedback for the PR, and also provide meaningfull code suggestions to improve the new PR code (the '+' lines). Your task is to provide constructive and concise feedback for the PR, and also provide meaningfull code suggestions to improve the new PR code (the '+' lines).
- Provide up to {{ num_code_suggestions }} code suggestions. - Provide up to {{ num_code_suggestions }} code suggestions.
- Try to focus on important suggestions like fixing code problems, issues and bugs. As a second priority, provide suggestions for meaningfull code improvements, like performance, vulnerability, modularity, and best practices. - Try to focus on important suggestions like fixing code problems, issues and bugs. As a second priority, provide suggestions for meaningfull code improvements, like performance, vulnerability, modularity, and best practices.
{%- if extended_code_suggestions %}
- For each suggestion, provide a short and concise code snippet to illustrate the existing code, and the improved code.
{%- endif %}
- Make sure not to provide suggestion repeating modifications already implemented in the new PR code (the '+' lines). - Make sure not to provide suggestion repeating modifications already implemented in the new PR code (the '+' lines).
You must use the following JSON schema to format your answer: You must use the following JSON schema to format your answer:
@ -16,10 +13,6 @@ You must use the following JSON schema to format your answer:
"type": "string", "type": "string",
"description": "a short explanation of the PR" "description": "a short explanation of the PR"
}, },
"Description and title": {
"type": "string",
"description": "yes\\no question: does this PR have a relevant description and title"
},
"Type of PR": { "Type of PR": {
"type": "string", "type": "string",
"enum": ["Bug fix", "Tests", "Bug fix with tests", "Refactoring", "Enhancement", "Documentation", "Other"] "enum": ["Bug fix", "Tests", "Bug fix with tests", "Refactoring", "Enhancement", "Documentation", "Other"]
@ -30,59 +23,36 @@ You must use the following JSON schema to format your answer:
"description": "yes\\no question: does this PR have relevant tests ?" "description": "yes\\no question: does this PR have relevant tests ?"
}, },
{%- endif %} {%- endif %}
{%- if require_minimal_and_focused %} {%- if require_focused %}
"Minimal and focused": { "Focused PR": {
"type": "string", "type": "string",
"description": "is this PR as minimal and focused as possible, with all code changes centered around a single coherent theme, described in the PR description and title ?" Make sure to explain your answer" "description": "Is this a focused PR, in the sense that it has a clear and coherent title and description, and all PR code diff changes are properly derived from the title and description? Explain your response."
} }
}, },
{%- endif %} {%- endif %}
"PR Feedback": { "PR Feedback": {
"General PR suggestions": { "General PR suggestions": {
"type": "string", "type": "string",
"description": "important suggestions for the contributors and maintainers of this PR, may include overall structure, primary purpose and best practices. consider using specific filenames, classes and functions names. explain yourself!" "description": "General suggestions and feedback for the contributors and maintainers of this PR. May include important suggestions for the overall structure, primary purpose, best practices, critical bugs, and other aspects of the PR. Explain your suggestions."
}, },
"Code suggestions": { "Code suggestions": {
"type": "array", "type": "array",
"maxItems": {{ num_code_suggestions }}, "maxItems": {{ num_code_suggestions }},
"uniqueItems": true, "uniqueItems": true,
"items": { "items": {
"suggestion number": {
"type": "int",
"description": "suggestion number, starting from 1"
},
"relevant file": { "relevant file": {
"type": "string", "type": "string",
"description": "the relevant file name" "description": "the relevant file full path"
}, },
"suggestion content": { "suggestion content": {
"type": "string", "type": "string",
{%- if extended_code_suggestions %}
"description": "a concrete suggestion for meaningfully improving the new PR code. Don't repeat previous suggestions. Add tags with importance measure that matches each suggestion ('important' or 'medium'). Do not make suggestions for updating or adding docstrings, renaming PR title and description, or linter like.
{%- else %}
"description": "a concrete suggestion for meaningfully improving the new PR code. Also describe how, specifically, the suggestion can be applied to new PR code. Add tags with importance measure that matches each suggestion ('important' or 'medium'). Do not make suggestions for updating or adding docstrings, renaming PR title and description, or linter like. "description": "a concrete suggestion for meaningfully improving the new PR code. Also describe how, specifically, the suggestion can be applied to new PR code. Add tags with importance measure that matches each suggestion ('important' or 'medium'). Do not make suggestions for updating or adding docstrings, renaming PR title and description, or linter like.
{%- endif %}
}, },
{%- if extended_code_suggestions %} "relevant line in file": {
"why": {
"type": "string", "type": "string",
"description": "shortly explain why this suggestion is important" "description": "an authentic single code line from the PR git diff section, to which the suggestion applies."
},
"code example": {
"type": "object",
"properties": {
"before code": {
"type": "string",
"description": "Short and concise code snippet, to illustrate the existing code"
},
"after code": {
"type": "string",
"description": "Short and concise code snippet, to illustrate the improved code"
} }
} }
}
{%- endif %}
}
}, },
{%- if require_security %} {%- if require_security %}
"Security concerns": { "Security concerns": {
@ -101,13 +71,12 @@ Example output:
"PR Analysis": "PR Analysis":
{ {
"Main theme": "xxx", "Main theme": "xxx",
"Description and title": "Yes",
"Type of PR": "Bug fix", "Type of PR": "Bug fix",
{%- if require_tests %} {%- if require_tests %}
"Relevant tests added": "No", "Relevant tests added": "No",
{%- endif %} {%- endif %}
{%- if require_minimal_and_focused %} {%- if require_focused %}
"Minimal and focused": "yes\\no, because ..." "Focused PR": "yes\\no, because ..."
{%- endif %} {%- endif %}
}, },
"PR Feedback": "PR Feedback":
@ -115,17 +84,9 @@ Example output:
"General PR suggestions": "..., `xxx`...", "General PR suggestions": "..., `xxx`...",
"Code suggestions": [ "Code suggestions": [
{ {
"suggestion number": 1, "relevant file": "directory/xxx.py",
"relevant file": "xxx.py",
"suggestion content": "xxx [important]", "suggestion content": "xxx [important]",
{%- if extended_code_suggestions %} "relevant line in file": "xxx",
"why": "xxx",
"code example":
{
"before code": "xxx",
"after code": "xxx"
}
{%- endif %}
}, },
... ...
] ]


@ -0,0 +1,127 @@
import copy
import json
import logging
import textwrap

from jinja2 import Environment, StrictUndefined

from pr_agent.algo.ai_handler import AiHandler
from pr_agent.algo.pr_processing import get_pr_diff
from pr_agent.algo.token_handler import TokenHandler
from pr_agent.algo.utils import convert_to_markdown, try_fix_json
from pr_agent.config_loader import settings
from pr_agent.git_providers import get_git_provider, GithubProvider
from pr_agent.git_providers.git_provider import get_main_pr_language


class PRCodeSuggestions:
    def __init__(self, pr_url: str, cli_mode=False):
        self.git_provider = get_git_provider()(pr_url)
        self.main_language = get_main_pr_language(
            self.git_provider.get_languages(), self.git_provider.get_files()
        )
        self.ai_handler = AiHandler()
        self.patches_diff = None
        self.prediction = None
        self.cli_mode = cli_mode
        self.vars = {
            "title": self.git_provider.pr.title,
            "branch": self.git_provider.get_pr_branch(),
            "description": self.git_provider.get_pr_description(),
            "language": self.main_language,
            "diff": "",  # empty diff for initial calculation
            'num_code_suggestions': settings.pr_code_suggestions.num_code_suggestions,
        }
        self.token_handler = TokenHandler(self.git_provider.pr,
                                          self.vars,
                                          settings.pr_code_suggestions_prompt.system,
                                          settings.pr_code_suggestions_prompt.user)

    async def suggest(self):
        assert type(self.git_provider) == GithubProvider, "Only Github is supported for now"
        logging.info('Generating code suggestions for PR...')
        if settings.config.publish_review:
            self.git_provider.publish_comment("Preparing review...", is_temporary=True)
        logging.info('Getting PR diff...')
        # we are using extended hunk with line numbers for code suggestions
        self.patches_diff = get_pr_diff(self.git_provider,
                                        self.token_handler,
                                        add_line_numbers_to_hunks=True,
                                        disable_extra_lines=True)
        logging.info('Getting AI prediction...')
        self.prediction = await self._get_prediction()
        logging.info('Preparing PR review...')
        data = self._prepare_pr_code_suggestions()
        if settings.config.publish_review:
            logging.info('Pushing PR review...')
            self.git_provider.remove_initial_comment()
            logging.info('Pushing inline code comments...')
            self.push_inline_code_suggestions(data)

    async def _get_prediction(self):
        variables = copy.deepcopy(self.vars)
        variables["diff"] = self.patches_diff  # update diff
        environment = Environment(undefined=StrictUndefined)
        system_prompt = environment.from_string(settings.pr_code_suggestions_prompt.system).render(variables)
        user_prompt = environment.from_string(settings.pr_code_suggestions_prompt.user).render(variables)
        if settings.config.verbosity_level >= 2:
            logging.info(f"\nSystem prompt:\n{system_prompt}")
            logging.info(f"\nUser prompt:\n{user_prompt}")
        model = settings.config.model
        response, finish_reason = await self.ai_handler.chat_completion(model=model, temperature=0.2,
                                                                        system=system_prompt, user=user_prompt)
        return response

    def _prepare_pr_code_suggestions(self) -> str:
        review = self.prediction.strip()
        data = None
        try:
            data = json.loads(review)
        except json.decoder.JSONDecodeError:
            if settings.config.verbosity_level >= 2:
                logging.info(f"Could not parse json response: {review}")
            data = try_fix_json(review)
        return data

    def push_inline_code_suggestions(self, data):
        for d in data['Code suggestions']:
            if settings.config.verbosity_level >= 2:
                logging.info(f"suggestion: {d}")
            relevant_file = d['relevant file'].strip()
            relevant_lines_str = d['relevant lines'].strip()
            relevant_lines_start = int(relevant_lines_str.split('-')[0])  # absolute position
            relevant_lines_end = int(relevant_lines_str.split('-')[-1])
            content = d['suggestion content']
            existing_code_snippet = d['existing code']
            new_code_snippet = d['improved code']
            if new_code_snippet:
                try:  # dedent code snippet
                    self.diff_files = self.git_provider.diff_files if self.git_provider.diff_files else self.git_provider.get_diff_files()
                    original_initial_line = None
                    for file in self.diff_files:
                        if file.filename.strip() == relevant_file:
                            original_initial_line = file.head_file.splitlines()[relevant_lines_start - 1]
                            break
                    if original_initial_line:
                        suggested_initial_line = new_code_snippet.splitlines()[0]
                        original_initial_spaces = len(original_initial_line) - len(original_initial_line.lstrip())
                        suggested_initial_spaces = len(suggested_initial_line) - len(suggested_initial_line.lstrip())
                        delta_spaces = original_initial_spaces - suggested_initial_spaces
                        if delta_spaces > 0:
                            new_code_snippet = textwrap.indent(new_code_snippet, delta_spaces * " ").rstrip('\n')
                except Exception as e:
                    if settings.config.verbosity_level >= 2:
                        logging.info(f"Could not dedent code snippet for file {relevant_file}, error: {e}")
            body = f"**Suggestion:** {content}\n```suggestion\n" + new_code_snippet + "\n```"
            success = self.git_provider.publish_code_suggestion(body=body,
                                                                relevant_file=relevant_file,
                                                                relevant_lines_start=relevant_lines_start,
                                                                relevant_lines_end=relevant_lines_end)
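Note on `push_inline_code_suggestions`: the method combines two small steps: parsing the 'start_line-end_line' string from the model, and shifting the suggested snippet right so its first line matches the indentation of the original line. A sketch of both steps as standalone functions follows. The names `parse_line_range` and `align_indentation` are hypothetical, introduced only so the logic can be exercised without a git provider.

```python
import textwrap
from typing import Tuple


def parse_line_range(relevant_lines: str) -> Tuple[int, int]:
    """Parse 'start-end' (or a single number) into an inclusive (start, end) pair."""
    parts = relevant_lines.strip().split('-')
    return int(parts[0]), int(parts[-1])


def align_indentation(original_first_line: str, snippet: str) -> str:
    """Indent `snippet` so its first line is at least as indented as `original_first_line`.

    Assumes `snippet` is non-empty, as the committed code checks before this step.
    """
    suggested_first_line = snippet.splitlines()[0]
    original_spaces = len(original_first_line) - len(original_first_line.lstrip())
    suggested_spaces = len(suggested_first_line) - len(suggested_first_line.lstrip())
    delta = original_spaces - suggested_spaces
    if delta > 0:
        # Only ever add indentation; an over-indented snippet is left unchanged.
        snippet = textwrap.indent(snippet, delta * " ").rstrip('\n')
    return snippet
```

As the sketch makes visible, the adjustment only adds leading spaces. A snippet that arrives more indented than the original line is passed through as-is.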


@ -0,0 +1,83 @@
import copy
import json
import logging

from jinja2 import Environment, StrictUndefined

from pr_agent.algo.ai_handler import AiHandler
from pr_agent.algo.pr_processing import get_pr_diff
from pr_agent.algo.token_handler import TokenHandler
from pr_agent.algo.utils import convert_to_markdown
from pr_agent.config_loader import settings
from pr_agent.git_providers import get_git_provider
from pr_agent.git_providers.git_provider import get_main_pr_language


class PRDescription:
    def __init__(self, pr_url: str):
        self.git_provider = get_git_provider()(pr_url)
        self.main_pr_language = get_main_pr_language(
            self.git_provider.get_languages(), self.git_provider.get_files()
        )
        self.ai_handler = AiHandler()
        self.vars = {
            "title": self.git_provider.pr.title,
            "branch": self.git_provider.get_pr_branch(),
            "description": self.git_provider.get_description(),
            "language": self.main_pr_language,
            "diff": "",  # empty diff for initial calculation
        }
        self.token_handler = TokenHandler(self.git_provider.pr,
                                          self.vars,
                                          settings.pr_description_prompt.system,
                                          settings.pr_description_prompt.user)
        self.patches_diff = None
        self.prediction = None

    async def describe(self):
        logging.info('Generating a PR description...')
        if settings.config.publish_review:
            self.git_provider.publish_comment("Preparing pr description...", is_temporary=True)
        logging.info('Getting PR diff...')
        self.patches_diff = get_pr_diff(self.git_provider, self.token_handler)
        logging.info('Getting AI prediction...')
        self.prediction = await self._get_prediction()
        logging.info('Preparing answer...')
        pr_title, pr_body = self._prepare_pr_answer()
        if settings.config.publish_review:
            logging.info('Pushing answer...')
            self.git_provider.publish_description(pr_title, pr_body)
            self.git_provider.remove_initial_comment()
        return ""

    async def _get_prediction(self):
        variables = copy.deepcopy(self.vars)
        variables["diff"] = self.patches_diff  # update diff
        environment = Environment(undefined=StrictUndefined)
        system_prompt = environment.from_string(settings.pr_description_prompt.system).render(variables)
        user_prompt = environment.from_string(settings.pr_description_prompt.user).render(variables)
        if settings.config.verbosity_level >= 2:
            logging.info(f"\nSystem prompt:\n{system_prompt}")
            logging.info(f"\nUser prompt:\n{user_prompt}")
        model = settings.config.model
        response, finish_reason = await self.ai_handler.chat_completion(model=model, temperature=0.2,
                                                                        system=system_prompt, user=user_prompt)
        return response

    def _prepare_pr_answer(self):
        data = json.loads(self.prediction)
        pr_body = ""
        # for key, value in data.items():
        #     markdown_text += f"## {key}\n\n"
        #     markdown_text += f"{value}\n\n"
        title = data['PR Title']
        del data['PR Title']
        for key, value in data.items():
            pr_body += f"{key}:\n"
            if 'walkthrough' in key.lower():
                pr_body += f"{value}\n"
            else:
                pr_body += f"**{value}**\n\n___\n"
        if settings.config.verbosity_level >= 2:
            logging.info(f"title:\n{title}\n{pr_body}")
        return title, pr_body
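Note on `_prepare_pr_answer`: the model's JSON answer is split into a title and a markdown body, with the walkthrough entry left unbolded. A self-contained sketch of that transformation is shown below, assuming the answer is already valid JSON with a 'PR Title' key. `format_description` is a hypothetical name, not a function in this commit.

```python
import json
from typing import Tuple


def format_description(prediction: str) -> Tuple[str, str]:
    """Split a JSON answer into (title, markdown body), mirroring the loop in _prepare_pr_answer."""
    data = json.loads(prediction)
    title = data.pop('PR Title')
    body = ""
    for key, value in data.items():
        body += f"{key}:\n"
        if 'walkthrough' in key.lower():
            # The walkthrough is already a bullet list, so it is emitted as-is.
            body += f"{value}\n"
        else:
            body += f"**{value}**\n\n___\n"
    return title, body
```

Unlike `PRReviewer`, the committed description path calls `json.loads` without a `try_fix_json` fallback, so a malformed answer raises here as well.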


@ -1,6 +1,5 @@
import copy import copy
import logging import logging
from typing import Optional
from jinja2 import Environment, StrictUndefined from jinja2 import Environment, StrictUndefined
@ -9,20 +8,22 @@ from pr_agent.algo.pr_processing import get_pr_diff
from pr_agent.algo.token_handler import TokenHandler from pr_agent.algo.token_handler import TokenHandler
from pr_agent.config_loader import settings from pr_agent.config_loader import settings
from pr_agent.git_providers import get_git_provider from pr_agent.git_providers import get_git_provider
from pr_agent.git_providers.git_provider import get_main_pr_language
class PRQuestions: class PRQuestions:
def __init__(self, pr_url: str, question_str: str, installation_id: Optional[int] = None): def __init__(self, pr_url: str, question_str: str):
self.git_provider = get_git_provider()(pr_url, installation_id) self.git_provider = get_git_provider()(pr_url)
self.main_pr_language = self.git_provider.get_main_pr_language() self.main_pr_language = get_main_pr_language(
self.installation_id = installation_id self.git_provider.get_languages(), self.git_provider.get_files()
)
self.ai_handler = AiHandler() self.ai_handler = AiHandler()
self.question_str = question_str self.question_str = question_str
self.vars = { self.vars = {
"title": self.git_provider.pr.title, "title": self.git_provider.pr.title,
"branch": self.git_provider.get_pr_branch(), "branch": self.git_provider.get_pr_branch(),
"description": self.git_provider.pr.body, "description": self.git_provider.get_description(),
"language": self.git_provider.get_main_pr_language(), "language": self.main_pr_language,
"diff": "", # empty diff for initial calculation "diff": "", # empty diff for initial calculation
"questions": self.question_str, "questions": self.question_str,
} }


@ -1,24 +1,26 @@
import copy import copy
import json import json
import logging import logging
from typing import Optional
from jinja2 import Environment, StrictUndefined from jinja2 import Environment, StrictUndefined
from pr_agent.algo.ai_handler import AiHandler from pr_agent.algo.ai_handler import AiHandler
from pr_agent.algo.pr_processing import get_pr_diff from pr_agent.algo.pr_processing import get_pr_diff
from pr_agent.algo.token_handler import TokenHandler from pr_agent.algo.token_handler import TokenHandler
from pr_agent.algo.utils import convert_to_markdown from pr_agent.algo.utils import convert_to_markdown, try_fix_json
from pr_agent.config_loader import settings from pr_agent.config_loader import settings
from pr_agent.git_providers import get_git_provider from pr_agent.git_providers import get_git_provider
from pr_agent.git_providers.git_provider import get_main_pr_language
from pr_agent.servers.help import bot_help_text, actions_help_text
class PRReviewer: class PRReviewer:
def __init__(self, pr_url: str, installation_id: Optional[int] = None, cli_mode=False): def __init__(self, pr_url: str, cli_mode=False):
self.git_provider = get_git_provider()(pr_url, installation_id) self.git_provider = get_git_provider()(pr_url)
self.main_language = self.git_provider.get_main_pr_language() self.main_language = get_main_pr_language(
self.installation_id = installation_id self.git_provider.get_languages(), self.git_provider.get_files()
)
self.ai_handler = AiHandler() self.ai_handler = AiHandler()
self.patches_diff = None self.patches_diff = None
self.prediction = None self.prediction = None
@ -26,13 +28,12 @@ class PRReviewer:
self.vars = { self.vars = {
"title": self.git_provider.pr.title, "title": self.git_provider.pr.title,
"branch": self.git_provider.get_pr_branch(), "branch": self.git_provider.get_pr_branch(),
"description": self.git_provider.pr.body, "description": self.git_provider.get_pr_description(),
"language": self.main_language, "language": self.main_language,
"diff": "", # empty diff for initial calculation "diff": "", # empty diff for initial calculation
"require_tests": settings.pr_reviewer.require_tests_review, "require_tests": settings.pr_reviewer.require_tests_review,
"require_security": settings.pr_reviewer.require_security_review, "require_security": settings.pr_reviewer.require_security_review,
"require_minimal_and_focused": settings.pr_reviewer.require_minimal_and_focused_review, "require_focused": settings.pr_reviewer.require_focused_review,
'extended_code_suggestions': settings.pr_reviewer.extended_code_suggestions,
'num_code_suggestions': settings.pr_reviewer.num_code_suggestions, 'num_code_suggestions': settings.pr_reviewer.num_code_suggestions,
} }
self.token_handler = TokenHandler(self.git_provider.pr, self.token_handler = TokenHandler(self.git_provider.pr,
@ -54,6 +55,9 @@ class PRReviewer:
logging.info('Pushing PR review...') logging.info('Pushing PR review...')
self.git_provider.publish_comment(pr_comment) self.git_provider.publish_comment(pr_comment)
self.git_provider.remove_initial_comment() self.git_provider.remove_initial_comment()
if settings.pr_reviewer.inline_code_comments:
logging.info('Pushing inline code comments...')
self._publish_inline_code_comments()
return "" return ""
async def _get_prediction(self): async def _get_prediction(self):
@ -68,11 +72,7 @@ class PRReviewer:
model = settings.config.model model = settings.config.model
response, finish_reason = await self.ai_handler.chat_completion(model=model, temperature=0.2, response, finish_reason = await self.ai_handler.chat_completion(model=model, temperature=0.2,
system=system_prompt, user=user_prompt) system=system_prompt, user=user_prompt)
try:
json.loads(response)
except json.decoder.JSONDecodeError:
logging.warning("Could not decode JSON")
response = {}
return response return response
def _prepare_pr_review(self) -> str: def _prepare_pr_review(self) -> str:
@ -80,8 +80,7 @@ class PRReviewer:
try: try:
data = json.loads(review) data = json.loads(review)
except json.decoder.JSONDecodeError: except json.decoder.JSONDecodeError:
logging.error("Unable to decode JSON response from AI") data = try_fix_json(review)
data = {}
# reordering for nicer display # reordering for nicer display
if 'PR Feedback' in data: if 'PR Feedback' in data:
@ -90,21 +89,33 @@ class PRReviewer:
del data['PR Feedback']['Security concerns'] del data['PR Feedback']['Security concerns']
data['PR Analysis']['Security concerns'] = val data['PR Analysis']['Security concerns'] = val
if settings.config.git_provider == 'github' and settings.pr_reviewer.inline_code_comments:
del data['PR Feedback']['Code suggestions']
markdown_text = convert_to_markdown(data) markdown_text = convert_to_markdown(data)
user = self.git_provider.get_user_id() user = self.git_provider.get_user_id()
if not self.cli_mode: if not self.cli_mode:
markdown_text += "\n### How to use\n" markdown_text += "\n### How to use\n"
if user and '[bot]' not in user: if user and '[bot]' not in user:
markdown_text += f"> Tag me in a comment '@{user}' to ask for a new review after you update the PR.\n" markdown_text += bot_help_text(user)
markdown_text += "> You can also tag me and ask any question, " \
f"for example '@{user} is the PR ready for merge?'"
else: else:
markdown_text += "> Add a comment that says 'review' to ask for a new review " \ markdown_text += actions_help_text
"after you update the PR.\n"
markdown_text += "> You can also add a comment that says 'answer QUESTION', " \
"for example 'answer is the PR ready for merge?'"
if settings.config.verbosity_level >= 2: if settings.config.verbosity_level >= 2:
logging.info(f"Markdown response:\n{markdown_text}") logging.info(f"Markdown response:\n{markdown_text}")
return markdown_text return markdown_text
def _publish_inline_code_comments(self):
review = self.prediction.strip()
try:
data = json.loads(review)
except json.decoder.JSONDecodeError:
data = try_fix_json(review)
for d in data['PR Feedback']['Code suggestions']:
relevant_file = d['relevant file'].strip()
relevant_line_in_file = d['relevant line in file'].strip()
content = d['suggestion content']
self.git_provider.publish_inline_comment(content, relevant_file, relevant_line_in_file)
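
Review note: the added `_publish_inline_code_comments` indexes `data['PR Feedback']['Code suggestions']` directly, so a repaired payload that is empty or missing keys would raise `KeyError`. Below is a minimal sketch of a more defensive traversal. `iter_inline_suggestions` is a hypothetical helper, not part of pr-agent, and it assumes the same key names used in the method above.

```python
from typing import Iterator, Tuple


def iter_inline_suggestions(data: object) -> Iterator[Tuple[str, str, str]]:
    """Yield (content, file, line) for each well-formed code suggestion.

    Hypothetical helper for illustration; key names mirror the diff above.
    """
    if not isinstance(data, dict):
        return
    feedback = data.get("PR Feedback")
    if not isinstance(feedback, dict):
        return
    suggestions = feedback.get("Code suggestions")
    if not isinstance(suggestions, list):
        return
    for item in suggestions:
        try:
            content = item["suggestion content"]
            relevant_file = item["relevant file"].strip()
            relevant_line = item["relevant line in file"].strip()
        except (KeyError, TypeError, AttributeError):
            # Skip entries that are truncated or otherwise malformed
            continue
        yield content, relevant_file, relevant_line
```

Each yielded tuple could then be passed to `self.git_provider.publish_inline_comment(content, relevant_file, relevant_line_in_file)` as in the method above.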


@ -6,3 +6,6 @@ openai==0.27.8
Jinja2==3.1.2 Jinja2==3.1.2
tiktoken==0.4.0 tiktoken==0.4.0
uvicorn==0.22.0 uvicorn==0.22.0
python-gitlab==3.15.0
pytest~=7.4.0
aiohttp~=3.8.4


@ -46,22 +46,19 @@ class TestConvertToMarkdown:
def test_simple_dictionary_input(self): def test_simple_dictionary_input(self):
input_data = { input_data = {
'Main theme': 'Test', 'Main theme': 'Test',
'Description and title': 'Test description',
'Type of PR': 'Test type', 'Type of PR': 'Test type',
'Relevant tests added': 'no', 'Relevant tests added': 'no',
'Unrelated changes': 'n/a', # won't be included in the output 'Unrelated changes': 'n/a', # won't be included in the output
'Minimal and focused': 'Yes', 'Focused PR': 'Yes',
'General PR suggestions': 'general suggestion...', 'General PR suggestions': 'general suggestion...',
'Code suggestions': [ 'Code suggestions': [
{ {
'Suggestion number': 1,
'Code example': { 'Code example': {
'Before': 'Code before', 'Before': 'Code before',
'After': 'Code after' 'After': 'Code after'
} }
}, },
{ {
'Suggestion number': 2,
'Code example': { 'Code example': {
'Before': 'Code before 2', 'Before': 'Code before 2',
'After': 'Code after 2' 'After': 'Code after 2'
@ -71,15 +68,13 @@ class TestConvertToMarkdown:
} }
expected_output = """\ expected_output = """\
- 🎯 **Main theme:** Test - 🎯 **Main theme:** Test
- 🔍 **Description and title:** Test description
- 📌 **Type of PR:** Test type - 📌 **Type of PR:** Test type
- 🧪 **Relevant tests added:** no - 🧪 **Relevant tests added:** no
- ✨ **Minimal and focused:** Yes - ✨ **Focused PR:** Yes
- 💡 **General PR suggestions:** general suggestion... - 💡 **General PR suggestions:** general suggestion...
- 🤖 **Code suggestions:** - 🤖 **Code suggestions:**
- **suggestion 1:**
- **Code example:** - **Code example:**
- **Before:** - **Before:**
``` ```
@ -90,7 +85,6 @@ class TestConvertToMarkdown:
Code after Code after
``` ```
- **suggestion 2:**
- **Code example:** - **Code example:**
- **Before:** - **Before:**
``` ```
@ -112,11 +106,10 @@ class TestConvertToMarkdown:
def test_dictionary_input_containing_only_empty_dictionaries(self): def test_dictionary_input_containing_only_empty_dictionaries(self):
input_data = { input_data = {
'Main theme': {}, 'Main theme': {},
'Description and title': {},
'Type of PR': {}, 'Type of PR': {},
'Relevant tests added': {}, 'Relevant tests added': {},
'Unrelated changes': {}, 'Unrelated changes': {},
'Minimal and focused': {}, 'Focused PR': {},
'General PR suggestions': {}, 'General PR suggestions': {},
'Code suggestions': {} 'Code suggestions': {}
} }


@ -0,0 +1,83 @@
# Generated by CodiumAI
from pr_agent.algo.utils import try_fix_json
import pytest
class TestTryFixJson:
# Tests that JSON truncated inside the 'Code suggestions' section is repaired, keeping only the complete suggestions
def test_incomplete_code_suggestions(self):
review = '{"PR Analysis": {"Main theme": "xxx", "Type of PR": "Bug fix"}, "PR Feedback": {"General PR suggestions": "..., `xxx`...", "Code suggestions": [{"relevant file": "xxx.py", "suggestion content": "xxx [important]"}, {"suggestion number": 2, "relevant file": "yyy.py", "suggestion content": "yyy [incomp...'
expected_output = {
'PR Analysis': {
'Main theme': 'xxx',
'Type of PR': 'Bug fix'
},
'PR Feedback': {
'General PR suggestions': '..., `xxx`...',
'Code suggestions': [
{
'relevant file': 'xxx.py',
'suggestion content': 'xxx [important]'
}
]
}
}
assert try_fix_json(review) == expected_output
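# Tests that a truncated suggestion is dropped when whitespace (newline and tab) precedes the separating comma
def test_incomplete_code_suggestions_new_line(self):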
review = '{"PR Analysis": {"Main theme": "xxx", "Type of PR": "Bug fix"}, "PR Feedback": {"General PR suggestions": "..., `xxx`...", "Code suggestions": [{"relevant file": "xxx.py", "suggestion content": "xxx [important]"} \n\t, {"suggestion number": 2, "relevant file": "yyy.py", "suggestion content": "yyy [incomp...'
expected_output = {
'PR Analysis': {
'Main theme': 'xxx',
'Type of PR': 'Bug fix'
},
'PR Feedback': {
'General PR suggestions': '..., `xxx`...',
'Code suggestions': [
{
'relevant file': 'xxx.py',
'suggestion content': 'xxx [important]'
}
]
}
}
assert try_fix_json(review) == expected_output
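# Tests that stray closing braces and commas inside the truncated suggestion do not prevent recovery of the complete one
def test_incomplete_code_suggestions_many_close_brackets(self):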
review = '{"PR Analysis": {"Main theme": "xxx", "Type of PR": "Bug fix"}, "PR Feedback": {"General PR suggestions": "..., `xxx`...", "Code suggestions": [{"relevant file": "xxx.py", "suggestion content": "xxx [important]"} \n, {"suggestion number": 2, "relevant file": "yyy.py", "suggestion content": "yyy }, [}\n ,incomp.} ,..'
expected_output = {
'PR Analysis': {
'Main theme': 'xxx',
'Type of PR': 'Bug fix'
},
'PR Feedback': {
'General PR suggestions': '..., `xxx`...',
'Code suggestions': [
{
'relevant file': 'xxx.py',
'suggestion content': 'xxx [important]'
}
]
}
}
assert try_fix_json(review) == expected_output
# Tests that a suggestion truncated inside its 'relevant file' value is dropped
def test_incomplete_code_suggestions_relevant_file(self):
review = '{"PR Analysis": {"Main theme": "xxx", "Type of PR": "Bug fix"}, "PR Feedback": {"General PR suggestions": "..., `xxx`...", "Code suggestions": [{"relevant file": "xxx.py", "suggestion content": "xxx [important]"}, {"suggestion number": 2, "relevant file": "yyy.p'
expected_output = {
'PR Analysis': {
'Main theme': 'xxx',
'Type of PR': 'Bug fix'
},
'PR Feedback': {
'General PR suggestions': '..., `xxx`...',
'Code suggestions': [
{
'relevant file': 'xxx.py',
'suggestion content': 'xxx [important]'
}
]
}
}
assert try_fix_json(review) == expected_output
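
Review note: the four tests above share one pattern. The model output is cut off partway through the last code suggestion, and the expected result keeps only the suggestions that closed cleanly. Below is a rough sketch of one way to get that behaviour. It is not the `try_fix_json` implementation from `pr_agent.algo.utils`: `repair_truncated_json` is a hypothetical name, and the `"]}}"` closing suffix assumes the nesting used in the test payloads (a list inside two objects).

```python
import json
import re


def repair_truncated_json(text: str, closers: str = "]}}") -> dict:
    """Best-effort recovery of a JSON document truncated inside a list.

    Hypothetical sketch: tries to cut the text after each '}' that is
    followed by a comma, starting from the last one, and appends the
    closing brackets until the result parses.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Index just past every '}' that is followed by optional whitespace and ','
    cut_points = [m.start() + 1 for m in re.finditer(r"\}\s*,", text)]
    for cut in reversed(cut_points):
        try:
            return json.loads(text[:cut] + closers)
        except json.JSONDecodeError:
            continue
    # Nothing could be recovered
    return {}


truncated = '{"PR Feedback": {"Code suggestions": [{"relevant file": "a.py"}, {"relevant file": "b.p'
print(repair_truncated_json(truncated))
```

Walking the candidate cut points from the end keeps as many complete suggestions as possible, and cuts that land inside a string value simply fail to parse and are skipped.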


@ -1,15 +1,15 @@
# Generated by CodiumAI # Generated by CodiumAI
from pr_agent.algo.language_handler import sort_files_by_main_languages from pr_agent.algo.language_handler import sort_files_by_main_languages
import pytest
""" """
Code Analysis Code Analysis
Objective: Objective:
The objective of the function is to sort a list of files by their main language, putting the files that are in the main language first and the rest of the files after. It takes in a dictionary of languages and their sizes, and a list of files. The objective of the function is to sort a list of files by their main language, putting the files that are in the main
language first and the rest of the files after. It takes in a dictionary of languages and their sizes, and a list of
files.
Inputs: Inputs:
- languages: a dictionary containing the languages and their sizes - languages: a dictionary containing the languages and their sizes
@ -33,6 +33,8 @@ Additional aspects:
- The function uses the filter_bad_extensions function to filter out files with bad extensions - The function uses the filter_bad_extensions function to filter out files with bad extensions
- The function uses a rest_files dictionary to store the files that do not belong to any of the main extensions - The function uses a rest_files dictionary to store the files that do not belong to any of the main extensions
""" """
class TestSortFilesByMainLanguages: class TestSortFilesByMainLanguages:
# Tests that files are sorted by main language, with files in main language first and the rest after # Tests that files are sorted by main language, with files in main language first and the rest after
def test_happy_path_sort_files_by_main_languages(self): def test_happy_path_sort_files_by_main_languages(self):


@ -41,14 +41,6 @@ class TestParseCodeSuggestion:
expected_output = "\n" # modified to expect a newline character expected_output = "\n" # modified to expect a newline character
assert parse_code_suggestion(input_data) == expected_output assert parse_code_suggestion(input_data) == expected_output
# Tests that function returns correct output when 'suggestion number' key has a non-integer value
def test_non_integer_suggestion_number(self):
input_data = {
"Suggestion number": "one",
"Description": "This is a suggestion"
}
expected_output = "- **suggestion one:**\n - **Description:** This is a suggestion\n\n"
assert parse_code_suggestion(input_data) == expected_output
# Tests that function returns correct output when 'before' or 'after' key has a non-string value # Tests that function returns correct output when 'before' or 'after' key has a non-string value
def test_non_string_before_or_after(self): def test_non_string_before_or_after(self):
@ -64,19 +56,17 @@ class TestParseCodeSuggestion:
# Tests that function returns correct output when input dictionary does not have 'code example' key # Tests that function returns correct output when input dictionary does not have 'code example' key
def test_no_code_example_key(self): def test_no_code_example_key(self):
code_suggestions = { code_suggestions = {
'suggestion number': 1,
'suggestion': 'Suggestion 1', 'suggestion': 'Suggestion 1',
'description': 'Description 1', 'description': 'Description 1',
'before': 'Before 1', 'before': 'Before 1',
'after': 'After 1' 'after': 'After 1'
} }
expected_output = "- **suggestion 1:**\n - **suggestion:** Suggestion 1\n - **description:** Description 1\n - **before:** Before 1\n - **after:** After 1\n\n" # noqa: E501 expected_output = " **suggestion:** Suggestion 1\n **description:** Description 1\n **before:** Before 1\n **after:** After 1\n\n" # noqa: E501
assert parse_code_suggestion(code_suggestions) == expected_output assert parse_code_suggestion(code_suggestions) == expected_output
# Tests that function returns correct output when input dictionary has 'code example' key # Tests that function returns correct output when input dictionary has 'code example' key
def test_with_code_example_key(self): def test_with_code_example_key(self):
code_suggestions = { code_suggestions = {
'suggestion number': 2,
'suggestion': 'Suggestion 2', 'suggestion': 'Suggestion 2',
'description': 'Description 2', 'description': 'Description 2',
'code example': { 'code example': {
@ -84,5 +74,5 @@ class TestParseCodeSuggestion:
'after': 'After 2' 'after': 'After 2'
} }
} }
expected_output = "- **suggestion 2:**\n - **suggestion:** Suggestion 2\n - **description:** Description 2\n - **code example:**\n - **before:**\n ```\n Before 2\n ```\n - **after:**\n ```\n After 2\n ```\n\n" # noqa: E501 expected_output = " **suggestion:** Suggestion 2\n **description:** Description 2\n - **code example:**\n - **before:**\n ```\n Before 2\n ```\n - **after:**\n ```\n After 2\n ```\n\n" # noqa: E501
assert parse_code_suggestion(code_suggestions) == expected_output assert parse_code_suggestion(code_suggestions) == expected_output
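
Review note: the updated expectations drop the per-suggestion header line and render each key of the suggestion directly. Below is a rough sketch of that kind of rendering, written independently of pr-agent's `parse_code_suggestion`. `render_suggestion` and its exact indentation are assumptions and do not reproduce the expected strings above.

```python
def render_suggestion(suggestion: dict) -> str:
    """Render a suggestion dict as a markdown list (illustrative only)."""
    fence = "`" * 3  # built at runtime to avoid a literal fence in this snippet
    lines = []
    for key, value in suggestion.items():
        if isinstance(value, dict):
            # Nested dicts (e.g. a code example) become fenced sub-items
            lines.append(f"- **{key}:**")
            for sub_key, sub_value in value.items():
                lines.append(f"  - **{sub_key}:**")
                lines.append(f"    {fence}")
                lines.append(f"    {sub_value}")
                lines.append(f"    {fence}")
        else:
            lines.append(f"- **{key}:** {value}")
    return "\n".join(lines) + "\n\n"


print(render_suggestion({
    "suggestion": "Suggestion 2",
    "code example": {"before": "Before 2", "after": "After 2"},
}))
```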