Compare commits


35 Commits

Author SHA1 Message Date
8f751f7371 Default timeout for AI is now 180s, configurable 2023-08-07 13:26:28 +03:00
43297b851f Merge pull request #177 from Codium-ai/tr/update_readme
Update README and CONFIGURATION Documentation
2023-08-07 09:26:12 +03:00
4f39239e73 readme update
readme update
2023-08-07 09:11:54 +03:00
00e1925927 Merge pull request #172 from krrishdholakia/patch-1
adding support for Anthropic, Cohere, Replicate, Azure
2023-08-06 18:38:36 +03:00
7189b3ab41 suggestions -> feedback 2023-08-06 18:20:39 +03:00
a00038fbd8 Merge remote-tracking branch 'origin/main' into patch-1 2023-08-06 18:09:09 +03:00
a45343793a Merge pull request #175 from Codium-ai/tr/review_adjustments
Making the 'Review' Feature Great Again
2023-08-06 12:14:43 +03:00
703215fe83 updating secrets template 2023-08-05 22:53:59 -07:00
0f975ccf4a bug fixes 2023-08-05 22:50:41 -07:00
7367c62cf9 TestFindLineNumberOfRelevantLineInFile 2023-08-06 08:31:15 +03:00
fed0ea349a find_line_number_of_relevant_line_in_file
find_line_number_of_relevant_line_in_file
2023-08-06 08:13:07 +03:00
bd86266a4b Merge pull request #173 from Codium-ai/tr/caching
Optimization of PR Diff Processing
2023-08-05 09:23:45 +03:00
bd07a0cd7f Update Configuration.md 2023-08-04 12:13:04 +03:00
ed8554699b bug fixes and updates 2023-08-03 16:05:46 -07:00
749ae1be79 Update CHANGELOG.md 2023-08-03 19:55:51 +00:00
0e3dbbd0f2 fix major bug in gitlab 2023-08-03 22:51:38 +03:00
7a57db5d88 load_large_diff is done once 2023-08-03 22:14:05 +03:00
102edcdcf1 adding support for Anthropic, Cohere, Replicate, Azure 2023-08-03 12:04:08 -07:00
c92648cbd5 caching 2023-08-03 21:38:18 +03:00
26b008565b Merge pull request #170 from Codium-ai/tr/edge_case_for_hunks
Handling edge case for hunks in git patch processing
2023-08-03 12:11:27 +03:00
0dec24aa37 edge case for hunks 2023-08-03 10:50:22 +03:00
68a2f2a27d fix requirement.txt 2023-08-03 10:19:51 +03:00
cfa14178f8 Merge pull request #168 from Codium-ai/tr/further_use_commit_messages
Use commit messages in PR tools
2023-08-03 07:58:25 +03:00
b97c4b6114 Update CHANGELOG.md 2023-08-02 18:36:34 +03:00
3d43cecbea Merge pull request #167 from zmeir/zmeir-list_configurations_as_comment
Add /config command to list the possible configuration settings
2023-08-02 18:35:20 +03:00
eb143ec851 Update CHANGELOG.md 2023-08-02 15:32:15 +00:00
3e94a71dcd commit_messages_str is used in all tools 2023-08-02 18:26:39 +03:00
dd14423b07 Add /config command to list the possible configuration settings 2023-08-02 16:42:54 +03:00
8e47fdc284 Merge pull request #164 from Codium-ai/ok/repo_config
Support for Repo-Specific Configuration File
2023-08-01 19:09:23 +03:00
ab607d74be Support repo-specific configuration file 2023-08-01 18:36:20 +03:00
bfe7304449 Support repo-specific configuration file 2023-08-01 18:04:52 +03:00
e12874b696 Support repo-specific configuration file 2023-08-01 17:44:08 +03:00
696e2bd6ff Support repo-specific configuration file 2023-08-01 17:27:25 +03:00
450f410e3c Support repo-specific configuration file 2023-08-01 17:22:03 +03:00
08a3f033cb Merge pull request #162 from Codium-ai/ok/settings_refactor
Refactor settings usage and CLI
2023-08-01 16:05:20 +03:00
31 changed files with 600 additions and 182 deletions

View File

@ -1,9 +1,27 @@
## 2023-08-03
### Optimized
- Optimized PR diff processing by introducing caching for diff files, reducing the number of API calls.
- Refactored `load_large_diff` function to generate a patch only when necessary.
- Fixed a bug in the GitLab provider where the new file was not retrieved correctly.
## 2023-08-02
### Enhanced
- Updated several tools in the `pr_agent` package to use commit messages in their functionality.
- Commit messages are now retrieved and stored in the `vars` dictionary for each tool.
- Added a section to display the commit messages in the prompts of various tools.
## 2023-08-01
### Enhanced
- Introduced the ability to retrieve commit messages from pull requests across different git providers.
- Implemented commit messages retrieval for GitHub and GitLab providers.
- Updated the PR description template to include a section for commit messages if they exist.
- Added support for repository-specific configuration files (.pr_agent.toml) for the PR Agent.
- Implemented this feature for both GitHub and GitLab providers.
- Added a new configuration option 'use_repo_settings_file' to enable or disable the use of a repo-specific settings file.
## 2023-07-30

View File

@ -1,12 +1,57 @@
## Configuration
The different tools and sub-tools used by CodiumAI pr-agent are adjustable via the configuration file: `/pr-agent/settings/configuration.toml`.
The different tools and sub-tools used by CodiumAI PR-Agent are adjustable via the **[configuration file](pr_agent/settings/configuration.toml)**
### Working from CLI
When running from source (CLI), your local configuration file will be initially used.
Example for invoking the 'review' tool via the CLI:
```
python cli.py --pr-url=<pr_url> review
```
To edit the configuration of any tool, just add `--config_path=<value>` to your command.
For example, if you want to edit the `pr_reviewer` configurations online, you can run:
```
/review --pr_reviewer.extra_instructions="focus on the file xyz" --pr_reviewer.require_score_review=false ...
```
In addition to general configurations, the 'review' tool will use parameters from the `[pr_reviewer]` section (every tool has a dedicated section in the configuration file).
Note that you can print results locally, without publishing them, by setting the following in `configuration.toml`:
```
[config]
publish_output=false
verbosity_level=2
```
This is useful for debugging or experimenting with the different tools.
### Working from pre-built repo (GitHub Action/GitHub App/Docker)
When running PR-Agent from a pre-built repo, the default configuration file will be loaded.
To edit the configuration, you have two options:
1. Place a local configuration file in the root of your local repo. The local file will be used instead of the default one.
2. For online usage, just add `--config_path=<value>` to your command, to edit a specific configuration value.
For example, if you want to edit `pr_reviewer` configurations, you can run:
```
/review --pr_reviewer.extra_instructions="..." --pr_reviewer.require_score_review=false ...
```
Any configuration value in the `configuration.toml` file can be similarly edited.
### General configuration parameters
#### Changing a model
See [here](pr_agent/algo/__init__.py) for the list of available models.
To use the Llama2 model, for example, set:
```
[config]
model = "replicate/llama-2-70b-chat:2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1"
[replicate]
key = ...
```
(you can obtain a Llama2 key from [here](https://replicate.com/replicate/llama-2-70b-chat/api))
Also review the [AiHandler](pr_agent/algo/ai_handler.py) file for instructions on how to set keys for other models.
#### Extra instructions
All PR-Agent tools have a parameter called `extra_instructions` that lets you add free-text extra instructions. Example usage:
```
/update_changelog --pr_update_changelog.extra_instructions="Make sure to update also the version ..."
```

View File

@ -65,7 +65,6 @@ CodiumAI `PR-Agent` is an open-source tool aiming to help developers review pull
- [Overview](#overview)
- [Try it now](#try-it-now)
- [Installation](#installation)
- [Usage and tools](#usage-and-tools)
- [Configuration](./CONFIGURATION.md)
- [How it works](#how-it-works)
- [Why use PR-Agent](#why-use-pr-agent)
@ -94,6 +93,7 @@ CodiumAI `PR-Agent` is an open-source tool aiming to help developers review pull
| CORE | PR compression | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| | Repo language prioritization | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| | Adaptive and token-aware<br />file patch fitting | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| | Multiple models support | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| | Incremental PR Review | :white_check_mark: | | |
Examples for invoking the different tools via the CLI:
@ -135,19 +135,11 @@ There are several ways to use PR-Agent:
- [Method 5: Run as a GitHub App](INSTALL.md#method-5-run-as-a-github-app)
- Allowing you to automate the review process on your private or public repositories
## Usage and Tools
**PR-Agent** provides six types of interactions ("tools"): `"PR Reviewer"`, `"PR Q&A"`, `"PR Description"`, `"PR Code Suggestions"`, `"PR Reflect and Review"` and `"PR Update Changelog"`.
- The "PR Reviewer" tool automatically analyzes PRs, and provides various types of feedback.
- The "PR Q&A" tool answers free-text questions about the PR.
- The "PR Description" tool automatically sets the PR Title and body.
- The "PR Code Suggestion" tool provide inline code suggestions for the PR that can be applied and committed.
- The "PR Reflect and Review" tool initiates a dialog with the user, asks them to reflect on the PR, and then provides a more focused review.
- The "PR Update Changelog" tool automatically updates the CHANGELOG.md file with the PR changes.
## How it works
The following diagram illustrates PR-Agent tools and their flow:
![PR-Agent Tools](https://www.codium.ai/wp-content/uploads/2023/07/codiumai-diagram-v4.jpg)
Check out the [PR Compression strategy](./PR_COMPRESSION.md) page for more details on how we convert a code diff to a manageable LLM prompt.
@ -156,29 +148,28 @@ Check out the [PR Compression strategy](./PR_COMPRESSION.md) page for more detai
A reasonable question that can be asked is: `"Why use PR-Agent? What makes it stand out from existing tools?"`
Here are some of the reasons why:
Here are some advantages of PR-Agent:
- We emphasize **real-life practical usage**. Each tool (review, improve, ask, ...) has a single GPT-4 call, no more. We feel that this is critical for realistic team usage - obtaining an answer quickly (~30 seconds) and affordably.
- Our [PR Compression strategy](./PR_COMPRESSION.md) is a core ability that enables us to effectively tackle both short and long PRs.
- Our JSON prompting strategy enables us to have **modular, customizable tools**. For example, the '/review' tool categories can be controlled via the configuration file. Adding additional categories is easy and accessible.
- We support **multiple git providers** (GitHub, Gitlab, Bitbucket), and multiple ways to use the tool (CLI, GitHub Action, GitHub App, Docker, ...).
- Our JSON prompting strategy enables us to have **modular, customizable tools**. For example, the '/review' tool categories can be controlled via the [configuration](./CONFIGURATION.md) file. Adding additional categories is easy and accessible.
- We support **multiple git providers** (GitHub, Gitlab, Bitbucket), **multiple ways** to use the tool (CLI, GitHub Action, GitHub App, Docker, ...), and **multiple models** (GPT-4, GPT-3.5, Anthropic, Cohere, Llama2).
- We are open-source, and welcome contributions from the community.
## Roadmap
- [ ] Support open-source models, as a replacement for OpenAI models. (Note - a minimal requirement for each open-source model is to have 8k+ context, and good support for generating JSON as an output)
- [x] Support other Git providers, such as Gitlab and Bitbucket.
- [ ] Develop additional logic for handling large PRs, and compressing git patches
- [x] Support additional models, as a replacement for OpenAI (see [here](https://github.com/Codium-ai/pr-agent/pull/172))
- [ ] Develop additional logic for handling large PRs
- [ ] Add additional context to the prompt. For example, repo (or relevant files) summarization, with tools such as [ctags](https://github.com/universal-ctags/ctags)
- [ ] Add more tools. Possible directions:
- [x] PR description
- [x] Inline code suggestions
- [x] Reflect and review
- [x] Rank the PR (see [here](https://github.com/Codium-ai/pr-agent/pull/89))
- [ ] Enforcing CONTRIBUTING.md guidelines
- [ ] Performance (are there any performance issues)
- [ ] Documentation (is the PR properly documented)
- [ ] Rank the PR importance
- [ ] ...
## Similar Projects

View File

@ -1,13 +1,18 @@
import logging
import os
import shlex
import tempfile
from pr_agent.algo.utils import update_settings_from_args
from pr_agent.config_loader import get_settings
from pr_agent.git_providers import get_git_provider
from pr_agent.tools.pr_code_suggestions import PRCodeSuggestions
from pr_agent.tools.pr_description import PRDescription
from pr_agent.tools.pr_information_from_user import PRInformationFromUser
from pr_agent.tools.pr_questions import PRQuestions
from pr_agent.tools.pr_reviewer import PRReviewer
from pr_agent.tools.pr_update_changelog import PRUpdateChangelog
from pr_agent.tools.pr_config import PRConfig
command2class = {
"answer": PRReviewer,
@ -22,6 +27,8 @@ command2class = {
"ask": PRQuestions,
"ask_question": PRQuestions,
"update_changelog": PRUpdateChangelog,
"config": PRConfig,
"settings": PRConfig,
}
commands = list(command2class.keys())
@ -31,11 +38,31 @@ class PRAgent:
pass
async def handle_request(self, pr_url, request) -> bool:
# First, apply repo specific settings if exists
if get_settings().config.use_repo_settings_file:
repo_settings_file = None
try:
git_provider = get_git_provider()(pr_url)
repo_settings = git_provider.get_repo_settings()
if repo_settings:
repo_settings_file = None
fd, repo_settings_file = tempfile.mkstemp(suffix='.toml')
os.write(fd, repo_settings)
get_settings().load_file(repo_settings_file)
finally:
if repo_settings_file:
try:
os.remove(repo_settings_file)
except Exception as e:
logging.error(f"Failed to remove temporary settings file {repo_settings_file}", e)
# Then, apply user specific settings if exists
request = request.replace("'", "\\'")
lexer = shlex.shlex(request, posix=True)
lexer.whitespace_split = True
action, *args = list(lexer)
args = update_settings_from_args(args)
action = action.lstrip("/").lower()
if action == "reflect_and_review" and not get_settings().pr_reviewer.ask_and_reflect:
action = "review"

View File

@ -7,4 +7,8 @@ MAX_TOKENS = {
'gpt-4': 8000,
'gpt-4-0613': 8000,
'gpt-4-32k': 32000,
'claude-instant-1': 100000,
'claude-2': 100000,
'command-nightly': 4096,
'replicate/llama-2-70b-chat:2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1': 4096,
}
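
These limits are what the diff-compression logic budgets against. A rough sketch of the idea; the `prompt_budget` helper and the 512-token answer reserve are illustrative assumptions, not part of this diff:
```
# Illustrative only: reserve part of a model's context window for the answer.
MAX_TOKENS = {
    'gpt-4': 8000,
    'claude-2': 100000,
    'command-nightly': 4096,
}

def prompt_budget(model: str, answer_tokens: int = 512) -> int:
    """Tokens left for the prompt after reserving room for the completion."""
    return MAX_TOKENS[model] - answer_tokens

print(prompt_budget('gpt-4'))     # 7488
print(prompt_budget('claude-2'))  # 99488
```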

View File

@ -1,12 +1,15 @@
import logging
import litellm
import openai
from litellm import acompletion
from openai.error import APIError, RateLimitError, Timeout, TryAgain
from retry import retry
from pr_agent.config_loader import get_settings
OPENAI_RETRIES=5
OPENAI_RETRIES = 5
class AiHandler:
"""
@ -22,15 +25,25 @@ class AiHandler:
"""
try:
openai.api_key = get_settings().openai.key
litellm.openai_key = get_settings().openai.key
self.azure = False
if get_settings().get("OPENAI.ORG", None):
openai.organization = get_settings().openai.org
litellm.organization = get_settings().openai.org
self.deployment_id = get_settings().get("OPENAI.DEPLOYMENT_ID", None)
if get_settings().get("OPENAI.API_TYPE", None):
openai.api_type = get_settings().openai.api_type
if get_settings().openai.api_type == "azure":
self.azure = True
litellm.azure_key = get_settings().openai.key
if get_settings().get("OPENAI.API_VERSION", None):
openai.api_version = get_settings().openai.api_version
litellm.api_version = get_settings().openai.api_version
if get_settings().get("OPENAI.API_BASE", None):
openai.api_base = get_settings().openai.api_base
litellm.api_base = get_settings().openai.api_base
if get_settings().get("ANTHROPIC.KEY", None):
litellm.anthropic_key = get_settings().anthropic.key
if get_settings().get("COHERE.KEY", None):
litellm.cohere_key = get_settings().cohere.key
if get_settings().get("REPLICATE.KEY", None):
litellm.replicate_key = get_settings().replicate.key
except AttributeError as e:
raise ValueError("OpenAI key is required") from e
@ -57,15 +70,17 @@ class AiHandler:
TryAgain: If there is an attribute error during OpenAI inference.
"""
try:
response = await openai.ChatCompletion.acreate(
model=model,
deployment_id=self.deployment_id,
messages=[
{"role": "system", "content": system},
{"role": "user", "content": user}
],
temperature=temperature,
)
response = await acompletion(
model=model,
deployment_id=self.deployment_id,
messages=[
{"role": "system", "content": system},
{"role": "user", "content": user}
],
temperature=temperature,
azure=self.azure,
force_timeout=get_settings().config.ai_timeout
)
except (APIError, Timeout, TryAgain) as e:
logging.error("Error during OpenAI inference: ", e)
raise
@ -75,8 +90,9 @@ class AiHandler:
except (Exception) as e:
logging.error("Unknown error during OpenAI inference: ", e)
raise TryAgain from e
if response is None or len(response.choices) == 0:
if response is None or len(response["choices"]) == 0:
raise TryAgain
resp = response.choices[0]['message']['content']
finish_reason = response.choices[0].finish_reason
resp = response["choices"][0]['message']['content']
finish_reason = response["choices"][0]["finish_reason"]
print(resp, finish_reason)
return resp, finish_reason
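
Routing through litellm's `acompletion` keeps the OpenAI-style call shape while adding the new providers and the configurable timeout. A minimal standalone sketch, assuming litellm ~0.1.x and an API key already set for the chosen model; the model name and 180s timeout are illustrative:
```
import asyncio
from litellm import acompletion

async def ask(model: str, system: str, user: str) -> str:
    # Same call shape as in AiHandler above; force_timeout mirrors the
    # new config.ai_timeout setting (180s by default).
    response = await acompletion(
        model=model,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        temperature=0.2,
        force_timeout=180,
    )
    return response["choices"][0]["message"]["content"]

# asyncio.run(ask("claude-2", "You are a code reviewer.", "Review this diff: ..."))
```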

View File

@ -41,7 +41,11 @@ def extend_patch(original_file_str, patch_str, num_lines) -> str:
extended_patch_lines.extend(
original_lines[start1 + size1 - 1:start1 + size1 - 1 + num_lines])
start1, size1, start2, size2 = map(int, match.groups()[:4])
try:
start1, size1, start2, size2 = map(int, match.groups()[:4])
except TypeError: # '@@ -0,0 +1 @@' case
start1, size1, size2 = map(int, match.groups()[:3])
start2 = 0
section_header = match.groups()[4]
extended_start1 = max(1, start1 - num_lines)
extended_size1 = size1 + (start1 - extended_start1) + num_lines
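
The new fallback covers hunk headers whose new-file range omits an explicit length, such as `@@ -0,0 +1 @@`. A standalone sketch of both parsing paths, using the same hunk-header regex that appears later in this diff; `int(None)` on the missing fourth group is what raises the `TypeError`:
```
import re

re_hunk_header = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@[ ]?(.*)")

for header in ("@@ -41,7 +41,11 @@ def extend_patch(...):", "@@ -0,0 +1 @@"):
    match = re_hunk_header.match(header)
    try:
        # Normal case: both ranges carry an explicit length.
        start1, size1, start2, size2 = map(int, match.groups()[:4])
    except TypeError:  # '@@ -0,0 +1 @@' case: the fourth group is None
        start1, size1, size2 = map(int, match.groups()[:3])
        start2 = 0
    print(start1, size1, start2, size2)  # 41 7 41 11, then 0 0 0 1
```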
@ -198,7 +202,12 @@ def convert_to_hunks_with_lines_numbers(patch: str, file) -> str:
patch_with_lines_str += f"{line_old}\n"
new_content_lines = []
old_content_lines = []
start1, size1, start2, size2 = map(int, match.groups()[:4])
try:
start1, size1, start2, size2 = map(int, match.groups()[:4])
except TypeError: # '@@ -0,0 +1 @@' case
start1, size1, size2 = map(int, match.groups()[:3])
start2 = 0
elif line.startswith('+'):
new_content_lines.append(line)
elif line.startswith('-'):

View File

@ -1,7 +1,10 @@
from __future__ import annotations
import difflib
import logging
from typing import Callable, Tuple
import re
import traceback
from typing import Any, Callable, List, Tuple
from github import RateLimitExceededException
@ -9,9 +12,8 @@ from pr_agent.algo import MAX_TOKENS
from pr_agent.algo.git_patch_processing import convert_to_hunks_with_lines_numbers, extend_patch, handle_patch_deletions
from pr_agent.algo.language_handler import sort_files_by_main_languages
from pr_agent.algo.token_handler import TokenHandler
from pr_agent.algo.utils import load_large_diff
from pr_agent.config_loader import get_settings
from pr_agent.git_providers.git_provider import GitProvider
from pr_agent.git_providers.git_provider import FilePatchInfo, GitProvider
DELETED_FILES_ = "Deleted files:\n"
@ -46,7 +48,7 @@ def get_pr_diff(git_provider: GitProvider, token_handler: TokenHandler, model: s
PATCH_EXTRA_LINES = 0
try:
diff_files = list(git_provider.get_diff_files())
diff_files = git_provider.get_diff_files()
except RateLimitExceededException as e:
logging.error(f"Rate limit exceeded for git provider API. original message {e}")
raise
@ -98,12 +100,7 @@ def pr_generate_extended_diff(pr_languages: list, token_handler: TokenHandler,
for lang in pr_languages:
for file in lang['files']:
original_file_content_str = file.base_file
new_file_content_str = file.head_file
patch = file.patch
# handle the case of large patch, that initially was not loaded
patch = load_large_diff(file, new_file_content_str, original_file_content_str, patch)
if not patch:
continue
@ -161,7 +158,6 @@ def pr_generate_compressed_diff(top_langs: list, token_handler: TokenHandler, mo
original_file_content_str = file.base_file
new_file_content_str = file.head_file
patch = file.patch
patch = load_large_diff(file, new_file_content_str, original_file_content_str, patch)
if not patch:
continue
@ -221,6 +217,70 @@ async def retry_with_fallback_models(f: Callable):
try:
return await f(model)
except Exception as e:
logging.warning(f"Failed to generate prediction with {model}: {e}")
logging.warning(f"Failed to generate prediction with {model}: {traceback.format_exc()}")
if i == len(all_models) - 1: # If it's the last iteration
raise # Re-raise the last exception
def find_line_number_of_relevant_line_in_file(diff_files: List[FilePatchInfo],
relevant_file: str,
relevant_line_in_file: str) -> Tuple[int, int]:
"""
Find the line number and absolute position of a relevant line in a file.
Args:
diff_files (List[FilePatchInfo]): A list of FilePatchInfo objects representing the patches of files.
relevant_file (str): The name of the file where the relevant line is located.
relevant_line_in_file (str): The content of the relevant line.
Returns:
Tuple[int, int]: A tuple containing the line number and absolute position of the relevant line in the file.
"""
position = -1
absolute_position = -1
re_hunk_header = re.compile(
r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@[ ]?(.*)")
for file in diff_files:
if file.filename.strip() == relevant_file:
patch = file.patch
patch_lines = patch.splitlines()
# try to find the line in the patch using difflib, with some margin of error
matches_difflib: list[str | Any] = difflib.get_close_matches(relevant_line_in_file,
patch_lines, n=3, cutoff=0.93)
if len(matches_difflib) == 1 and matches_difflib[0].startswith('+'):
relevant_line_in_file = matches_difflib[0]
delta = 0
start1, size1, start2, size2 = 0, 0, 0, 0
for i, line in enumerate(patch_lines):
if line.startswith('@@'):
delta = 0
match = re_hunk_header.match(line)
start1, size1, start2, size2 = map(int, match.groups()[:4])
elif not line.startswith('-'):
delta += 1
if relevant_line_in_file in line and line[0] != '-':
position = i
absolute_position = start2 + delta - 1
break
if position == -1 and relevant_line_in_file[0] == '+':
no_plus_line = relevant_line_in_file[1:].lstrip()
for i, line in enumerate(patch_lines):
if line.startswith('@@'):
delta = 0
match = re_hunk_header.match(line)
start1, size1, start2, size2 = map(int, match.groups()[:4])
elif not line.startswith('-'):
delta += 1
if no_plus_line in line and line[0] != '-':
# The model might add a '+' to the beginning of the relevant_line_in_file even if originally
# it's a context line
position = i
absolute_position = start2 + delta - 1
break
return position, absolute_position
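
A short usage sketch, mirroring the first unit test added at the bottom of this diff and assuming the pr_agent package is importable: for the hunk `@@ -1,1 +1,2 @@`, the matched '+' line sits at index 3 of the patch and at line 2 of the new file (`start2 + delta - 1`):
```
from pr_agent.git_providers.git_provider import FilePatchInfo
from pr_agent.algo.pr_processing import find_line_number_of_relevant_line_in_file

diff_files = [
    FilePatchInfo(base_file='file1', head_file='file1',
                  patch='@@ -1,1 +1,2 @@\n-line1\n+line2\n+relevant_line\n',
                  filename='file1'),
]
# (position in patch, absolute position in the new file)
print(find_line_number_of_relevant_line_in_file(diff_files, 'file1', 'relevant_line'))  # (3, 2)
```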

View File

@ -1,5 +1,5 @@
from jinja2 import Environment, StrictUndefined
from tiktoken import encoding_for_model
from tiktoken import encoding_for_model, get_encoding
from pr_agent.config_loader import get_settings
@ -27,7 +27,7 @@ class TokenHandler:
- system: The system string.
- user: The user string.
"""
self.encoder = encoding_for_model(get_settings().config.model)
self.encoder = encoding_for_model(get_settings().config.model) if "gpt" in get_settings().config.model else get_encoding("cl100k_base")
self.prompt_tokens = self._get_system_user_tokens(pr, self.encoder, vars, system, user)
def _get_system_user_tokens(self, pr, encoder, vars: dict, system, user):
@ -47,7 +47,6 @@ class TokenHandler:
environment = Environment(undefined=StrictUndefined)
system_prompt = environment.from_string(system).render(vars)
user_prompt = environment.from_string(user).render(vars)
system_prompt_tokens = len(encoder.encode(system_prompt))
user_prompt_tokens = len(encoder.encode(user_prompt))
return system_prompt_tokens + user_prompt_tokens
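
With the fallback, model names that tiktoken does not recognize (e.g. `claude-2` or `command-nightly`) get an approximate count via `cl100k_base` instead of raising. A minimal sketch of the same selection logic:
```
from tiktoken import encoding_for_model, get_encoding

def make_encoder(model: str):
    # GPT models have a registered tiktoken encoding; for anything else,
    # fall back to cl100k_base for an approximate token count.
    return encoding_for_model(model) if "gpt" in model else get_encoding("cl100k_base")

print(len(make_encoder("gpt-4").encode("hello world")))     # exact for GPT-4
print(len(make_encoder("claude-2").encode("hello world")))  # approximation
```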

View File

@ -40,7 +40,7 @@ def convert_to_markdown(output_data: dict) -> str:
"Security concerns": "🔒",
"General PR suggestions": "💡",
"Insights from user's answers": "📝",
"Code suggestions": "🤖",
"Code feedback": "🤖",
}
for key, value in output_data.items():
@ -50,12 +50,12 @@ def convert_to_markdown(output_data: dict) -> str:
markdown_text += f"## {key}\n\n"
markdown_text += convert_to_markdown(value)
elif isinstance(value, list):
if key.lower() == 'code suggestions':
if key.lower() == 'code feedback':
markdown_text += "\n" # just looks nicer with additional line breaks
emoji = emojis.get(key, "")
markdown_text += f"- {emoji} **{key}:**\n\n"
for item in value:
if isinstance(item, dict) and key.lower() == 'code suggestions':
if isinstance(item, dict) and key.lower() == 'code feedback':
markdown_text += parse_code_suggestion(item)
elif item:
markdown_text += f" - {item}\n"
@ -100,7 +100,7 @@ def try_fix_json(review, max_iter=10, code_suggestions=False):
Args:
- review: A string containing the JSON message to be fixed.
- max_iter: An integer representing the maximum number of iterations to try and fix the JSON message.
- code_suggestions: A boolean indicating whether to try and fix JSON messages with code suggestions.
- code_suggestions: A boolean indicating whether to try and fix JSON messages with code feedback.
Returns:
- data: A dictionary containing the parsed JSON data.
@ -108,7 +108,7 @@ def try_fix_json(review, max_iter=10, code_suggestions=False):
The function attempts to fix broken or incomplete JSON messages by parsing until the last valid code suggestion.
If the JSON message ends with a closing bracket, the function calls the fix_json_escape_char function to fix the
message.
If code_suggestions is True and the JSON message contains code suggestions, the function tries to fix the JSON
If code_suggestions is True and the JSON message contains code feedback, the function tries to fix the JSON
message by parsing until the last valid code suggestion.
The function uses regular expressions to find the last occurrence of "}," with any number of whitespaces or
newlines.
@ -128,7 +128,8 @@ def try_fix_json(review, max_iter=10, code_suggestions=False):
else:
closing_bracket = "]}}"
if review.rfind("'Code suggestions': [") > 0 or review.rfind('"Code suggestions": [') > 0:
if (review.rfind("'Code feedback': [") > 0 or review.rfind('"Code feedback": [') > 0) or \
(review.rfind("'Code suggestions': [") > 0 or review.rfind('"Code suggestions": [') > 0) :
last_code_suggestion_ind = [m.end() for m in re.finditer(r"\}\s*,", review)][-1] - 1
valid_json = False
iter_count = 0
@ -195,38 +196,30 @@ def convert_str_to_datetime(date_str):
return datetime.strptime(date_str, datetime_format)
def load_large_diff(file, new_file_content_str: str, original_file_content_str: str, patch: str) -> str:
def load_large_diff(filename, new_file_content_str: str, original_file_content_str: str) -> str:
"""
Generate a patch for a modified file by comparing the original content of the file with the new content provided as
input.
Args:
file: The file object for which the patch needs to be generated.
new_file_content_str: The new content of the file as a string.
original_file_content_str: The original content of the file as a string.
patch: An optional patch string that can be provided as input.
Returns:
The generated or provided patch string.
Raises:
None.
Additional Information:
- If 'patch' is not provided as input, the function generates a patch using the 'difflib' library and returns it
as output.
- If the 'settings.config.verbosity_level' is greater than or equal to 2, a warning message is logged indicating
that the file was modified but no patch was found, and a patch is manually created.
"""
if not patch: # to Do - also add condition for file extension
try:
diff = difflib.unified_diff(original_file_content_str.splitlines(keepends=True),
new_file_content_str.splitlines(keepends=True))
if get_settings().config.verbosity_level >= 2:
logging.warning(f"File was modified, but no patch was found. Manually creating patch: {file.filename}.")
patch = ''.join(diff)
except Exception:
pass
patch = ""
try:
diff = difflib.unified_diff(original_file_content_str.splitlines(keepends=True),
new_file_content_str.splitlines(keepends=True))
if get_settings().config.verbosity_level >= 2:
logging.warning(f"File was modified, but no patch was found. Manually creating patch: {filename}.")
patch = ''.join(diff)
except Exception:
pass
return patch
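
The refactored `load_large_diff` now always synthesizes a patch from the two file snapshots. A self-contained sketch of the underlying `difflib` call:
```
import difflib

original = "line1\nline2\n"
new = "line1\nline2 changed\n"

diff = difflib.unified_diff(original.splitlines(keepends=True),
                            new.splitlines(keepends=True))
print(''.join(diff))
# ---
# +++
# @@ -1,2 +1,2 @@
#  line1
# -line2
# +line2 changed
```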

View File

@ -1,4 +1,6 @@
import logging
import hashlib
from datetime import datetime
from typing import Optional, Tuple
from urllib.parse import urlparse
@ -7,11 +9,12 @@ from github import AppAuthentication, Auth, Github, GithubException
from retry import retry
from starlette_context import context
from .git_provider import FilePatchInfo, GitProvider, IncrementalPR
from ..algo.language_handler import is_valid_file
from ..algo.utils import load_large_diff
from ..algo.pr_processing import find_line_number_of_relevant_line_in_file
from ..config_loader import get_settings
from ..servers.utils import RateLimitExceeded
from .git_provider import FilePatchInfo, GitProvider, IncrementalPR
class GithubProvider(GitProvider):
@ -27,6 +30,7 @@ class GithubProvider(GitProvider):
self.pr = None
self.github_user_id = None
self.diff_files = None
self.git_files = None
self.incremental = incremental
if pr_url:
self.set_pr(pr_url)
@ -81,40 +85,56 @@ class GithubProvider(GitProvider):
def get_files(self):
if self.incremental.is_incremental and self.file_set:
return self.file_set.values()
return self.pr.get_files()
if not self.git_files:
# bring files from GitHub only once
self.git_files = self.pr.get_files()
return self.git_files
@retry(exceptions=RateLimitExceeded,
tries=get_settings().github.ratelimit_retries, delay=2, backoff=2, jitter=(1, 3))
def get_diff_files(self) -> list[FilePatchInfo]:
"""
Retrieves the list of files that have been modified, added, deleted, or renamed in a pull request in GitHub,
along with their content and patch information.
Returns:
diff_files (List[FilePatchInfo]): List of FilePatchInfo objects representing the modified, added, deleted,
or renamed files in the merge request.
"""
try:
if self.diff_files:
return self.diff_files
files = self.get_files()
diff_files = []
for file in files:
if is_valid_file(file.filename):
new_file_content_str = self._get_pr_file_content(file, self.pr.head.sha)
patch = file.patch
if self.incremental.is_incremental and self.file_set:
original_file_content_str = self._get_pr_file_content(file,
self.incremental.last_seen_commit_sha)
patch = load_large_diff(file,
new_file_content_str,
original_file_content_str,
None)
self.file_set[file.filename] = patch
else:
original_file_content_str = self._get_pr_file_content(file, self.pr.base.sha)
diff_files.append(
FilePatchInfo(original_file_content_str, new_file_content_str, patch, file.filename))
for file in files:
if not is_valid_file(file.filename):
continue
new_file_content_str = self._get_pr_file_content(file, self.pr.head.sha) # communication with GitHub
patch = file.patch
if self.incremental.is_incremental and self.file_set:
original_file_content_str = self._get_pr_file_content(file, self.incremental.last_seen_commit_sha)
patch = load_large_diff(file.filename, new_file_content_str, original_file_content_str)
self.file_set[file.filename] = patch
else:
original_file_content_str = self._get_pr_file_content(file, self.pr.base.sha)
if not patch:
patch = load_large_diff(file.filename, new_file_content_str, original_file_content_str)
diff_files.append(FilePatchInfo(original_file_content_str, new_file_content_str, patch, file.filename))
self.diff_files = diff_files
return diff_files
except GithubException.RateLimitExceededException as e:
logging.error(f"Rate limit exceeded for GitHub API. Original message: {e}")
raise RateLimitExceeded("Rate limit exceeded for GitHub API.") from e
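
Both `get_files` and `get_diff_files` now memoize their results, so repeated calls within a single request hit the GitHub API only once. A usage sketch, assuming a valid token is configured; the PR URL is taken from the commit list above:
```
from pr_agent.git_providers.github_provider import GithubProvider

provider = GithubProvider("https://github.com/Codium-ai/pr-agent/pull/173")
first = provider.get_diff_files()   # fetches file list and contents from GitHub
second = provider.get_diff_files()  # served from the self.diff_files cache
assert first is second              # same list object, no extra API calls
```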
def publish_description(self, pr_title: str, pr_body: str):
self.pr.edit(title=pr_title, body=pr_body)
# self.pr.create_issue_comment(pr_comment)
def publish_comment(self, pr_comment: str, is_temporary: bool = False):
if is_temporary and not get_settings().config.publish_output_progress:
@ -131,22 +151,9 @@ class GithubProvider(GitProvider):
def publish_inline_comment(self, body: str, relevant_file: str, relevant_line_in_file: str):
self.publish_inline_comments([self.create_inline_comment(body, relevant_file, relevant_line_in_file)])
def create_inline_comment(self, body: str, relevant_file: str, relevant_line_in_file: str):
self.diff_files = self.diff_files if self.diff_files else self.get_diff_files()
position = -1
for file in self.diff_files:
if file.filename.strip() == relevant_file:
patch = file.patch
patch_lines = patch.splitlines()
for i, line in enumerate(patch_lines):
if relevant_line_in_file in line:
position = i
break
elif relevant_line_in_file[0] == '+' and relevant_line_in_file[1:].lstrip() in line:
# The model often adds a '+' to the beginning of the relevant_line_in_file even if originally
# it's a context line
position = i
break
position, absolute_position = find_line_number_of_relevant_line_in_file(self.diff_files, relevant_file.strip('`'), relevant_line_in_file)
if position == -1:
if get_settings().config.verbosity_level >= 2:
logging.info(f"Could not find position for {relevant_file} {relevant_line_in_file}")
@ -154,8 +161,6 @@ class GithubProvider(GitProvider):
else:
subject_type = "LINE"
path = relevant_file.strip()
# placeholder for future API support (already supported in single inline comment)
# return dict(body=body, path=path, position=position, subject_type=subject_type)
return dict(body=body, path=path, position=position) if subject_type == "LINE" else {}
def publish_inline_comments(self, comments: list[dict]):
@ -251,6 +256,13 @@ class GithubProvider(GitProvider):
def get_issue_comments(self):
return self.pr.get_issue_comments()
def get_repo_settings(self):
try:
contents = self.repo_obj.get_contents(".pr_agent.toml", ref=self.pr.head.sha).decoded_content
return contents
except Exception:
return ""
@staticmethod
def _parse_pr_url(pr_url: str) -> Tuple[str, int]:
parsed_url = urlparse(pr_url)
@ -360,3 +372,25 @@ class GithubProvider(GitProvider):
except:
commit_messages_str = ""
return commit_messages_str
def generate_link_to_relevant_line_number(self, suggestion) -> str:
try:
relevant_file = suggestion['relevant file']
relevant_line_str = suggestion['relevant line']
position, absolute_position = find_line_number_of_relevant_line_in_file \
(self.diff_files, relevant_file.strip('`'), relevant_line_str)
if absolute_position != -1:
# # link to right file only
# link = f"https://github.com/{self.repo}/blob/{self.pr.head.sha}/{relevant_file}" \
# + "#" + f"L{absolute_position}"
# link to diff
sha_file = hashlib.sha256(relevant_file.encode('utf-8')).hexdigest()
link = f"https://github.com/{self.repo}/pull/{self.pr_num}/files#diff-{sha_file}R{absolute_position}"
return link
except Exception as e:
if get_settings().config.verbosity_level >= 2:
logging.info(f"Failed adding line link, error: {e}")
return ""

View File

@ -7,6 +7,7 @@ import gitlab
from gitlab import GitlabGetError
from ..algo.language_handler import is_valid_file
from ..algo.utils import load_large_diff
from ..config_loader import get_settings
from .git_provider import EDIT_TYPE, FilePatchInfo, GitProvider
@ -30,6 +31,7 @@ class GitLabProvider(GitProvider):
self.id_mr = None
self.mr = None
self.diff_files = None
self.git_files = None
self.temp_comments = []
self._set_merge_request(merge_request_url)
self.RE_HUNK_HEADER = re.compile(
@ -65,19 +67,27 @@ class GitLabProvider(GitProvider):
return ''
def get_diff_files(self) -> list[FilePatchInfo]:
"""
Retrieves the list of files that have been modified, added, deleted, or renamed in a pull request in GitLab,
along with their content and patch information.
Returns:
diff_files (List[FilePatchInfo]): List of FilePatchInfo objects representing the modified, added, deleted,
or renamed files in the merge request.
"""
if self.diff_files:
return self.diff_files
diffs = self.mr.changes()['changes']
diff_files = []
for diff in diffs:
if is_valid_file(diff['new_path']):
original_file_content_str = self._get_pr_file_content(diff['old_path'], self.mr.target_branch)
new_file_content_str = self._get_pr_file_content(diff['new_path'], self.mr.source_branch)
edit_type = EDIT_TYPE.MODIFIED
if diff['new_file']:
edit_type = EDIT_TYPE.ADDED
elif diff['deleted_file']:
edit_type = EDIT_TYPE.DELETED
elif diff['renamed_file']:
edit_type = EDIT_TYPE.RENAMED
# original_file_content_str = self._get_pr_file_content(diff['old_path'], self.mr.target_branch)
# new_file_content_str = self._get_pr_file_content(diff['new_path'], self.mr.source_branch)
original_file_content_str = self._get_pr_file_content(diff['old_path'], self.mr.diff_refs['base_sha'])
new_file_content_str = self._get_pr_file_content(diff['new_path'], self.mr.diff_refs['head_sha'])
try:
if isinstance(original_file_content_str, bytes):
original_file_content_str = bytes.decode(original_file_content_str, 'utf-8')
@ -86,15 +96,33 @@ class GitLabProvider(GitProvider):
except UnicodeDecodeError:
logging.warning(
f"Cannot decode file {diff['old_path']} or {diff['new_path']} in merge request {self.id_mr}")
edit_type = EDIT_TYPE.MODIFIED
if diff['new_file']:
edit_type = EDIT_TYPE.ADDED
elif diff['deleted_file']:
edit_type = EDIT_TYPE.DELETED
elif diff['renamed_file']:
edit_type = EDIT_TYPE.RENAMED
filename = diff['new_path']
patch = diff['diff']
if not patch:
patch = load_large_diff(filename, new_file_content_str, original_file_content_str)
diff_files.append(
FilePatchInfo(original_file_content_str, new_file_content_str, diff['diff'], diff['new_path'],
FilePatchInfo(original_file_content_str, new_file_content_str,
patch=patch,
filename=filename,
edit_type=edit_type,
old_filename=None if diff['old_path'] == diff['new_path'] else diff['old_path']))
self.diff_files = diff_files
return diff_files
def get_files(self):
return [change['new_path'] for change in self.mr.changes()['changes']]
if not self.git_files:
self.git_files = [change['new_path'] for change in self.mr.changes()['changes']]
return self.git_files
def publish_description(self, pr_title: str, pr_body: str):
try:
@ -110,7 +138,6 @@ class GitLabProvider(GitProvider):
self.temp_comments.append(comment)
def publish_inline_comment(self, body: str, relevant_file: str, relevant_line_in_file: str):
self.diff_files = self.diff_files if self.diff_files else self.get_diff_files()
edit_type, found, source_line_no, target_file, target_line_no = self.search_line(relevant_file,
relevant_line_in_file)
self.send_inline_comment(body, edit_type, found, relevant_file, relevant_line_in_file, source_line_no,
@ -151,9 +178,9 @@ class GitLabProvider(GitProvider):
relevant_lines_start = suggestion['relevant_lines_start']
relevant_lines_end = suggestion['relevant_lines_end']
self.diff_files = self.diff_files if self.diff_files else self.get_diff_files()
diff_files = self.get_diff_files()
target_file = None
for file in self.diff_files:
for file in diff_files:
if file.filename == relevant_file:
target_file = file
@ -180,7 +207,7 @@ class GitLabProvider(GitProvider):
target_file = None
edit_type = self.get_edit_type(relevant_line_in_file)
for file in self.diff_files:
for file in self.get_diff_files():
if file.filename == relevant_file:
edit_type, found, source_line_no, target_file, target_line_no = self.find_in_file(file,
relevant_line_in_file)
@ -253,6 +280,13 @@ class GitLabProvider(GitProvider):
def get_issue_comments(self):
raise NotImplementedError("GitLab provider does not support issue comments yet")
def get_repo_settings(self):
try:
contents = self.gl.projects.get(self.id_project).files.get(file_path='.pr_agent.toml', ref=self.mr.source_branch)
return contents
except Exception:
return ""
def _parse_merge_request_url(self, merge_request_url: str) -> Tuple[str, int]:
parsed_url = urlparse(merge_request_url)

View File

@ -4,7 +4,8 @@ commands_text = "> **/review [-i]**: Request a review of your Pull Request. For
"> **/improve**: Suggest improvements to the code in the PR. \n" \
"> **/ask \\<QUESTION\\>**: Pose a question about the PR.\n\n" \
">To edit any configuration parameter from 'configuration.toml', add --config_path=new_value\n" \
">For example: /review --pr_reviewer.extra_instructions=\"focus on the file: ...\" " \
">For example: /review --pr_reviewer.extra_instructions=\"focus on the file: ...\" \n" \
">To list the possible configuration parameters, use the **/config** command.\n" \
def bot_help_text(user: str):

View File

@ -7,17 +7,26 @@
# See README for details about GitHub App deployment.
[openai]
key = "<API_KEY>" # Acquire through https://platform.openai.com
org = "<ORGANIZATION>" # Optional, may be commented out.
key = "" # Acquire through https://platform.openai.com
#org = "<ORGANIZATION>" # Optional, may be commented out.
# Uncomment the following for Azure OpenAI
#api_type = "azure"
#api_version = '2023-05-15' # Check Azure documentation for the current API version
#api_base = "<API_BASE>" # The base URL for your Azure OpenAI resource. e.g. "https://<your resource name>.openai.azure.com"
#deployment_id = "<DEPLOYMENT_ID>" # The deployment name you chose when you deployed the engine
#api_base = "" # The base URL for your Azure OpenAI resource. e.g. "https://<your resource name>.openai.azure.com"
#deployment_id = "" # The deployment name you chose when you deployed the engine
[anthropic]
key = "" # Optional, uncomment if you want to use Anthropic. Acquire through https://www.anthropic.com/
[cohere]
key = "" # Optional, uncomment if you want to use Cohere. Acquire through https://dashboard.cohere.ai/
[replicate]
key = "" # Optional, uncomment if you want to use Replicate. Acquire through https://replicate.com/
[github]
# ---- Set the following only for deployment type == "user"
user_token = "<TOKEN>" # A GitHub personal access token with 'repo' scope.
user_token = "" # A GitHub personal access token with 'repo' scope.
deployment_type = "user" #set to user by default
# ---- Set the following only for deployment type == "app", see README for details.
private_key = """\

View File

@ -6,14 +6,16 @@ publish_output=true
publish_output_progress=true
verbosity_level=0 # 0,1,2
use_extra_bad_extensions=false
use_repo_settings_file=true
ai_timeout=180
[pr_reviewer] # /review #
require_focused_review=true
require_score_review=false
require_tests_review=true
require_security_review=true
num_code_suggestions=0
inline_code_comments = true
num_code_suggestions=3
inline_code_comments = false
ask_and_reflect=false
extra_instructions = ""
@ -31,6 +33,8 @@ extra_instructions = ""
push_changelog_changes=false
extra_instructions = ""
[pr_config] # /config #
[github]
# The type of deployment to create. Valid values are 'app' or 'user'.
deployment_type = "user"

View File

@ -73,6 +73,11 @@ Description: '{{description}}'
{%- if language %}
Main language: {{language}}
{%- endif %}
{%- if commit_messages_str %}
Commit messages:
{{commit_messages_str}}
{%- endif %}
The PR Diff:

View File

@ -21,6 +21,11 @@ Description: '{{description}}'
{%- if language %}
Main language: {{language}}
{%- endif %}
{%- if commit_messages_str %}
Commit messages:
{{commit_messages_str}}
{%- endif %}
The PR Git Diff:

View File

@ -13,6 +13,11 @@ Description: '{{description}}'
{%- if language %}
Main language: {{language}}
{%- endif %}
{%- if commit_messages_str %}
Commit messages:
{{commit_messages_str}}
{%- endif %}
The PR Git Diff:

View File

@ -1,9 +1,9 @@
[pr_review_prompt]
system="""You are CodiumAI-PR-Reviewer, a language model designed to review git pull requests.
Your task is to provide constructive and concise feedback for the PR, and also provide meaningful code suggestions to improve the new PR code (the '+' lines).
- Provide up to {{ num_code_suggestions }} code suggestions.
{%- if num_code_suggestions > 0 %}
- Try to focus on important suggestions like fixing code problems, issues and bugs. As a second priority, provide suggestions for meaningful code improvements, like performance, vulnerability, modularity, and best practices.
- Provide up to {{ num_code_suggestions }} code suggestions.
- Try to focus on the most important suggestions, like fixing code problems, issues and bugs. As a second priority, provide suggestions for meaningful code improvements, like performance, vulnerability, modularity, and best practices.
- Suggestions should focus on improving the new added code lines.
- Make sure not to provide suggestions repeating modifications already implemented in the new PR code (the '+' lines).
{%- endif %}
@ -24,7 +24,7 @@ You must use the following JSON schema to format your answer:
},
"Type of PR": {
"type": "string",
"enum": ["Bug fix", "Tests", "Bug fix with tests", "Refactoring", "Enhancement", "Documentation", "Other"]
"enum": ["Bug fix", "Tests", "Refactoring", "Enhancement", "Documentation", "Other"]
},
{%- if require_score %}
"Score": {
@ -47,17 +47,17 @@ You must use the following JSON schema to format your answer:
{%- if require_focused %}
"Focused PR": {
"type": "string",
"description": "Is this a focused PR, in the sense that it has a clear and coherent title and description, and all PR code diff changes are properly derived from the title and description? Explain your response."
"description": "Is this a focused PR, in the sense that all the PR code diff changes are united under a single focused theme ? If the theme is too broad, or the PR code diff changes are too scattered, then the PR is not focused. Explain your answer shortly."
}
},
{%- endif %}
"PR Feedback": {
"General PR suggestions": {
"General suggestions": {
"type": "string",
"description": "General suggestions and feedback for the contributors and maintainers of this PR. May include important suggestions for the overall structure, primary purpose, best practices, critical bugs, and other aspects of the PR. Explain your suggestions."
"description": "General suggestions and feedback for the contributors and maintainers of this PR. May include important suggestions for the overall structure, primary purpose, best practices, critical bugs, and other aspects of the PR. Don't address PR title and description, or lack of tests. Explain your suggestions."
},
{%- if num_code_suggestions > 0 %}
"Code suggestions": {
"Code feedback": {
"type": "array",
"maxItems": {{ num_code_suggestions }},
"uniqueItems": true,
@ -66,13 +66,13 @@ You must use the following JSON schema to format your answer:
"type": "string",
"description": "the relevant file full path"
},
"suggestion content": {
"suggestion": {
"type": "string",
"description": "a concrete suggestion for meaningfully improving the new PR code. Also describe how, specifically, the suggestion can be applied to new PR code. Add tags with importance measure that matches each suggestion ('important' or 'medium'). Do not make suggestions for updating or adding docstrings, renaming PR title and description, or linter like.
},
"relevant line in file": {
"relevant line": {
"type": "string",
"description": "an authentic single code line from the PR git diff section, to which the suggestion applies."
"description": "a single code line taken from the relevant file, to which the suggestion applies. The line should be a '+' line. Make sure to output the line exactly as it appears in the relevant file"
}
}
},
@ -80,8 +80,8 @@ You must use the following JSON schema to format your answer:
{%- if require_security %}
"Security concerns": {
"type": "string",
"description": "yes\\no question: does this PR code introduce possible security concerns or issues, like SQL injection, XSS, CSRF, and others ? explain your answer"
? explain your answer"
"description": "yes\\no question: does this PR code introduce possible security concerns or issues, like SQL injection, XSS, CSRF, and others ? If answered 'yes', explain your answer shortly"
? explain your answer shortly"
}
{%- endif %}
}
@ -109,11 +109,11 @@ Example output:
{
"General PR suggestions": "..., `xxx`...",
{%- if num_code_suggestions > 0 %}
"Code suggestions": [
"Code feedback": [
{
"relevant file": "directory/xxx.py",
"suggestion content": "xxx [important]",
"relevant line in file": "xxx",
"suggestion": "xxx [important]",
"relevant line": "xxx",
},
...
]
@ -135,6 +135,11 @@ Description: '{{description}}'
{%- if language %}
Main language: {{language}}
{%- endif %}
{%- if commit_messages_str %}
Commit messages:
{{commit_messages_str}}
{%- endif %}
{%- if question_str %}
######

View File

@ -19,6 +19,11 @@ Description: '{{description}}'
{%- if language %}
Main language: {{language}}
{%- endif %}
{%- if commit_messages_str %}
Commit messages:
{{commit_messages_str}}
{%- endif %}
The PR Diff:

View File

@ -34,6 +34,7 @@ class PRCodeSuggestions:
"diff": "", # empty diff for initial calculation
"num_code_suggestions": get_settings().pr_code_suggestions.num_code_suggestions,
"extra_instructions": get_settings().pr_code_suggestions.extra_instructions,
"commit_messages_str": self.git_provider.get_commit_messages(),
}
self.token_handler = TokenHandler(self.git_provider.pr,
self.vars,
@ -57,12 +58,12 @@ class PRCodeSuggestions:
async def _prepare_prediction(self, model: str):
logging.info('Getting PR diff...')
# we are using extended hunk with line numbers for code suggestions
self.patches_diff = get_pr_diff(self.git_provider,
self.token_handler,
model,
add_line_numbers_to_hunks=True,
disable_extra_lines=True)
logging.info('Getting AI prediction...')
self.prediction = await self._get_prediction(model)

View File

@ -0,0 +1,48 @@
import logging
from pr_agent.config_loader import get_settings
from pr_agent.git_providers import get_git_provider
class PRConfig:
"""
The PRConfig class is responsible for listing all configuration options available for the user.
"""
def __init__(self, pr_url: str, args=None):
"""
Initialize the PRConfig object with the necessary attributes and objects to comment on a pull request.
Args:
pr_url (str): The URL of the pull request to be reviewed.
args (list, optional): List of arguments passed to the PRConfig class. Defaults to None.
"""
self.git_provider = get_git_provider()(pr_url)
async def run(self):
logging.info('Getting configuration settings...')
logging.info('Preparing configs...')
pr_comment = self._prepare_pr_configs()
if get_settings().config.publish_output:
logging.info('Pushing configs...')
self.git_provider.publish_comment(pr_comment)
self.git_provider.remove_initial_comment()
return ""
def _prepare_pr_configs(self) -> str:
import tomli
with open(get_settings().find_file("configuration.toml"), "rb") as conf_file:
configuration_headers = [header.lower() for header in tomli.load(conf_file).keys()]
relevant_configs = {
header: configs for header, configs in get_settings().to_dict().items()
if header.lower().startswith("pr_") and header.lower() in configuration_headers
}
comment_str = "Possible Configurations:"
for header, configs in relevant_configs.items():
if configs:
comment_str += "\n"
for key, value in configs.items():
comment_str += f"\n{header.lower()}.{key.lower()} = {repr(value) if isinstance(value, str) else value}"
comment_str += " "
if get_settings().config.verbosity_level >= 2:
logging.info(f"comment_str:\n{comment_str}")
return comment_str
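
The comment is built by intersecting the live settings with the `pr_*` section headers found in `configuration.toml`. A self-contained sketch of that filtering over plain dicts; the sample data is illustrative:
```
# Illustrative stand-ins for tomli.load(...) headers and get_settings().to_dict()
configuration_headers = ["config", "pr_reviewer", "pr_description"]
settings = {
    "PR_REVIEWER": {"require_score_review": False, "extra_instructions": ""},
    "GITHUB": {"deployment_type": "user"},  # dropped: not a pr_* section
}

relevant_configs = {
    header: configs for header, configs in settings.items()
    if header.lower().startswith("pr_") and header.lower() in configuration_headers
}

comment_str = "Possible Configurations:"
for header, configs in relevant_configs.items():
    for key, value in configs.items():
        comment_str += f"\n{header.lower()}.{key.lower()} = {repr(value) if isinstance(value, str) else value}"
print(comment_str)
# Possible Configurations:
# pr_reviewer.require_score_review = False
# pr_reviewer.extra_instructions = ''
```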

View File

@ -27,7 +27,6 @@ class PRDescription:
self.main_pr_language = get_main_pr_language(
self.git_provider.get_languages(), self.git_provider.get_files()
)
commit_messages_str = self.git_provider.get_commit_messages()
# Initialize the AI handler
self.ai_handler = AiHandler()
@ -40,7 +39,7 @@ class PRDescription:
"language": self.main_pr_language,
"diff": "", # empty diff for initial calculation
"extra_instructions": get_settings().pr_description.extra_instructions,
"commit_messages_str": commit_messages_str
"commit_messages_str": self.git_provider.get_commit_messages()
}
# Initialize the token handler

View File

@ -24,6 +24,7 @@ class PRInformationFromUser:
"description": self.git_provider.get_pr_description(),
"language": self.main_pr_language,
"diff": "", # empty diff for initial calculation
"commit_messages_str": self.git_provider.get_commit_messages(),
}
self.token_handler = TokenHandler(self.git_provider.pr,
self.vars,

View File

@ -27,6 +27,7 @@ class PRQuestions:
"language": self.main_pr_language,
"diff": "", # empty diff for initial calculation
"questions": self.question_str,
"commit_messages_str": self.git_provider.get_commit_messages(),
}
self.token_handler = TokenHandler(self.git_provider.pr,
self.vars,

View File

@ -7,7 +7,8 @@ from typing import List, Tuple
from jinja2 import Environment, StrictUndefined
from pr_agent.algo.ai_handler import AiHandler
from pr_agent.algo.pr_processing import get_pr_diff, retry_with_fallback_models
from pr_agent.algo.pr_processing import get_pr_diff, retry_with_fallback_models, \
find_line_number_of_relevant_line_in_file
from pr_agent.algo.token_handler import TokenHandler
from pr_agent.algo.utils import convert_to_markdown, try_fix_json
from pr_agent.config_loader import get_settings
@ -59,6 +60,7 @@ class PRReviewer:
'question_str': question_str,
'answer_str': answer_str,
"extra_instructions": get_settings().pr_reviewer.extra_instructions,
"commit_messages_str": self.git_provider.get_commit_messages(),
}
self.token_handler = TokenHandler(
@ -166,20 +168,31 @@ class PRReviewer:
data = try_fix_json(review)
# Move 'Security concerns' key to 'PR Analysis' section for better display
if 'PR Feedback' in data and 'Security concerns' in data['PR Feedback']:
val = data['PR Feedback']['Security concerns']
del data['PR Feedback']['Security concerns']
data['PR Analysis']['Security concerns'] = val
pr_feedback = data.get('PR Feedback', {})
security_concerns = pr_feedback.get('Security concerns')
if security_concerns:
del pr_feedback['Security concerns']
data.setdefault('PR Analysis', {})['Security concerns'] = security_concerns
# Filter out code suggestions that can be submitted as inline comments
if get_settings().config.git_provider != 'bitbucket' and get_settings().pr_reviewer.inline_code_comments \
and 'Code suggestions' in data['PR Feedback']:
data['PR Feedback']['Code suggestions'] = [
d for d in data['PR Feedback']['Code suggestions']
if any(key not in d for key in ('relevant file', 'relevant line in file', 'suggestion content'))
]
if not data['PR Feedback']['Code suggestions']:
del data['PR Feedback']['Code suggestions']
#
if 'Code feedback' in pr_feedback:
code_feedback = pr_feedback['Code feedback']
# Filter out code suggestions that can be submitted as inline comments
if get_settings().pr_reviewer.inline_code_comments:
del pr_feedback['Code feedback']
else:
for suggestion in code_feedback:
relevant_line_str = suggestion['relevant line'].split('\n')[0]
# removing '+'
suggestion['relevant line'] = relevant_line_str.lstrip('+').strip()
# try to add line numbers link to code suggestions
if hasattr(self.git_provider, 'generate_link_to_relevant_line_number'):
link = self.git_provider.generate_link_to_relevant_line_number(suggestion)
if link:
suggestion['relevant line'] = f"[{suggestion['relevant line']}]({link})"
# Add incremental review section
if self.incremental.is_incremental:
@ -205,6 +218,9 @@ class PRReviewer:
if get_settings().config.verbosity_level >= 2:
logging.info(f"Markdown response:\n{markdown_text}")
if markdown_text == None or len(markdown_text) == 0:
markdown_text = review
return markdown_text
def _publish_inline_code_comments(self) -> None:
@ -221,10 +237,10 @@ class PRReviewer:
data = try_fix_json(review)
comments: List[str] = []
for suggestion in data.get('PR Feedback', {}).get('Code suggestions', []):
for suggestion in data.get('PR Feedback', {}).get('Code feedback', []):
relevant_file = suggestion.get('relevant file', '').strip()
relevant_line_in_file = suggestion.get('relevant line in file', '').strip()
content = suggestion.get('suggestion content', '')
relevant_line_in_file = suggestion.get('relevant line', '').strip()
content = suggestion.get('suggestion', '')
if not relevant_file or not relevant_line_in_file or not content:
logging.info("Skipping inline comment with missing file/line/content")
continue

View File

@ -38,6 +38,7 @@ class PRUpdateChangelog:
"changelog_file_str": self.changelog_file_str,
"today": date.today(),
"extra_instructions": get_settings().pr_update_changelog.extra_instructions,
"commit_messages_str": self.git_provider.get_commit_messages(),
}
self.token_handler = TokenHandler(self.git_provider.pr,
self.vars,

View File

@ -41,7 +41,8 @@ dependencies = [
"aiohttp~=3.8.4",
"atlassian-python-api==3.39.0",
"GitPython~=3.1.32",
"starlette-context==0.3.6"
"starlette-context==0.3.6",
"litellm~=0.1.351"
]
[project.urls]

View File

@ -1 +1,14 @@
-e .
dynaconf==3.1.12
fastapi==0.99.0
PyGithub==1.59.*
retry==0.9.2
openai==0.27.8
Jinja2==3.1.2
tiktoken==0.4.0
uvicorn==0.22.0
python-gitlab==3.15.0
pytest~=7.4.0
aiohttp~=3.8.4
atlassian-python-api==3.39.0
GitPython~=3.1.32
litellm~=0.1.351

View File

@ -51,7 +51,7 @@ class TestConvertToMarkdown:
'Unrelated changes': 'n/a', # won't be included in the output
'Focused PR': 'Yes',
'General PR suggestions': 'general suggestion...',
'Code suggestions': [
'Code feedback': [
{
'Code example': {
'Before': 'Code before',
@ -73,7 +73,7 @@ class TestConvertToMarkdown:
- ✨ **Focused PR:** Yes
- 💡 **General PR suggestions:** general suggestion...
- 🤖 **Code suggestions:**
- 🤖 **Code feedback:**
- **Code example:**
- **Before:**

View File

@ -0,0 +1,68 @@
# Generated by CodiumAI
from pr_agent.git_providers.git_provider import FilePatchInfo
from pr_agent.algo.pr_processing import find_line_number_of_relevant_line_in_file
import pytest
class TestFindLineNumberOfRelevantLineInFile:
# Tests that the function returns the correct line number and absolute position when the relevant line is found in the patch
def test_relevant_line_found_in_patch(self):
diff_files = [
FilePatchInfo(base_file='file1', head_file='file1', patch='@@ -1,1 +1,2 @@\n-line1\n+line2\n+relevant_line\n', filename='file1')
]
relevant_file = 'file1'
relevant_line_in_file = 'relevant_line'
expected = (3, 2) # (position in patch, absolute_position in new file)
assert find_line_number_of_relevant_line_in_file(diff_files, relevant_file, relevant_line_in_file) == expected
# Tests that the function returns the correct line number and absolute position when a similar line is found using difflib
def test_similar_line_found_using_difflib(self):
diff_files = [
FilePatchInfo(base_file='file1', head_file='file1', patch='@@ -1,1 +1,2 @@\n-line1\n+relevant_line in file similar match\n', filename='file1')
]
relevant_file = 'file1'
relevant_line_in_file = '+relevant_line in file similar match ' # note the space at the end. This is to simulate a similar line found using difflib
expected = (2, 1)
assert find_line_number_of_relevant_line_in_file(diff_files, relevant_file, relevant_line_in_file) == expected
# Tests that the function returns (-1, -1) when the relevant line is not found in the patch and no similar line is found using difflib
def test_relevant_line_not_found(self):
diff_files = [
FilePatchInfo(base_file='file1', head_file='file1', patch='@@ -1,1 +1,2 @@\n-line1\n+relevant_line\n', filename='file1')
]
relevant_file = 'file1'
relevant_line_in_file = 'not_found'
expected = (-1, -1)
assert find_line_number_of_relevant_line_in_file(diff_files, relevant_file, relevant_line_in_file) == expected
# Tests that the function returns (-1, -1) when the relevant file is not found in any of the patches
def test_relevant_file_not_found(self):
diff_files = [
FilePatchInfo(base_file='file1', head_file='file1', patch='@@ -1,1 +1,2 @@\n-line1\n+relevant_line\n', filename='file2')
]
relevant_file = 'file1'
relevant_line_in_file = 'relevant_line'
expected = (-1, -1)
assert find_line_number_of_relevant_line_in_file(diff_files, relevant_file, relevant_line_in_file) == expected
# Tests that the function returns (0, 0) when the relevant_line_in_file is an empty string
def test_empty_relevant_line(self):
diff_files = [
FilePatchInfo(base_file='file1', head_file='file1', patch='@@ -1,1 +1,2 @@\n-line1\n+relevant_line\n', filename='file1')
]
relevant_file = 'file1'
relevant_line_in_file = ''
expected = (0, 0)
assert find_line_number_of_relevant_line_in_file(diff_files, relevant_file, relevant_line_in_file) == expected
# Tests that the function returns (-1, -1) when the relevant_line_in_file is found in the patch but it is a deleted line
def test_relevant_line_found_but_deleted(self):
diff_files = [
FilePatchInfo(base_file='file1', head_file='file1', patch='@@ -1,2 +1,1 @@\n-line1\n-relevant_line\n', filename='file1')
]
relevant_file = 'file1'
relevant_line_in_file = 'relevant_line'
expected = (-1, -1)
assert find_line_number_of_relevant_line_in_file(diff_files, relevant_file, relevant_line_in_file) == expected