Merge branch 'main' into add-implement-documentation-

This commit is contained in:
ofir-frd
2025-01-02 10:52:31 +02:00
16 changed files with 404 additions and 239 deletions

View File

@@ -41,6 +41,10 @@ Qode Merge PR-Agent aims to help efficiently review and handle pull requests, by
 ## News and Updates

+### December 30, 2024
+Following [feedback](https://research.kudelskisecurity.com/2024/08/29/careful-where-you-code-multiple-vulnerabilities-in-ai-powered-pr-agent/) from the community, we have addressed two vulnerabilities identified in the open-source PR-Agent project. The fixes are now included in the newly released version (v0.26), available as of today.
+
 ### December 25, 2024
 The `review` tool previously included a legacy feature for providing code suggestions (controlled by '--pr_reviewer.num_code_suggestion'). This functionality has been deprecated. Use instead the [`improve`](https://qodo-merge-docs.qodo.ai/tools/improve/) tool, which offers higher quality and more actionable code suggestions.

View File

@@ -1,6 +1,6 @@
 # FAQ

-??? note "Question: Can Qodo Merge serve as a substitute for a human reviewer?"
+??? note "Q: Can Qodo Merge serve as a substitute for a human reviewer?"
     #### Answer:<span style="display:none;">1</span>

     Qodo Merge is designed to assist, not replace, human reviewers.
@@ -12,7 +12,7 @@
     1. Preserves user's original PR header
     2. Places user's description above the AI-generated PR description
-    3. Cannot approve PRs; approval remains reviewer's responsibility
+    3. Won't approve PRs; approval remains reviewer's responsibility
     4. The code suggestions are optional, and aim to:
        - Encourage self-review and self-reflection
        - Highlight potential bugs or oversights
@@ -22,15 +22,15 @@
 ___

-??? note "Question: I received an incorrect or irrelevant suggestion. Why?"
+??? note "Q: I received an incorrect or irrelevant suggestion. Why?"
     #### Answer:<span style="display:none;">2</span>

     - Modern AI models, like Claude 3.5 Sonnet and GPT-4, are improving rapidly but remain imperfect. Users should critically evaluate all suggestions rather than accepting them automatically.
-    - AI errors are rare, but possible. A main value from reviewing the code suggestions lies in their high probability of catching **mistakes or bugs made by the PR author**. We believe it's worth spending 30-60 seconds reviewing suggestions, even if some aren't relevant, as this practice can enhances code quality and prevent bugs in production.
+    - AI errors are rare, but possible. A main value from reviewing the code suggestions lies in their high probability of catching **mistakes or bugs made by the PR author**. We believe it's worth spending 30-60 seconds reviewing suggestions, even if some aren't relevant, as this practice can enhance code quality and prevent bugs in production.
-    - The hierarchical structure of the suggestions is designed to help the user to _quickly_ understand them, and to decide which ones are relevant and which are not:
+    - The hierarchical structure of the suggestions is designed to help the user _quickly_ understand them, and to decide which ones are relevant and which are not:
       - Only if the `Category` header is relevant, the user should move to the summarized suggestion description.
       - Only if the summarized suggestion description is relevant, the user should click on the collapsible, to read the full suggestion description with a code preview example.
@@ -40,14 +40,14 @@ ___
 ___

-??? note "Question: How can I get more tailored suggestions?"
+??? note "Q: How can I get more tailored suggestions?"
     #### Answer:<span style="display:none;">3</span>

     See [here](https://qodo-merge-docs.qodo.ai/tools/improve/#extra-instructions-and-best-practices) for more information on how to use the `extra_instructions` and `best_practices` configuration options, to guide the model to more tailored suggestions.
 ___

-??? note "Question: Will you store my code ? Are you using my code to train models?"
+??? note "Q: Will you store my code? Are you using my code to train models?"
     #### Answer:<span style="display:none;">4</span>

     No. Qodo Merge strict privacy policy ensures that your code is not stored or used for training purposes.
@@ -56,12 +56,35 @@ ___
 ___

-??? note "Question: Can I use my own LLM keys with Qodo Merge?"
+??? note "Q: Can I use my own LLM keys with Qodo Merge?"
     #### Answer:<span style="display:none;">5</span>

-    When you self-host, you use your own keys.
+    When you self-host the [open-source](https://github.com/Codium-ai/pr-agent) version, you use your own keys.
     Qodo Merge Pro with SaaS deployment is a hosted version of Qodo Merge, where Qodo manages the infrastructure and the keys.
     For enterprise customers, on-prem deployment is also available. [Contact us](https://www.codium.ai/contact/#pricing) for more information.
+___
+
+??? note "Q: Can Qodo Merge review draft/offline PRs?"
+    #### Answer:<span style="display:none;">5</span>
+
+    Yes. While Qodo Merge won't automatically review draft PRs, you can still get feedback by manually requesting it through [online commenting](https://qodo-merge-docs.qodo.ai/usage-guide/automations_and_usage/#online-usage).
+
+    For active PRs, you can customize the automatic feedback settings [here](https://qodo-merge-docs.qodo.ai/usage-guide/automations_and_usage/#qodo-merge-automatic-feedback) to match your team's workflow.
+___
+
+??? note "Q: Can the 'Review effort' feedback be calibrated or customized?"
+    #### Answer:<span style="display:none;">5</span>
+
+    Yes, you can customize review effort estimates using the `extra_instructions` configuration option (see [documentation](https://qodo-merge-docs.qodo.ai/tools/review/#configuration-options)).
+
+    Example mapping:
+    - Effort 1: < 30 minutes review time
+    - Effort 2: 30-60 minutes review time
+    - Effort 3: 60-90 minutes review time
+    - ...
+
+    Note: The effort levels (1-5) are primarily meant for _comparative_ purposes, helping teams prioritize reviewing smaller PRs first. The actual review duration may vary, as the focus is on providing consistent relative effort estimates.
 ___

View File

@@ -46,7 +46,6 @@ commands = list(command2class.keys())
 class PRAgent:
     def __init__(self, ai_handler: partial[BaseAiHandler,] = LiteLLMAIHandler):
         self.ai_handler = ai_handler  # will be initialized in run_action
-        self.forbidden_cli_args = ['enable_auto_approval']

     async def handle_request(self, pr_url, request, notify=None) -> bool:
         # First, apply repo specific settings if exists
@@ -61,14 +60,23 @@ class PRAgent:
         else:
             action, *args = request

+        forbidden_cli_args = ['enable_auto_approval', 'base_url', 'url', 'app_name', 'secret_provider',
+                              'git_provider', 'skip_keys', 'key', 'ANALYTICS_FOLDER', 'uri', 'app_id', 'webhook_secret',
+                              'bearer_token', 'PERSONAL_ACCESS_TOKEN', 'override_deployment_type', 'private_key', 'api_base', 'api_type', 'api_version']
         if args:
-            for forbidden_arg in self.forbidden_cli_args:
-                for arg in args:
-                    if forbidden_arg in arg:
-                        get_logger().error(
-                            f"CLI argument for param '{forbidden_arg}' is forbidden. Use instead a configuration file."
-                        )
-                        return False
+            for arg in args:
+                if arg.startswith('--'):
+                    arg_word = arg.lower()
+                    arg_word = arg_word.replace('__', '.')  # replace double underscore with dot, e.g. --openai__key -> --openai.key
+                    for forbidden_arg in forbidden_cli_args:
+                        forbidden_arg_word = forbidden_arg.lower()
+                        if '.' not in forbidden_arg_word:
+                            forbidden_arg_word = '.' + forbidden_arg_word
+                        if forbidden_arg_word in arg_word:
+                            get_logger().error(
+                                f"CLI argument for param '{forbidden_arg}' is forbidden. Use instead a configuration file."
+                            )
+                            return False
         args = update_settings_from_args(args)
         action = action.lstrip("/").lower()
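A minimal standalone sketch of the check above (deny-list shortened, helper name invented): a `--section__key` CLI override is normalized to dotted form before being matched against the forbidden keys, so `--openai__key` and `--openai.key` are treated the same.

```python
forbidden_cli_args = ['enable_auto_approval', 'openai.key']  # abbreviated for the example

def is_forbidden(arg: str) -> bool:
    arg_word = arg.lower().replace('__', '.')  # e.g. --openai__key=... -> --openai.key=...
    for forbidden in forbidden_cli_args:
        forbidden_word = forbidden.lower()
        if '.' not in forbidden_word:
            forbidden_word = '.' + forbidden_word  # only match a full key segment, not a substring
        if forbidden_word in arg_word:
            return True
    return False

print(is_forbidden('--openai__key=sk-123'))                  # True: blocked
print(is_forbidden('--pr_reviewer.extra_instructions=...'))  # False: allowed
```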

View File

@@ -66,10 +66,10 @@ MAX_TOKENS = {
     'claude-3-5-sonnet': 100000,
     'groq/llama3-8b-8192': 8192,
     'groq/llama3-70b-8192': 8192,
+    'groq/llama-3.1-8b-instant': 8192,
+    'groq/llama-3.3-70b-versatile': 128000,
     'groq/mixtral-8x7b-32768': 32768,
-    'groq/llama-3.1-8b-instant': 131072,
-    'groq/llama-3.1-70b-versatile': 131072,
-    'groq/llama-3.1-405b-reasoning': 131072,
+    'groq/gemma2-9b-it': 8192,
     'ollama/llama3': 4096,
     'watsonx/meta-llama/llama-3-8b-instruct': 4096,
     "watsonx/meta-llama/llama-3-70b-instruct": 4096,

View File

@@ -364,48 +364,51 @@ __old hunk__
(The entire function body is wrapped in a new try/except that logs the failure and returns two empty strings; the logic is otherwise unchanged. The resulting function:)

def extract_hunk_lines_from_patch(patch: str, file_name, line_start, line_end, side) -> tuple[str, str]:
    try:
        patch_with_lines_str = f"\n\n## File: '{file_name.strip()}'\n\n"
        selected_lines = ""
        patch_lines = patch.splitlines()
        RE_HUNK_HEADER = re.compile(
            r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@[ ]?(.*)")
        match = None
        start1, size1, start2, size2 = -1, -1, -1, -1
        skip_hunk = False
        selected_lines_num = 0
        for line in patch_lines:
            if 'no newline at end of file' in line.lower():
                continue

            if line.startswith('@@'):
                skip_hunk = False
                selected_lines_num = 0
                header_line = line

                match = RE_HUNK_HEADER.match(line)

                section_header, size1, size2, start1, start2 = extract_hunk_headers(match)

                # check if line range is in this hunk
                if side.lower() == 'left':
                    # check if line range is in this hunk
                    if not (start1 <= line_start <= start1 + size1):
                        skip_hunk = True
                        continue
                elif side.lower() == 'right':
                    if not (start2 <= line_start <= start2 + size2):
                        skip_hunk = True
                        continue
                patch_with_lines_str += f'\n{header_line}\n'

            elif not skip_hunk:
                if side.lower() == 'right' and line_start <= start2 + selected_lines_num <= line_end:
                    selected_lines += line + '\n'
                if side.lower() == 'left' and start1 <= selected_lines_num + start1 <= line_end:
                    selected_lines += line + '\n'
                patch_with_lines_str += line + '\n'
                if not line.startswith('-'):  # currently we don't support /ask line for deleted lines
                    selected_lines_num += 1
    except Exception as e:
        get_logger().error(f"Failed to extract hunk lines from patch: {e}", artifact={"traceback": traceback.format_exc()})
        return "", ""

    return patch_with_lines_str.rstrip(), selected_lines.rstrip()
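A self-contained illustration of the hunk-header regular expression used above (the header string is invented; `extract_hunk_headers` in the project presumably wraps this same parsing): it splits a `@@ -a,b +c,d @@` header into old/new start lines and sizes, plus the trailing section header.

```python
import re

RE_HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@[ ]?(.*)")

header = "@@ -3,8 +5,9 @@ def foo():"
m = RE_HUNK_HEADER.match(header)
start1, size1, start2, size2 = (int(g) if g else 1 for g in m.groups()[:4])  # omitted size defaults to 1
section_header = m.group(5)

print(start1, size1, start2, size2)  # 3 8 5 9
print(section_header)                # def foo():
```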

View File

@@ -205,10 +205,11 @@ def pr_generate_extended_diff(pr_languages: list,
         if not extended_patch:
             get_logger().warning(f"Failed to extend patch for file: {file.filename}")
             continue
-        full_extended_patch = f"\n\n## {file.filename}\n{extended_patch.rstrip()}\n"
         if add_line_numbers_to_hunks:
             full_extended_patch = convert_to_hunks_with_lines_numbers(extended_patch, file)
+        else:
+            full_extended_patch = f"\n\n## File: '{file.filename.strip()}'\n{extended_patch.rstrip()}\n"

         # add AI-summary metadata to the patch
         if file.ai_file_summary and get_settings().get("config.enable_ai_metadata", False):
@@ -316,13 +317,13 @@ def generate_full_patch(convert_hunks_to_line_numbers, file_dict, max_tokens_mod
             # TODO: Option for alternative logic to remove hunks from the patch to reduce the number of tokens
             # until we meet the requirements
             if get_settings().config.verbosity_level >= 2:
-                get_logger().warning(f"Patch too large, skipping it, {filename}")
+                get_logger().warning(f"Patch too large, skipping it: '{filename}'")
             remaining_files_list_new.append(filename)
             continue

         if patch:
             if not convert_hunks_to_line_numbers:
-                patch_final = f"\n\n## File: '{filename.strip()}\n\n{patch.strip()}\n'"
+                patch_final = f"\n\n## File: '{filename.strip()}'\n\n{patch.strip()}\n"
             else:
                 patch_final = "\n\n" + patch.strip()
             patches.append(patch_final)
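A quick standalone check of the corrected formatting (inputs invented): the closing quote now sits right after the file name rather than dangling at the end of the patch body.

```python
filename = "pr_agent/algo/example.py "   # hypothetical file name with trailing whitespace
patch = "@@ -1 +1 @@\n-old\n+new"

# old (buggy) template: f"\n\n## File: '{filename.strip()}\n\n{patch.strip()}\n'"
patch_final = f"\n\n## File: '{filename.strip()}'\n\n{patch.strip()}\n"
print(patch_final)
# ## File: 'pr_agent/algo/example.py'
#
# @@ -1 +1 @@
# -old
# +new
```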

View File

@@ -23,6 +23,7 @@ from pydantic import BaseModel
 from starlette_context import context

 from pr_agent.algo import MAX_TOKENS
+from pr_agent.algo.git_patch_processing import extract_hunk_lines_from_patch
 from pr_agent.algo.token_handler import TokenEncoder
 from pr_agent.algo.types import FilePatchInfo
 from pr_agent.config_loader import get_settings, global_settings
@@ -272,7 +273,11 @@ def convert_to_markdown_v2(output_data: dict,
     return markdown_text


-def extract_relevant_lines_str(end_line, files, relevant_file, start_line, dedent=False):
+def extract_relevant_lines_str(end_line, files, relevant_file, start_line, dedent=False) -> str:
+    """
+    Finds 'relevant_file' in 'files', and extracts the lines from 'start_line' to 'end_line' string from the file content.
+    """
     try:
         relevant_lines_str = ""
         if files:
@@ -280,10 +285,23 @@ def extract_relevant_lines_str(end_line, files, relevant_file, start_line, deden
             for file in files:
                 if file.filename.strip() == relevant_file:
                     if not file.head_file:
-                        get_logger().warning(f"No content found in file: {file.filename}")
-                        return ""
-                    relevant_file_lines = file.head_file.splitlines()
-                    relevant_lines_str = "\n".join(relevant_file_lines[start_line - 1:end_line])
+                        # as a fallback, extract relevant lines directly from patch
+                        patch = file.patch
+                        get_logger().info(f"No content found in file: '{file.filename}' for 'extract_relevant_lines_str'. Using patch instead")
+                        _, selected_lines = extract_hunk_lines_from_patch(patch, file.filename, start_line, end_line,side='right')
+                        if not selected_lines:
+                            get_logger().error(f"Failed to extract relevant lines from patch: {file.filename}")
+                            return ""
+
+                        # filter out '-' lines
+                        relevant_lines_str = ""
+                        for line in selected_lines.splitlines():
+                            if line.startswith('-'):
+                                continue
+                            relevant_lines_str += line[1:] + '\n'
+                    else:
+                        relevant_file_lines = file.head_file.splitlines()
+                        relevant_lines_str = "\n".join(relevant_file_lines[start_line - 1:end_line])
                     if dedent and relevant_lines_str:
                         # Remove the longest leading string of spaces and tabs common to all lines.
                         relevant_lines_str = textwrap.dedent(relevant_lines_str)
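A small standalone illustration of the patch fallback above (the patch excerpt is invented): hunk lines carry a leading diff marker, so removed lines are skipped and the marker character is stripped from everything else.

```python
selected_lines = "+def add(a, b):\n+    return a + b\n-def old_add(a, b):\n unchanged_line = True"

relevant_lines_str = ""
for line in selected_lines.splitlines():
    if line.startswith('-'):                  # deleted lines are not part of the new file content
        continue
    relevant_lines_str += line[1:] + '\n'     # drop the leading '+' / ' ' diff marker

print(relevant_lines_str)
# def add(a, b):
#     return a + b
# unchanged_line = True
```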
@@ -565,27 +583,20 @@ def load_large_diff(filename, new_file_content_str: str, original_file_content_s
     """
     Generate a patch for a modified file by comparing the original content of the file with the new content provided as
     input.
-
-    Args:
-        new_file_content_str: The new content of the file as a string.
-        original_file_content_str: The original content of the file as a string.
-
-    Returns:
-        The generated or provided patch string.
-
-    Raises:
-        None.
     """
-    patch = ""
+    if not original_file_content_str and not new_file_content_str:
+        return ""
+
     try:
         diff = difflib.unified_diff(original_file_content_str.splitlines(keepends=True),
                                     new_file_content_str.splitlines(keepends=True))
         if get_settings().config.verbosity_level >= 2 and show_warning:
-            get_logger().warning(f"File was modified, but no patch was found. Manually creating patch: {filename}.")
+            get_logger().info(f"File was modified, but no patch was found. Manually creating patch: {filename}.")
         patch = ''.join(diff)
-    except Exception:
-        pass
-    return patch
+        return patch
+    except Exception as e:
+        get_logger().exception(f"Failed to generate patch for file: {filename}")
+        return ""


 def update_settings_from_args(args: List[str]) -> List[str]:
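A minimal, self-contained example (file contents invented) of the `difflib.unified_diff` call used above to synthesize a patch when the provider did not return one:

```python
import difflib

original_file_content_str = "line0\noriginal content\nline2\n"
new_file_content_str = "line0\nmodified content\nline2\n"

diff = difflib.unified_diff(original_file_content_str.splitlines(keepends=True),
                            new_file_content_str.splitlines(keepends=True))
patch = ''.join(diff)
print(patch)
# ---
# +++
# @@ -1,3 +1,3 @@
#  line0
# -original content
# +modified content
#  line2
```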

View File

@@ -402,10 +402,21 @@ class BitbucketServerProvider(GitProvider):
         try:
             projects_index = path_parts.index("projects")
-        except ValueError as e:
-            raise ValueError(f"The provided URL '{pr_url}' does not appear to be a Bitbucket PR URL")
+        except ValueError:
+            projects_index = -1
+
+        try:
+            users_index = path_parts.index("users")
+        except ValueError:
+            users_index = -1
+
+        if projects_index == -1 and users_index == -1:
+            raise ValueError(f"The provided URL '{pr_url}' does not appear to be a Bitbucket PR URL")

-        path_parts = path_parts[projects_index:]
+        if projects_index != -1:
+            path_parts = path_parts[projects_index:]
+        else:
+            path_parts = path_parts[users_index:]

         if len(path_parts) < 6 or path_parts[2] != "repos" or path_parts[4] != "pull-requests":
             raise ValueError(
@@ -413,6 +424,8 @@ class BitbucketServerProvider(GitProvider):
             )

         workspace_slug = path_parts[1]
+        if users_index != -1:
+            workspace_slug = f"~{workspace_slug}"
         repo_slug = path_parts[3]
         try:
             pr_number = int(path_parts[5])
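A self-contained sketch of the URL handling described above (function name and structure are illustrative, not the provider's exact code): both project PRs and personal-repo PRs are accepted, and personal workspaces get the `~` prefix Bitbucket Server expects.

```python
from urllib.parse import urlparse

def parse_bitbucket_server_pr_url(pr_url: str):
    # handles .../projects/KEY/repos/slug/pull-requests/N and .../users/name/repos/slug/pull-requests/N
    path_parts = urlparse(pr_url).path.strip('/').split('/')
    if 'projects' in path_parts:
        path_parts = path_parts[path_parts.index('projects'):]
        workspace_slug = path_parts[1]
    elif 'users' in path_parts:
        path_parts = path_parts[path_parts.index('users'):]
        workspace_slug = f"~{path_parts[1]}"   # personal repositories are addressed with a '~' prefix
    else:
        raise ValueError(f"The provided URL '{pr_url}' does not appear to be a Bitbucket PR URL")
    if len(path_parts) < 6 or path_parts[2] != "repos" or path_parts[4] != "pull-requests":
        raise ValueError(f"The provided URL '{pr_url}' does not appear to be a Bitbucket PR URL")
    return workspace_slug, path_parts[3], int(path_parts[5])

print(parse_bitbucket_server_pr_url(
    "https://bitbucket.company-server.url/users/username/repos/my-repo/pull-requests/1"))
# ('~username', 'my-repo', 1)
```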

View File

@@ -174,6 +174,24 @@ class GithubProvider(GitProvider):
             diff_files = []
             invalid_files_names = []
+            is_close_to_rate_limit = False
+
+            # The base.sha will point to the current state of the base branch (including parallel merges), not the original base commit when the PR was created
+            # We can fix this by finding the merge base commit between the PR head and base branches
+            # Note that The pr.head.sha is actually correct as is - it points to the latest commit in your PR branch.
+            # This SHA isn't affected by parallel merges to the base branch since it's specific to your PR's branch.
+            repo = self.repo_obj
+            pr = self.pr
+            try:
+                compare = repo.compare(pr.base.sha, pr.head.sha)  # communication with GitHub
+                merge_base_commit = compare.merge_base_commit
+            except Exception as e:
+                get_logger().error(f"Failed to get merge base commit: {e}")
+                merge_base_commit = pr.base
+            if merge_base_commit.sha != pr.base.sha:
+                get_logger().info(
+                    f"Using merge base commit {merge_base_commit.sha} instead of base commit ")
+
             counter_valid = 0
             for file in files:
                 if not is_valid_file(file.filename):
@@ -181,48 +199,36 @@ class GithubProvider(GitProvider):
(The per-file loading logic is restructured around the new is_close_to_rate_limit flag, and the merge-base lookup added above now runs once per PR instead of once per file. The resulting loop body:)

                    continue

                patch = file.patch

                if is_close_to_rate_limit:
                    new_file_content_str = ""
                    original_file_content_str = ""
                else:
                    # allow only a limited number of files to be fully loaded. We can manage the rest with diffs only
                    counter_valid += 1
                    avoid_load = False
                    if counter_valid >= MAX_FILES_ALLOWED_FULL and patch and not self.incremental.is_incremental:
                        avoid_load = True
                        if counter_valid == MAX_FILES_ALLOWED_FULL:
                            get_logger().info(f"Too many files in PR, will avoid loading full content for rest of files")

                    if avoid_load:
                        new_file_content_str = ""
                    else:
                        new_file_content_str = self._get_pr_file_content(file, self.pr.head.sha)  # communication with GitHub

                    if self.incremental.is_incremental and self.unreviewed_files_set:
                        original_file_content_str = self._get_pr_file_content(file, self.incremental.last_seen_commit_sha)
                        patch = load_large_diff(file.filename, new_file_content_str, original_file_content_str)
                        self.unreviewed_files_set[file.filename] = patch
                    else:
                        if avoid_load:
                            original_file_content_str = ""
                        else:
                            original_file_content_str = self._get_pr_file_content(file, merge_base_commit.sha)
                            # original_file_content_str = self._get_pr_file_content(file, self.pr.base.sha)

                    if not patch:
                        patch = load_large_diff(file.filename, new_file_content_str, original_file_content_str)

@@ -237,9 +243,14 @@ class GithubProvider(GitProvider):
                 edit_type = EDIT_TYPE.UNKNOWN

             # count number of lines added and removed
-            patch_lines = patch.splitlines(keepends=True)
-            num_plus_lines = len([line for line in patch_lines if line.startswith('+')])
-            num_minus_lines = len([line for line in patch_lines if line.startswith('-')])
+            if hasattr(file, 'additions') and hasattr(file, 'deletions'):
+                num_plus_lines = file.additions
+                num_minus_lines = file.deletions
+            else:
+                patch_lines = patch.splitlines(keepends=True)
+                num_plus_lines = len([line for line in patch_lines if line.startswith('+')])
+                num_minus_lines = len([line for line in patch_lines if line.startswith('-')])
             file_patch_canonical_structure = FilePatchInfo(original_file_content_str, new_file_content_str, patch,
                                                            file.filename, edit_type=edit_type,
                                                            num_plus_lines=num_plus_lines,
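A condensed sketch of the merge-base lookup above (assuming a PyGithub `Repository` and `PullRequest`; the helper name is illustrative): comparing `base...head` once per PR yields the merge-base commit, which is then used instead of `base.sha` so that parallel merges into the base branch don't leak into the per-file "original" content.

```python
def get_pr_original_base_sha(repo, pr) -> str:
    # pr.head.sha is already correct; pr.base.sha may have advanced because of parallel merges
    try:
        compare = repo.compare(pr.base.sha, pr.head.sha)   # one GitHub API call per PR
        merge_base_commit = compare.merge_base_commit
    except Exception as e:
        print(f"Failed to get merge base commit: {e}")
        return pr.base.sha
    if merge_base_commit.sha != pr.base.sha:
        print(f"Using merge base commit {merge_base_commit.sha} instead of base commit")
    return merge_base_commit.sha
```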

View File

@@ -88,6 +88,7 @@ publish_description_as_comment_persistent=true
 ## changes walkthrough section
 enable_semantic_files_types=true
 collapsible_file_list='adaptive' # true, false, 'adaptive'
+collapsible_file_list_threshold=8
 inline_file_summary=false # false, true, 'table'
 # markers
 use_description_markers=false
@@ -96,7 +97,6 @@ include_generated_by_header=true
 enable_large_pr_handling=true
 max_ai_calls=4
 async_ai_calls=true
-mention_extra_files=true
 #custom_labels = ['Bug fix', 'Tests', 'Bug fix with tests', 'Enhancement', 'Documentation', 'Other']

 [pr_questions] # /ask #

View File

@@ -1,15 +1,11 @@
 [pr_description_prompt]
 system="""You are PR-Reviewer, a language model designed to review a Git Pull Request (PR).
-{%- if enable_custom_labels %}
-Your task is to provide a full description for the PR content - files walkthrough, title, type, description and labels.
-{%- else %}
-Your task is to provide a full description for the PR content - files walkthrough, title, type, and description.
-{%- endif %}
-- Focus on the new PR code (lines starting with '+').
+Your task is to provide a full description for the PR content - type, description, title and files walkthrough.
+- Focus on the new PR code (lines starting with '+' in the 'PR Git Diff' section).
 - Keep in mind that the 'Previous title', 'Previous description' and 'Commit messages' sections may be partial, simplistic, non-informative or out of date. Hence, compare them to the PR diff code, and use them only as a reference.
 - The generated title and description should prioritize the most significant changes.
-- If needed, each YAML output should be in block scalar indicator ('|-')
+- If needed, each YAML output should be in block scalar indicator ('|')
-- When quoting variables or names from the code, use backticks (`) instead of single quote (').
+- When quoting variables, names or file paths from the code, use backticks (`) instead of single quote (').

 {%- if extra_instructions %}
@@ -38,22 +34,20 @@ class PRType(str, Enum):
 {%- if enable_semantic_files_types %}
 class FileDescription(BaseModel):
-    filename: str = Field(description="The full file path of the relevant file.")
-    language: str = Field(description="The programming language of the relevant file.")
+    filename: str = Field(description="The full file path of the relevant file")
+{%- if include_file_summary_changes %}
     changes_summary: str = Field(description="concise summary of the changes in the relevant file, in bullet points (1-4 bullet points).")
-    changes_title: str = Field(description="an informative title for the changes in the files, describing its main theme (5-10 words).")
+{%- endif %}
+    changes_title: str = Field(description="one-line summary (5-10 words) capturing the main theme of changes in the file")
     label: str = Field(description="a single semantic label that represents a type of code changes that occurred in the File. Possible values (partial list): 'bug fix', 'tests', 'enhancement', 'documentation', 'error handling', 'configuration changes', 'dependencies', 'formatting', 'miscellaneous', ...")
 {%- endif %}

 class PRDescription(BaseModel):
     type: List[PRType] = Field(description="one or more types that describe the PR content. Return the label member value (e.g. 'Bug fix', not 'bug_fix')")
+    description: str = Field(description="summarize the PR changes in up to four bullet points, each up to 8 words. For large PRs, add sub-bullets if needed. Order bullets by importance, with each bullet highlighting a key change group.")
+    title: str = Field(description="a concise and descriptive title that captures the PR's main theme")
 {%- if enable_semantic_files_types %}
-    pr_files: List[FileDescription] = Field(max_items=15, description="a list of the files in the PR, and summary of their changes")
-{%- endif %}
-    description: str = Field(description="an informative and concise description of the PR. Use bullet points. Display first the most significant changes.")
-    title: str = Field(description="an informative title for the PR, describing its main theme")
-{%- if enable_custom_labels %}
-    labels: List[Label] = Field(min_items=0, description="choose the relevant custom labels that describe the PR content, and return their keys. Use the value field of the Label object to better understand the label meaning.")
+    pr_files: List[FileDescription] = Field(max_items=20, description="a list of all the files that were changed in the PR, and summary of their changes. Each file must be analyzed regardless of change size.")
 {%- endif %}
 =====
@@ -64,30 +58,23 @@ Example output:
 type:
 - ...
 - ...
-{%- if enable_semantic_files_types %}
-pr_files:
-- filename: |
-     ...
-  language: |
-     ...
-  changes_summary: |
-     ...
-  changes_title: |
-     ...
-  label: |
-     ...
-...
-{%- endif %}
 description: |
   ...
 title: |
   ...
-{%- if enable_custom_labels %}
-labels:
-- |
-  ...
-- |
-  ...
+{%- if enable_semantic_files_types %}
+pr_files:
+- filename: |
+     ...
+{%- if include_file_summary_changes %}
+  changes_summary: |
+     ...
+{%- endif %}
+  changes_title: |
+     ...
+  label: |
+     label_key_1
+...
 {%- endif %}
 ```
@@ -136,7 +123,7 @@ Commit messages:
 {%- endif %}

-The PR Diff:
+The PR Git Diff:
 =====
 {{ diff|trim }}
 =====

View File

@@ -1,6 +1,7 @@
 import asyncio
 import copy
 import re
+import traceback
 from functools import partial
 from typing import List, Tuple
@@ -57,6 +58,7 @@ class PRDescription:
         self.ai_handler.main_pr_language = self.main_pr_language

         # Initialize the variables dictionary
+        self.COLLAPSIBLE_FILE_LIST_THRESHOLD = get_settings().pr_description.get("collapsible_file_list_threshold", 8)
         self.vars = {
             "title": self.git_provider.pr.title,
             "branch": self.git_provider.get_pr_branch(),
@@ -69,6 +71,7 @@ class PRDescription:
             "custom_labels_class": "",  # will be filled if necessary in 'set_custom_labels' function
             "enable_semantic_files_types": get_settings().pr_description.enable_semantic_files_types,
             "related_tickets": "",
+            "include_file_summary_changes": len(self.git_provider.get_diff_files()) <= self.COLLAPSIBLE_FILE_LIST_THRESHOLD
         }

         self.user_description = self.git_provider.get_user_description()
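A toy illustration of the new flag (values invented): per-file change summaries are only requested from the model when the PR is small enough, mirroring the `include_file_summary_changes` template switch added to the prompt file in this commit.

```python
COLLAPSIBLE_FILE_LIST_THRESHOLD = 8   # pr_description.collapsible_file_list_threshold
num_diff_files = 12                   # e.g. len(git_provider.get_diff_files())

include_file_summary_changes = num_diff_files <= COLLAPSIBLE_FILE_LIST_THRESHOLD
print(include_file_summary_changes)   # False -> the prompt omits the changes_summary field
```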
@@ -85,7 +88,6 @@ class PRDescription:
         self.patches_diff = None
         self.prediction = None
         self.file_label_dict = None
-        self.COLLAPSIBLE_FILE_LIST_THRESHOLD = 8

     async def run(self):
         try:
@@ -114,6 +116,8 @@ class PRDescription:
             pr_labels, pr_file_changes = [], []
             if get_settings().pr_description.publish_labels:
                 pr_labels = self._prepare_labels()
+            else:
+                get_logger().debug(f"Publishing labels disabled")

             if get_settings().pr_description.use_description_markers:
                 pr_title, pr_body, changes_walkthrough, pr_file_changes = self._prepare_pr_answer_with_markers()
@@ -137,6 +141,7 @@ class PRDescription:
                 pr_body += show_relevant_configurations(relevant_section='pr_description')

             if get_settings().config.publish_output:
+                # publish labels
                 if get_settings().pr_description.publish_labels and pr_labels and self.git_provider.is_supported("get_labels"):
                     original_labels = self.git_provider.get_pr_labels(update=True)
@@ -164,7 +169,7 @@ class PRDescription:
                     self.git_provider.publish_description(pr_title, pr_body)

                 # publish final update message
-                if (get_settings().pr_description.final_update_message):
+                if (get_settings().pr_description.final_update_message and not get_settings().config.get('is_auto_command', False)):
                     latest_commit_url = self.git_provider.get_latest_commit_url()
                     if latest_commit_url:
                         pr_url = self.git_provider.get_pr_url()
@@ -176,35 +181,37 @@ class PRDescription:
                 get_settings().data = {"artifact": pr_body}
             return
         except Exception as e:
-            get_logger().error(f"Error generating PR description {self.pr_id}: {e}")
+            get_logger().error(f"Error generating PR description {self.pr_id}: {e}",
+                               artifact={"traceback": traceback.format_exc()})

         return ""

     async def _prepare_prediction(self, model: str) -> None:
         if get_settings().pr_description.use_description_markers and 'pr_agent:' not in self.user_description:
-            get_logger().info(
-                "Markers were enabled, but user description does not contain markers. skipping AI prediction")
+            get_logger().info("Markers were enabled, but user description does not contain markers. skipping AI prediction")
             return None

         large_pr_handling = get_settings().pr_description.enable_large_pr_handling and "pr_description_only_files_prompts" in get_settings()
-        output = get_pr_diff(self.git_provider, self.token_handler, model, large_pr_handling=large_pr_handling,
-                             return_remaining_files=True)
+        output = get_pr_diff(self.git_provider, self.token_handler, model, large_pr_handling=large_pr_handling, return_remaining_files=True)
         if isinstance(output, tuple):
             patches_diff, remaining_files_list = output
         else:
             patches_diff = output
             remaining_files_list = []
+
         if not large_pr_handling or patches_diff:
             self.patches_diff = patches_diff
             if patches_diff:
+                # generate the prediction
                 get_logger().debug(f"PR diff", artifact=self.patches_diff)
                 self.prediction = await self._get_prediction(model, patches_diff, prompt="pr_description_prompt")
-                if (remaining_files_list and 'pr_files' in self.prediction and 'label:' in self.prediction and
-                        get_settings().pr_description.mention_extra_files):
-                    get_logger().debug(f"Extending additional files, {len(remaining_files_list)} files")
-                    self.prediction = await self.extend_additional_files(remaining_files_list)
+
+                # extend the prediction with additional files not shown
+                if get_settings().pr_description.enable_semantic_files_types:
+                    self.prediction = await self.extend_uncovered_files(self.prediction)
             else:
-                get_logger().error(f"Error getting PR diff {self.pr_id}")
+                get_logger().error(f"Error getting PR diff {self.pr_id}",
+                                   artifact={"traceback": traceback.format_exc()})
                 self.prediction = None
         else:
             # get the diff in multiple patches, with the token handler only for the files prompt
@@ -289,43 +296,81 @@ class PRDescription:
                                                               prompt="pr_description_only_description_prompts")
             prediction_headers = prediction_headers.strip().removeprefix('```yaml').strip('`').strip()

-            # manually add extra files to final prediction
-            MAX_EXTRA_FILES_TO_OUTPUT = 100
-            if get_settings().pr_description.mention_extra_files:
-                for i, file in enumerate(remaining_files_list):
-                    extra_file_yaml = f"""\
-- filename: |
-    {file}
-  changes_summary: |
-    ...
-  changes_title: |
-    ...
-  label: |
-    additional files (token-limit)
-"""
-                    files_walkthrough = files_walkthrough.strip() + "\n" + extra_file_yaml.strip()
-                    if i >= MAX_EXTRA_FILES_TO_OUTPUT:
-                        files_walkthrough += f"""\
-extra_file_yaml =
-- filename: |
-    Additional {len(remaining_files_list) - MAX_EXTRA_FILES_TO_OUTPUT} files not shown
-  changes_summary: |
-    ...
-  changes_title: |
-    ...
-  label: |
-    additional files (token-limit)
-"""
-                        break
+            # extend the tables with the files not shown
+            files_walkthrough_extended = await self.extend_uncovered_files(files_walkthrough)

             # final processing
-            self.prediction = prediction_headers + "\n" + "pr_files:\n" + files_walkthrough
+            self.prediction = prediction_headers + "\n" + "pr_files:\n" + files_walkthrough_extended
             if not load_yaml(self.prediction, keys_fix_yaml=self.keys_fix):
                 get_logger().error(f"Error getting valid YAML in large PR handling for describe {self.pr_id}")
                 if load_yaml(prediction_headers, keys_fix_yaml=self.keys_fix):
                     get_logger().debug(f"Using only headers for describe {self.pr_id}")
                     self.prediction = prediction_headers
+    async def extend_uncovered_files(self, original_prediction: str) -> str:
+        try:
+            prediction = original_prediction
+
+            # get the original prediction filenames
+            original_prediction_loaded = load_yaml(original_prediction, keys_fix_yaml=self.keys_fix)
+            if isinstance(original_prediction_loaded, list):
+                original_prediction_dict = {"pr_files": original_prediction_loaded}
+            else:
+                original_prediction_dict = original_prediction_loaded
+            filenames_predicted = [file['filename'].strip() for file in original_prediction_dict.get('pr_files', [])]
+
+            # extend the prediction with additional files not included in the original prediction
+            pr_files = self.git_provider.get_diff_files()
+            prediction_extra = "pr_files:"
+            MAX_EXTRA_FILES_TO_OUTPUT = 100
+            counter_extra_files = 0
+            for file in pr_files:
+                if file.filename in filenames_predicted:
+                    continue
+
+                # add up to MAX_EXTRA_FILES_TO_OUTPUT files
+                counter_extra_files += 1
+                if counter_extra_files > MAX_EXTRA_FILES_TO_OUTPUT:
+                    extra_file_yaml = f"""\
+- filename: |
+    Additional files not shown
+  changes_title: |
+    ...
+  label: |
+    additional files
+"""
+                    prediction_extra = prediction_extra + "\n" + extra_file_yaml.strip()
+                    get_logger().debug(f"Too many remaining files, clipping to {MAX_EXTRA_FILES_TO_OUTPUT}")
+                    break
+
+                extra_file_yaml = f"""\
+- filename: |
+    {file.filename}
+  changes_title: |
+    ...
+  label: |
+    additional files
+"""
+                prediction_extra = prediction_extra + "\n" + extra_file_yaml.strip()
+
+            # merge the two dictionaries
+            if counter_extra_files > 0:
+                get_logger().info(f"Adding {counter_extra_files} unprocessed extra files to table prediction")
+                prediction_extra_dict = load_yaml(prediction_extra, keys_fix_yaml=self.keys_fix)
+                if isinstance(original_prediction_dict, dict) and isinstance(prediction_extra_dict, dict):
+                    original_prediction_dict["pr_files"].extend(prediction_extra_dict["pr_files"])
+                    new_yaml = yaml.dump(original_prediction_dict)
+                    if load_yaml(new_yaml, keys_fix_yaml=self.keys_fix):
+                        prediction = new_yaml
+                    if isinstance(original_prediction, list):
+                        prediction = yaml.dump(original_prediction_dict["pr_files"])
+
+            return prediction
+        except Exception as e:
+            get_logger().error(f"Error extending uncovered files {self.pr_id}: {e}")
+            return original_prediction
+
     async def extend_additional_files(self, remaining_files_list) -> str:
         prediction = self.prediction
         try:
@@ -397,31 +442,31 @@ extra_file_yaml =
            self.data['pr_files'] = self.data.pop('pr_files')

    def _prepare_labels(self) -> List[str]:
(`pr_types` is renamed to `pr_labels` throughout, and empty 'labels'/'type' values no longer produce labels; the resulting method body:)
        pr_labels = []
        # If the 'PR Type' key is present in the dictionary, split its value by comma and assign it to 'pr_types'
        if 'labels' in self.data and self.data['labels']:
            if type(self.data['labels']) == list:
                pr_labels = self.data['labels']
            elif type(self.data['labels']) == str:
                pr_labels = self.data['labels'].split(',')
        elif 'type' in self.data and self.data['type'] and get_settings().pr_description.publish_labels:
            if type(self.data['type']) == list:
                pr_labels = self.data['type']
            elif type(self.data['type']) == str:
                pr_labels = self.data['type'].split(',')
        pr_labels = [label.strip() for label in pr_labels]

        # convert lowercase labels to original case
        try:
            if "labels_minimal_to_labels_dict" in self.variables:
                d: dict = self.variables["labels_minimal_to_labels_dict"]
                for i, label_i in enumerate(pr_labels):
                    if label_i in d:
                        pr_labels[i] = d[label_i]
        except Exception as e:
            get_logger().error(f"Error converting labels to original case {self.pr_id}: {e}")
        return pr_labels

    def _prepare_pr_answer_with_markers(self) -> Tuple[str, str, str, List[dict]]:
        get_logger().info(f"Using description marker replacements {self.pr_id}")
@@ -511,6 +556,11 @@ extra_file_yaml =
             elif 'pr_files' in key.lower() and get_settings().pr_description.enable_semantic_files_types:
                 changes_walkthrough, pr_file_changes = self.process_pr_files_prediction(changes_walkthrough, value)
                 changes_walkthrough = f"{PRDescriptionHeader.CHANGES_WALKTHROUGH.value}\n{changes_walkthrough}"
+            elif key.lower().strip() == 'description':
+                if isinstance(value, list):
+                    value = ', '.join(v.rstrip() for v in value)
+                value = value.replace('\n-', '\n\n-').strip()  # makes the bullet points more readable by adding double space
+                pr_body += f"{value}\n"
             else:
                 # if the value is a list, join its items by comma
                 if isinstance(value, list):
@@ -528,14 +578,18 @@ extra_file_yaml =
             return file_label_dict
         for file in self.data['pr_files']:
             try:
-                required_fields = ['changes_summary', 'changes_title', 'filename', 'label']
+                required_fields = ['changes_title', 'filename', 'label']
                 if not all(field in file for field in required_fields):
                     # can happen for example if a YAML generation was interrupted in the middle (no more tokens)
                     get_logger().warning(f"Missing required fields in file label dict {self.pr_id}, skipping file",
                                          artifact={"file": file})
                     continue
+                if not file.get('changes_title'):
+                    get_logger().warning(f"Empty changes title or summary in file label dict {self.pr_id}, skipping file",
+                                         artifact={"file": file})
+                    continue
                 filename = file['filename'].replace("'", "`").replace('"', '`')
-                changes_summary = file['changes_summary']
+                changes_summary = file.get('changes_summary', "").strip()
                 changes_title = file['changes_title'].strip()
                 label = file.get('label').strip().lower()
                 if label not in file_label_dict:
@@ -578,12 +632,14 @@ extra_file_yaml =
         for filename, file_changes_title, file_change_description in list_tuples:
             filename = filename.replace("'", "`").rstrip()
             filename_publish = filename.split("/")[-1]
+            if file_changes_title and file_changes_title.strip() != "...":
                 file_changes_title_code = f"<code>{file_changes_title}</code>"
                 file_changes_title_code_br = insert_br_after_x_chars(file_changes_title_code, x=(delta - 5)).strip()
                 if len(file_changes_title_code_br) < (delta - 5):
                     file_changes_title_code_br += "&nbsp; " * ((delta - 5) - len(file_changes_title_code_br))
                 filename_publish = f"<strong>{filename_publish}</strong><dd>{file_changes_title_code_br}</dd>"
+            else:
+                filename_publish = f"<strong>{filename_publish}</strong>"
             diff_plus_minus = ""
             delta_nbsp = ""
             diff_files = self.git_provider.get_diff_files()
@@ -592,6 +648,8 @@ extra_file_yaml =
                     num_plus_lines = f.num_plus_lines
                     num_minus_lines = f.num_minus_lines
                     diff_plus_minus += f"+{num_plus_lines}/-{num_minus_lines}"
+                    if len(diff_plus_minus) > 12 or diff_plus_minus == "+0/-0":
+                        diff_plus_minus = "[link]"
                     delta_nbsp = "&nbsp; " * max(0, (8 - len(diff_plus_minus)))
                     break
@@ -600,9 +658,40 @@ extra_file_yaml =
(The row-rendering code is factored out into a new add_file_data() helper, and the table-closing and error-handling code moves up accordingly. The resulting code:)

            if hasattr(self.git_provider, 'get_line_link'):
                filename = filename.strip()
                link = self.git_provider.get_line_link(filename, relevant_line_start=-1)
                if (not link or not diff_plus_minus) and ('additional files' not in filename.lower()):
                    get_logger().warning(f"Error getting line link for '{filename}'")
                    continue

            # Add file data to the PR body
            file_change_description_br = insert_br_after_x_chars(file_change_description, x=(delta - 5))
            pr_body = self.add_file_data(delta_nbsp, diff_plus_minus, file_change_description_br, filename,
                                         filename_publish, link, pr_body)

        # Close the collapsible file list
        if use_collapsible_file_list:
            pr_body += """</table></details></td></tr>"""
        else:
            pr_body += """</table></td></tr>"""
        pr_body += """</tr></tbody></table>"""

    except Exception as e:
        get_logger().error(f"Error processing pr files to markdown {self.pr_id}: {str(e)}")
        pass
    return pr_body, pr_comments

def add_file_data(self, delta_nbsp, diff_plus_minus, file_change_description_br, filename, filename_publish, link,
                  pr_body) -> str:

    if not file_change_description_br:
        pr_body += f"""
<tr>
    <td>{filename_publish}</td>
    <td><a href="{link}">{diff_plus_minus}</a>{delta_nbsp}</td>
</tr>
"""
    else:
        pr_body += f"""
<tr>
    <td>
        <details>
@@ -622,17 +711,7 @@ extra_file_yaml =
 </tr>
 """
-            if use_collapsible_file_list:
-                pr_body += """</table></details></td></tr>"""
-            else:
-                pr_body += """</table></td></tr>"""
-            pr_body += """</tr></tbody></table>"""
-        except Exception as e:
-            get_logger().error(f"Error processing pr files to markdown {self.pr_id}: {e}")
-            pass
-        return pr_body, pr_comments
+        return pr_body

 def count_chars_without_html(string):
     if '<' not in string:
@@ -641,11 +720,14 @@ def count_chars_without_html(string):
     return len(no_html_string)


-def insert_br_after_x_chars(text, x=70):
+def insert_br_after_x_chars(text: str, x=70):
     """
     Insert <br> into a string after a word that increases its length above x characters.
     Use proper HTML tags for code and new lines.
     """
+    if not text:
+        return ""
     if count_chars_without_html(text) < x:
         return text

View File

@@ -79,13 +79,17 @@ class PR_LineQuestions:
                                                                           line_end=line_end,
                                                                           side=side)
         if self.patch_with_lines:
-            response = await retry_with_fallback_models(self._get_prediction, model_type=ModelType.WEAK)
+            model_answer = await retry_with_fallback_models(self._get_prediction, model_type=ModelType.WEAK)
+            # sanitize the answer so that no line will start with "/"
+            model_answer_sanitized = model_answer.strip().replace("\n/", "\n /")
+            if model_answer_sanitized.startswith("/"):
+                model_answer_sanitized = " " + model_answer_sanitized

             get_logger().info('Preparing answer...')
             if comment_id:
-                self.git_provider.reply_to_comment_from_comment_id(comment_id, response)
+                self.git_provider.reply_to_comment_from_comment_id(comment_id, model_answer_sanitized)
             else:
-                self.git_provider.publish_comment(response)
+                self.git_provider.publish_comment(model_answer_sanitized)

         return ""

View File

@@ -117,6 +117,16 @@ class PRQuestions:
         return response

     def _prepare_pr_answer(self) -> str:
+        model_answer = self.prediction.strip()
+        # sanitize the answer so that no line will start with "/"
+        model_answer_sanitized = model_answer.replace("\n/", "\n /")
+        if model_answer_sanitized.startswith("/"):
+            model_answer_sanitized = " " + model_answer_sanitized
+        if model_answer_sanitized != model_answer:
+            get_logger().debug(f"Sanitized model answer",
+                               artifact={"model_answer": model_answer, "sanitized_answer": model_answer_sanitized})
+
         answer_str = f"### **Ask**❓\n{self.question_str}\n\n"
-        answer_str += f"### **Answer:**\n{self.prediction.strip()}\n\n"
+        answer_str += f"### **Answer:**\n{model_answer_sanitized}\n\n"
         return answer_str
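A standalone sketch of the sanitization rule introduced in both question tools (the example input is invented): any answer line beginning with `/` gets a leading space so it cannot be re-parsed as a bot command.

```python
def sanitize_model_answer(model_answer: str) -> str:
    # prefix "/"-leading lines with a space so they are not interpreted as commands
    sanitized = model_answer.replace("\n/", "\n /")
    if sanitized.startswith("/"):
        sanitized = " " + sanitized
    return sanitized

answer = "/review\n/improve is another command"
print(sanitize_model_answer(answer))
#  /review
#  /improve is another command
```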

View File

@@ -24,6 +24,13 @@ class TestBitbucketServerProvider:
         assert repo_slug == "my-repo"
         assert pr_number == 1

+    def test_parse_pr_url_with_users(self):
+        url = "https://bitbucket.company-server.url/users/username/repos/my-repo/pull-requests/1"
+
+        workspace_slug, repo_slug, pr_number = BitbucketServerProvider._parse_pr_url(url)
+        assert workspace_slug == "~username"
+        assert repo_slug == "my-repo"
+        assert pr_number == 1
+
     def mock_get_content_of_file(self, project_key, repository_slug, filename, at=None, markup=None):
         content_map = {
             '9c1cffdd9f276074bfb6fb3b70fbee62d298b058': 'file\nwith\nsome\nlines\nto\nemulate\na\nreal\nfile\n',

View File

@@ -145,8 +145,8 @@ class TestExtendedPatchMoreLines:
         # Check that with no extra lines, the patches are the same as the original patches
         p0 = patches_extended_no_extra_lines[0].strip()
         p1 = patches_extended_no_extra_lines[1].strip()
-        assert p0 == '## file1\n' + pr_languages[0]['files'][0].patch.strip()
-        assert p1 == '## file2\n' + pr_languages[0]['files'][1].patch.strip()
+        assert p0 == "## File: 'file1'\n" + pr_languages[0]['files'][0].patch.strip()
+        assert p1 == "## File: 'file2'\n" + pr_languages[0]['files'][1].patch.strip()

         patches_extended_with_extra_lines, total_tokens, patches_extended_tokens = pr_generate_extended_diff(
             pr_languages, token_handler, add_line_numbers_to_hunks=False,
@@ -154,5 +154,6 @@ class TestExtendedPatchMoreLines:
             patch_extra_lines_after=1
         )
         p0_extended = patches_extended_with_extra_lines[0].strip()
-        assert p0_extended == '## file1\n\n@@ -3,8 +3,8 @@ \n line0\n line1\n-original content\n+modified content\n line2\n line3\n line4\n line5\n line6'
+        assert p0_extended == "## File: 'file1'\n\n@@ -3,8 +3,8 @@ \n line0\n line1\n-original content\n+modified content\n line2\n line3\n line4\n line5\n line6"