Update code suggestion evaluation criteria and line number descriptions

2025-07-21 04:50:39 +08:00 · 2025-03-11 16:50:42 +02:00
parent d16012a568
commit 4713175fcf
1 changed file with 5 additions and 3 deletions
--- a/pr_agent/settings/code_suggestions/pr_code_suggestions_reflect_prompts.toml
+++ b/pr_agent/settings/code_suggestions/pr_code_suggestions_reflect_prompts.toml
@@ -2,7 +2,7 @@
 system="""You are an AI language model specialized in reviewing and evaluating code suggestions for a Pull Request (PR).
 Your task is to analyze a PR code diff and evaluate a set of AI-generated code suggestions. These suggestions aim to address potential bugs and problems, and enhance the new code introduced in the PR.
-Examine each suggestion meticulously, assessing its quality, relevance, and accuracy within the context of PR. Keep in mind that the suggestions may vary in their correctness and accuracy. Your evaluation should be based on a thorough comparison between each suggestion and the actual PR code diff.
+Examine each suggestion meticulously, assessing its quality, relevance, and accuracy within the context of PR. Keep in mind that the suggestions may vary in their correctness, accuracy and impact.
 Consider the following components of each suggestion:
    1. 'one_sentence_summary' - A brief summary of the suggestion's purpose
    2. 'suggestion_content' - The detailed suggestion content, explaining the proposed modification
@@ -31,9 +31,11 @@ Key guidelines for evaluation:
 Additional scoring considerations:
 - If the suggestion is not actionable, and only asks the user to verify or ensure a change, reduce its score by 1-2 points.
 - Error handling or type checking suggestions should not receive a score above 8 (and may be lower).
 - Assign a score of 0 to suggestions aiming at:
   - Adding docstring, type hints, or comments
   - Remove unused imports or variables
   - Add missing import statements
   - Using more specific exception types.
@@ -82,8 +84,8 @@ The output must be a YAML object equivalent to type $PRCodeSuggestionsFeedback,
 class CodeSuggestionFeedback(BaseModel):
    suggestion_summary: str = Field(description="Repeated from the input")
    relevant_file: str = Field(description="Repeated from the input")
-    relevant_lines_start: int = Field(description="The relevant line number, from a '__new hunk__' section, where the suggestion starts (inclusive). Should be derived from the hunk line numbers, and correspond to the beginning of the relevant 'existing code' snippet")
+    relevant_lines_start: int = Field(description="The relevant line number, from a '__new hunk__' section, where the suggestion starts (inclusive). Should be derived from the added '__new hunk__' line numbers, and correspond to the first line of the relevant 'existing code' snippet.")
-    relevant_lines_end: int = Field(description="The relevant line number, from a '__new hunk__' section, where the suggestion ends (inclusive). Should be derived from the hunk line numbers, and correspond to the end of the relevant 'existing code' snippet")
+    relevant_lines_end: int = Field(description="The relevant line number, from a '__new hunk__' section, where the suggestion ends (inclusive). Should be derived from the added '__new hunk__' line numbers, and correspond to the end of the relevant 'existing code' snippet")
    suggestion_score: int = Field(description="Evaluate the suggestion and assign a score from 0 to 10. Give 0 if the suggestion is wrong. For valid suggestions, score from 1 (lowest impact/importance) to 10 (highest impact/importance).")
    why: str = Field(description="Briefly explain the score given in 1-2 sentences, focusing on the suggestion's impact, relevance, and accuracy.")
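
Reviewer note: the scoring considerations and the inclusive '__new hunk__' line range described in this prompt could also be checked after the model responds. The following is a minimal sketch under stated assumptions, not code from pr_agent: `SuggestionFeedback`, `validate_feedback`, `cap_score`, and the category labels are all hypothetical names introduced for illustration, and a plain dataclass stands in for the Pydantic model shown in the diff.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SuggestionFeedback:
    """Hypothetical stand-in for the CodeSuggestionFeedback model in the prompt."""
    suggestion_summary: str
    relevant_file: str
    relevant_lines_start: int  # inclusive, taken from a '__new hunk__' section
    relevant_lines_end: int    # inclusive, taken from a '__new hunk__' section
    suggestion_score: int      # 0 (wrong) to 10 (highest impact)
    why: str


# Hypothetical category labels; the prompt describes these cases only in prose.
ZERO_SCORE_CATEGORIES = {"docstring", "unused_import", "missing_import", "specific_exception"}
CAPPED_AT_8_CATEGORIES = {"error_handling", "type_checking"}


def validate_feedback(fb: SuggestionFeedback, new_hunk_lines: set[int]) -> list[str]:
    """Return a list of problems; an empty list means the feedback is consistent."""
    problems = []
    if not 0 <= fb.suggestion_score <= 10:
        problems.append("score out of range 0-10")
    if fb.relevant_lines_start > fb.relevant_lines_end:
        problems.append("start line is after end line")
    # Both bounds should refer to line numbers that exist in the new hunk.
    for name, line in (("start", fb.relevant_lines_start), ("end", fb.relevant_lines_end)):
        if line not in new_hunk_lines:
            problems.append(f"{name} line {line} is not in the new hunk")
    return problems


def cap_score(score: int, category: str) -> int:
    """Apply the score-0 rule and the cap of 8 to a raw score."""
    if category in ZERO_SCORE_CATEGORIES:
        return 0
    if category in CAPPED_AT_8_CATEGORIES:
        return min(score, 8)
    return score
```

For example, `cap_score(10, "error_handling")` yields 8, and `validate_feedback` flags a response whose start line exceeds its end line. The "reduce by 1-2 points" rule for non-actionable suggestions is left out on purpose, since the prompt leaves the exact reduction to the model's judgment.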