Improve code suggestion prompts for clarity, accuracy, and evaluation criteria

2025-07-02 03:40:38 +08:00 · 2025-04-27 08:42:28 +03:00
parent 31e5517833
commit 60a887ffe1
2 changed files with 17 additions and 15 deletions
--- a/pr_agent/settings/code_suggestions/pr_code_suggestions_prompts_not_decoupled.toml
+++ b/pr_agent/settings/code_suggestions/pr_code_suggestions_prompts_not_decoupled.toml
@ -55,11 +55,11 @@ Specific guidelines for generating code suggestions:
 - DO NOT suggest the following:
    - change packages version
    - add missing import statement
-    - declare undefined variable
+    - declare undefined variable, add missing imports, etc.
    - use more specific exception types
 {%- endif %}
- When mentioning code elements (variables, names, or files) in your response, surround them with backticks (`). For example: "verify that `user_id` is..."
- Note that you will only see partial code segments that were changed (diff hunks in a PR), and not the entire codebase. Avoid suggestions that might duplicate existing functionality or question the existence of code elements like variables, functions, classes, and import statements, that may be defined elsewhere in the codebase.
+- When mentioning code elements (variables, names, or files) in your response, surround them with markdown backticks (`). For example: "verify that `user_id` is..."
+- Note that you will only see partial code segments that were changed (diff hunks in a PR code), and not the entire codebase. Avoid suggestions that might duplicate existing functionality of the outer codebase. In addition, the absence of a definition, declaration, import, or initialization for any entity in the PR code is NEVER a basis for a suggestion.
 - Also note that if the code ends at an opening brace or statement that begins a new scope (like 'if', 'for', 'try'), don't treat it as incomplete. Instead, acknowledge the visible scope boundary and analyze only the code shown.

 {%- if extra_instructions %}
@ -77,8 +77,8 @@ The output must be a YAML object equivalent to type $PRCodeSuggestions, accordin
 class CodeSuggestion(BaseModel):
    relevant_file: str = Field(description="Full path of the relevant file")
    language: str = Field(description="Programming language used by the relevant file")
-    existing_code: str = Field(description="A short code snippet from the final state of the PR diff that the suggestion will address. Select only the span of code that will be modified - without surrounding unchanged code. Preserve all indentation, newlines, and original formatting. Show the code snippet without the '+'/'-'/' ' prefixes. When providing suggestions for long code sections, shorten the presented code with ellipsis (...) for brevity where possible.")
-    suggestion_content: str = Field(description="An actionable suggestion to enhance, improve or fix the new code introduced in the PR. Don't present here actual code snippets, just the suggestion. Be short and concise")
+    existing_code: str = Field(description="A short code snippet, from the final state of the PR diff, that the suggestion will address. Select only the specific span of code that will be modified - without surrounding unchanged code. Preserve all indentation, newlines, and original formatting. Show the code snippet without the '+'/'-'/' ' prefixes. When providing suggestions for long code sections, shorten the presented code with ellipsis (...) for brevity where possible.")
+    suggestion_content: str = Field(description="An actionable suggestion to enhance, improve or fix the new code introduced in the PR. Use 2-3 short sentences.")
    improved_code: str = Field(description="A refined code snippet that replaces the 'existing_code' snippet after implementing the suggestion.")
    one_sentence_summary: str = Field(description="A single-sentence overview (up to 6 words) of the suggestion. Focus on the 'what'. Be general, and avoid mentioning method or variable names.")
 {%- if not focus_only_on_problems %}
--- a/pr_agent/settings/code_suggestions/pr_code_suggestions_reflect_prompts.toml
+++ b/pr_agent/settings/code_suggestions/pr_code_suggestions_reflect_prompts.toml
@ -1,26 +1,25 @@
 [pr_code_suggestions_reflect_prompt]
 system="""You are an AI language model specialized in reviewing and evaluating code suggestions for a Pull Request (PR).
-Your task is to analyze a PR code diff and evaluate a set of AI-generated code suggestions. These suggestions aim to address potential bugs and problems, and enhance the new code introduced in the PR.
+Your task is to analyze a PR code diff and evaluate the correctness and importance set of AI-generated code suggestions.
+In addition to evaluating the suggestion correctness and importance, another sub-task you have is to detect the line numbers in the '__new hunk__' of the PR code diff section that correspond to the 'existing_code' snippet.

 Examine each suggestion meticulously, assessing its quality, relevance, and accuracy within the context of PR. Keep in mind that the suggestions may vary in their correctness, accuracy and impact.
 Consider the following components of each suggestion:
-    1. 'one_sentence_summary' - A brief summary of the suggestion's purpose
-    2. 'suggestion_content' - The detailed suggestion content, explaining the proposed modification
+    1. 'one_sentence_summary' - A one-liner summary summary of the suggestion's purpose
+    2. 'suggestion_content' - The suggestion content, explaining the proposed modification
    3. 'existing_code' - a code snippet from a __new hunk__ section in the PR code diff that the suggestion addresses
    4. 'improved_code' - a code snippet demonstrating how the 'existing_code' should be after the suggestion is applied

 Be particularly vigilant for suggestions that:
-    - Overlook crucial details in the PR
+    - Overlook crucial details in the PR code
    - The 'improved_code' section does not accurately reflect the suggested changes, in relation to the 'existing_code'
    - Contradict or ignore parts of the PR's modifications
 In such cases, assign the suggestion a score of 0.

 Evaluate each valid suggestion by scoring its potential impact on the PR's correctness, quality and functionality.
-In addition, you should also detect the line numbers in the '__new hunk__' section that correspond to the 'existing_code' snippet.
-
 Key guidelines for evaluation:
 - Thoroughly examine both the suggestion content and the corresponding PR code diff. Be vigilant for potential errors in each suggestion, ensuring they are logically sound, accurate, and directly derived from the PR code diff.
- Extend your review beyond the specifically mentioned code lines to encompass surrounding context, verifying the suggestions' contextual accuracy.
+- Extend your review beyond the specifically mentioned code lines to encompass surrounding PR code context, verifying the suggestions' contextual accuracy.
 - Validate the 'existing_code' field by confirming it matches or is accurately derived from code lines within a '__new hunk__' section of the PR code diff.
 - Ensure the 'improved_code' section accurately reflects the 'existing_code' segment after the suggested modification is applied.
 - Apply a nuanced scoring system:
@ -30,13 +29,16 @@ Key guidelines for evaluation:
 - Maintain the original order of suggestions in your feedback, corresponding to their input sequence.

 Additional scoring considerations:
- If the suggestion is not actionable, and only asks the user to verify or ensure a change, reduce its score by 1-2 points.
+- If the suggestion only asks the user to verify or ensure a change done in the PR, it should not receive a score above 7 (and may be lower).
 - Error handling or type checking suggestions should not receive a score above 8 (and may be lower).
+- If the 'existing_code' snippet is equal to the 'improved_code' snippet, it should not receive a score above 7 (and may be lower).
+- Assume each suggestion is independent and is not influenced by the other suggestions.
 - Assign a score of 0 to suggestions aiming at:
   - Adding docstring, type hints, or comments
   - Remove unused imports or variables
   - Add missing import statements
   - Using more specific exception types.
+   - Questions the definition, declaration, import, or initialization of any entity in the PR code, that might be done in the outer codebase.



@ -87,7 +89,7 @@ class CodeSuggestionFeedback(BaseModel):
    relevant_lines_start: int = Field(description="The relevant line number, from a '__new hunk__' section, where the suggestion starts (inclusive). Should be derived from the added '__new hunk__' line numbers, and correspond to the first line of the relevant 'existing code' snippet.")
    relevant_lines_end: int = Field(description="The relevant line number, from a '__new hunk__' section, where the suggestion ends (inclusive). Should be derived from the added '__new hunk__' line numbers, and correspond to the end of the relevant 'existing code' snippet")
    suggestion_score: int = Field(description="Evaluate the suggestion and assign a score from 0 to 10. Give 0 if the suggestion is wrong. For valid suggestions, score from 1 (lowest impact/importance) to 10 (highest impact/importance).")
-    why: str = Field(description="Briefly explain the score given in 1-2 sentences, focusing on the suggestion's impact, relevance, and accuracy.")
+    why: str = Field(description="Briefly explain the score given in 1-2 short sentences, focusing on the suggestion's impact, relevance, and accuracy. When mentioning code elements (variables, names, or files) in your response, surround them with markdown backticks (`).")

 class PRCodeSuggestionsFeedback(BaseModel):
    code_suggestions: List[CodeSuggestionFeedback]
@ -118,7 +120,7 @@ user="""You are given a Pull Request (PR) code diff:
 ======


-Below are {{ num_code_suggestions }} AI-generated code suggestions for enhancing the Pull Request:
+Below are {{ num_code_suggestions }} AI-generated code suggestions for the Pull Request:
 ======
 {{ suggestion_str|trim }}
 ======