Mirror of https://github.com/qodo-ai/pr-agent.git (synced 2025-07-01 19:30:40 +08:00)

Commit: Initial commit - PR-Agent OSS release
.dockerignore (Normal file, 1 line)
@@ -0,0 +1 @@
venv/
.gitignore (vendored, Normal file, 4 lines)
@@ -0,0 +1,4 @@
.idea/
venv/
pr_agent/settings/.secrets.toml
__pycache__
Dockerfile (Normal file, 0 lines, empty)
LICENSE (Normal file, 202 lines)
@@ -0,0 +1,202 @@

                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!) The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [2023] [Codium ltd]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
README.md (Normal file, 283 lines)
@@ -0,0 +1,283 @@
<div align="center">

# 🛡️ CodiumAI PR-Agent

[](https://github.com/Codium-ai/pr-agent/blob/main/LICENSE)
[](https://discord.com/channels/1057273017547378788/1126104260430528613)

CodiumAI `PR-Agent` is an open-source tool that helps developers review PRs faster and more efficiently.
It automatically analyzes the PR, provides feedback and suggestions, and can answer questions.
It is powered by GPT-4, and is based on the [CodiumAI](https://github.com/Codium-ai/) platform.
</div>

TBD: Add screenshot of the PR Reviewer (could be a gif)

* [Quickstart](#quickstart)
* [Configuration](#configuration)
* [Usage and Tools](#usage-and-tools)
* [Roadmap](#roadmap)
* [Similar projects](#similar-projects)
* Additional files:
  * CONTRIBUTION.md
  * LICENSE

## Quickstart

To get started with PR-Agent quickly, you first need to acquire two tokens:

1. An OpenAI key from [here](https://platform.openai.com/), with access to GPT-4.
2. A GitHub personal access token (classic) with the repo scope.

There are several ways to use PR-Agent. Let's start with the simplest one:

---

### Method 1: Use Docker image (no installation required)

To request a review for a PR, or to ask a question about a PR, you can run the appropriate
Python scripts from the scripts folder. Here's how:

1. To request a review for a PR, run the following command:

```
docker run --rm -it -e OPENAI.KEY=<your key> -e GITHUB.USER_TOKEN=<your token> codiumai/pr-agent \
python pr_agent/scripts/review_pr_from_url.py --pr_url <pr url>
```

---

2. To ask a question about a PR, run the following command:

```
docker run --rm -it -e OPENAI.KEY -e GITHUB.USER_TOKEN codiumai/pr-agent \
python pr_agent/scripts/answer_pr_questions_from_url.py --pr_url <pr url> --question "<your question>"
```

Possible questions you can ask include:

- What is the main theme of this PR?
- Is the PR ready for merge?
- What are the main changes in this PR?
- Should this PR be split into smaller parts?
- Can you compose a rhymed song about this PR?

---

### Method 2: Run from source

1. Clone this repository:

```
git clone https://github.com/Codium-ai/pr-agent.git
```

2. Install the requirements in your favorite virtual environment:

```
pip install -r requirements.txt
```

3. Copy the secrets template file and fill in your OpenAI key and your GitHub user token:

```
cp pr_agent/settings/.secrets_template.toml pr_agent/settings/.secrets.toml
# Edit the .secrets.toml file
```

4. Run the appropriate Python scripts from the scripts folder:

```
python pr_agent/scripts/review_pr_from_url.py --pr_url <pr url>
python pr_agent/scripts/answer_pr_questions_from_url.py --pr_url <pr url> --question "<your question>"
```

---

### Method 3: Run as a polling server; request reviews by tagging your GitHub user on a PR

Follow steps 1-3 of Method 2, then run the following command to start the server:

```
python pr_agent/servers/github_polling.py
```
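Once the server is running, you request a review by commenting on the PR and tagging the user whose token the server holds, for example (a hypothetical comment): `@<your github user> please review`. The trigger phrases `please review` and `please answer` come from `pr_agent/agent/pr_agent.py`, shown further below.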

---
### Method 4: Run as a GitHub App, allowing you to automate the review process on your private or public repositories

1. Create a GitHub App from the [Github Developer Portal](https://docs.github.com/en/developers/apps/creating-a-github-app).
   - Set the following permissions:
     - Pull requests: Read & write
     - Issue comment: Read & write
     - Metadata: Read-only
   - Set the following events:
     - Issue comment
     - Pull request

2. Generate a random secret for your app, and save it for later. For example, you can use:

```
WEBHOOK_SECRET=$(python -c "import secrets; print(secrets.token_hex(10))")
```

3. Acquire the following pieces of information from your app's settings page:
   - App private key (click "Generate a private key" and save the file)
   - App ID

4. Clone this repository:

```
git clone https://github.com/Codium-ai/pr-agent.git
```

5. Copy the secrets template file and fill in the following:
   - Your OpenAI key.
   - Set deployment_type to 'app'.
   - Copy your app's private key to the private_key field.
   - Copy your app's ID to the app_id field.
   - Copy your app's webhook secret to the webhook_secret field.

```
cp pr_agent/settings/.secrets_template.toml pr_agent/settings/.secrets.toml
# Edit the .secrets.toml file
```

6. Build a Docker image for the app and optionally push it to a Docker repository. We'll use Dockerhub as an example:

```
docker build . -t codiumai/pr-agent:github_app --target github_app -f docker/Dockerfile
docker push codiumai/pr-agent:github_app  # Push to your Docker repository
```

7. Host the app using a server, serverless function, or container environment. Alternatively, for development and
debugging, you may use tools like smee.io to forward webhooks to your local machine.

8. Go back to your app's settings and set the following:
   - Webhook URL: the URL of your app's server, or the URL of the smee.io channel.
   - Webhook secret: the secret you generated earlier.

9. Install the app by navigating to the "Install App" tab and selecting your desired repositories.

---

## Usage and Tools

CodiumAI PR-Agent provides two types of interactions ("tools"): `"PR Reviewer"` and `"PR Q&A"`.
- The "PR Reviewer" tool automatically analyzes PRs and provides several types of feedback.
- The "PR Q&A" tool answers free-text questions about the PR.

### PR Reviewer

Here is a quick overview of the different sub-tools of the PR Reviewer:

- PR Analysis
  - Summarize main theme
  - PR description and title
  - PR type classification
  - Is the PR covered by relevant tests
  - Is the PR minimal and focused
- PR Feedback
  - General PR suggestions
  - Code suggestions
  - Security concerns

This is what a typical output of the PR Reviewer looks like:

---

#### PR Analysis

- 🎯 **Main theme:** Adding language extension handler and token handler
- 🔍 **Description and title:** Yes
- 📌 **Type of PR:** Enhancement
- 🧪 **Relevant tests added:** No
- ✨ **Minimal and focused:** Yes, the PR is focused on adding two new handlers for language extension and token counting.

#### PR Feedback

- 💡 **General PR suggestions:** The PR is generally well-structured and the code is clean. However, it would be beneficial to add some tests to ensure the new handlers work as expected. Also, consider adding docstrings to the new functions and classes to improve code readability and maintainability.

- 🤖 **Code suggestions:**

  - **suggestion 1:**
    - **relevant file:** pr_agent/algo/language_handler.py
    - **suggestion content:** Consider using a set instead of a list for 'bad_extensions' as checking membership in a set is faster than in a list. [medium]

  - **suggestion 2:**
    - **relevant file:** pr_agent/algo/language_handler.py
    - **suggestion content:** In the 'filter_bad_extensions' function, you are splitting the filename on '.' and taking the last element to get the extension. This might not work as expected if the filename contains multiple '.' characters. Consider using 'os.path.splitext' to get the file extension more reliably. [important]

- 🔒 **Security concerns:** No, the PR does not introduce possible security concerns or issues.

---

### PR Q&A

This tool answers free-text questions about the PR. This is what a typical output of the PR Q&A looks like:

---

**Question**: summarize for me the PR in 4 bullet points

**Answer**:
- The PR introduces a new feature to sort files by their main languages. It uses a mapping of programming languages to their file extensions to achieve this.
- It also introduces a filter to exclude files with certain extensions, deemed 'bad extensions', from the sorting process.
- The PR modifies the `get_pr_diff` function in `pr_processing.py` to use the new sorting function. It also refactors the code to move the PR pruning logic into a separate function.
- A new `TokenHandler` class is introduced in `token_handler.py` to handle token counting operations. This class is initialized with a PR, variables, system, and user, and provides methods to get system and user tokens and to count tokens in a patch.

---

## Configuration

The different tools and sub-tools used by CodiumAI PR-Agent are easily configurable via the configuration file: `/settings/configuration.toml`.

#### Enabling/disabling sub-tools:

You can enable/disable the different PR Reviewer sub-sections with the following flags:

```
require_minimal_and_focused_review=true
require_tests_review=true
require_security_review=true
```

#### Code Suggestions configuration:

There are also configuration options to control different aspects of the `code suggestions` feature.
The number of suggestions provided can be controlled by adjusting the following parameter:

```
num_code_suggestions=4
```

You can also enable a more verbose and informative mode of code suggestions:

```
extended_code_suggestions=false
```

This is a comparison of the regular and extended code suggestions modes (a consolidated configuration sketch follows the examples below):

---

Example of a regular suggestion:

- **suggestion 1:**
  - **relevant file:** sql.py
  - **suggestion content:** Remove hardcoded sensitive information like username and password. Use environment variables or a secure method to store these values. [important]

---

Example of an extended suggestion:

- **suggestion 1:**
  - **relevant file:** sql.py
  - **suggestion content:** Remove hardcoded sensitive information (username and password) [important]
  - **why:** Hardcoding sensitive information is a security risk. It's better to use environment variables or a secure way to store these values.
  - **code example:**
    - **before code:**
      ```
      user = "root",
      password = "Mysql@123",
      ```
    - **after code:**
      ```
      user = os.getenv('DB_USER'),
      password = os.getenv('DB_PASSWORD'),
      ```

---
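Putting the options above together, here is a hypothetical sketch of what `configuration.toml` might look like. The section names and the `model`, `git_provider`, and `verbosity_level` keys are assumptions inferred from how `pr_agent/config_loader.py` and the code below read `settings.config.*`; only the five review flags are quoted from this section:

```
[config]                   # assumed section name
model = "gpt-4"            # assumed key; must match an entry in pr_agent/algo/__init__.py MAX_TOKENS
git_provider = "github"    # assumed key; the only provider registered in pr_agent/git_providers
verbosity_level = 0        # assumed key; the code below checks levels 0-2

[pr_reviewer]              # assumed section name
require_minimal_and_focused_review = true
require_tests_review = true
require_security_review = true
num_code_suggestions = 4
extended_code_suggestions = false
```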

## Roadmap

- [ ] Support open-source models as a replacement for OpenAI models. Note that a minimal requirement for each open-source model is to have 8k+ context, and good support for generating JSON as an output
- [ ] Support other Git providers, such as GitLab and Bitbucket
- [ ] Develop additional logic for handling large PRs and compressing git patches
- [ ] Dedicated tools and sub-tools for specific programming languages (Python, JavaScript, Java, C++, etc.)
- [ ] Add additional context to the prompt. For example, repo (or relevant files) summarization, with tools such as [ctags](https://github.com/universal-ctags/ctags)
- [ ] Add more tools. Possible directions:
  - [ ] Code quality
  - [ ] Coding style
  - [ ] Performance (are there any performance issues?)
  - [ ] Documentation (is the PR properly documented?)
  - [ ] Rank the PR importance
  - [ ] ...

## Similar Projects

- [CodiumAI - Meaningful tests for busy devs](https://github.com/Codium-ai/codiumai-vscode-release)
- [Aider - GPT powered coding in your terminal](https://github.com/paul-gauthier/aider)
- [GPT-Engineer](https://github.com/AntonOsika/gpt-engineer)
- [CodeReview BOT](https://github.com/anc95/ChatGPT-CodeReview)
docker/Dockerfile (Normal file, 20 lines)
@@ -0,0 +1,20 @@
FROM python:3.10 as base

WORKDIR /app
ADD requirements.txt .
RUN pip install -r requirements.txt && rm requirements.txt
ENV PYTHONPATH=/app
ADD pr_agent pr_agent

FROM base as github_app
# the server modules live under pr_agent/servers/ (see the files in this commit)
CMD ["python", "pr_agent/servers/github_app_webhook.py"]

FROM base as github_polling
CMD ["python", "pr_agent/servers/github_polling.py"]

FROM base as test
ADD requirements-dev.txt .
RUN pip install -r requirements-dev.txt && rm requirements-dev.txt

FROM base as cli
CMD ["bash"]
pics/extended_code_suggestion.png (Normal file, binary)
Binary file not shown. Size: 102 KiB.

pics/pr_questions.png (Normal file, binary)
Binary file not shown. Size: 137 KiB.

pics/pr_reviewer.png (Normal file, binary)
Binary file not shown. Size: 267 KiB.

pics/regular_code_suggestion.png (Normal file, binary)
Binary file not shown. Size: 42 KiB.
pr_agent/__init__.py (Normal file, 1 line, blank)
@@ -0,0 +1 @@
pr_agent/agent/__init__.py (Normal file, 0 lines, empty)
pr_agent/agent/pr_agent.py (Normal file, 20 lines)
@@ -0,0 +1,20 @@
import re
from typing import Optional

from pr_agent.tools.pr_questions import PRQuestions
from pr_agent.tools.pr_reviewer import PRReviewer


class PRAgent:
    def __init__(self, installation_id: Optional[int] = None):
        self.installation_id = installation_id

    async def handle_request(self, pr_url, request):
        if 'please review' in request.lower():
            reviewer = PRReviewer(pr_url, self.installation_id)
            await reviewer.review()

        elif 'please answer' in request.lower():
            question = re.split(r'(?i)please answer', request)[1].strip()
            answerer = PRQuestions(pr_url, question, self.installation_id)
            await answerer.answer()
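A hypothetical driver for the agent (not part of this commit), assuming the `pr_agent.tools` modules resolve and a valid user token is configured:

```
import asyncio

from pr_agent.agent.pr_agent import PRAgent

# installation_id defaults to None, which is what the CLI scripts below also use
agent = PRAgent()
asyncio.run(agent.handle_request("https://github.com/owner/repo/pull/1",
                                 "please review"))
```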
pr_agent/algo/__init__.py (Normal file, 10 lines)
@@ -0,0 +1,10 @@
MAX_TOKENS = {
    'gpt-3.5-turbo': 4000,
    'gpt-3.5-turbo-0613': 4000,
    'gpt-3.5-turbo-0301': 4000,
    'gpt-3.5-turbo-16k': 16000,
    'gpt-3.5-turbo-16k-0613': 16000,
    'gpt-4': 8000,
    'gpt-4-0613': 8000,
    'gpt-4-32k': 32000,
}
pr_agent/algo/ai_handler.py (Normal file, 37 lines)
@@ -0,0 +1,37 @@
import logging

import openai
from openai.error import APIError, Timeout, TryAgain
from retry import retry

from pr_agent.config_loader import settings

OPENAI_RETRIES = 2


class AiHandler:
    def __init__(self):
        try:
            openai.api_key = settings.openai.key
        except AttributeError as e:
            raise ValueError("OpenAI key is required") from e

    @retry(exceptions=(APIError, Timeout, TryAgain, AttributeError),
           tries=OPENAI_RETRIES, delay=2, backoff=2, jitter=(1, 3))
    async def chat_completion(self, model: str, temperature: float, system: str, user: str):
        try:
            response = await openai.ChatCompletion.acreate(
                model=model,
                messages=[
                    {"role": "system", "content": system},
                    {"role": "user", "content": user}
                ],
                temperature=temperature,
            )
        except (APIError, Timeout, TryAgain) as e:
            logging.error("Error during OpenAI inference: %s", e)
            raise
        if response is None or len(response.choices) == 0:
            raise TryAgain
        resp = response.choices[0]['message']['content']
        finish_reason = response.choices[0].finish_reason
        return resp, finish_reason
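A hypothetical one-off call (not part of this commit), assuming `.secrets.toml` provides `settings.openai.key`:

```
import asyncio

from pr_agent.algo.ai_handler import AiHandler

handler = AiHandler()
resp, finish_reason = asyncio.run(handler.chat_completion(
    model="gpt-4", temperature=0.2,
    system="You are a PR reviewer.",
    user="Summarize this diff in one sentence: ..."))
print(finish_reason, resp)
```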
pr_agent/algo/git_patch_processing.py (Normal file, 107 lines)
@@ -0,0 +1,107 @@
from __future__ import annotations

import logging
import re

from pr_agent.config_loader import settings


def extend_patch(original_file_str, patch_str, num_lines) -> str:
    """
    Extends the patch to include 'num_lines' more surrounding lines
    """
    if not patch_str or num_lines == 0:
        return patch_str

    original_lines = original_file_str.splitlines()
    patch_lines = patch_str.splitlines()
    extended_patch_lines = []

    start1, size1, start2, size2 = -1, -1, -1, -1
    RE_HUNK_HEADER = re.compile(
        r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@[ ]?(.*)")
    try:
        for line in patch_lines:
            if line.startswith('@@'):
                match = RE_HUNK_HEADER.match(line)
                if match:
                    # finish previous hunk
                    if start1 != -1:
                        extended_patch_lines.extend(
                            original_lines[start1 + size1 - 1:start1 + size1 - 1 + num_lines])

                    start1, size1, start2, size2 = map(int, match.groups()[:4])
                    section_header = match.groups()[4]
                    extended_start1 = max(1, start1 - num_lines)
                    extended_size1 = size1 + (start1 - extended_start1) + num_lines
                    extended_start2 = max(1, start2 - num_lines)
                    extended_size2 = size2 + (start2 - extended_start2) + num_lines
                    extended_patch_lines.append(
                        f'@@ -{extended_start1},{extended_size1} '
                        f'+{extended_start2},{extended_size2} @@ {section_header}')
                    extended_patch_lines.extend(
                        original_lines[extended_start1 - 1:start1 - 1])  # one to zero based
                    continue
            extended_patch_lines.append(line)
    except Exception as e:
        if settings.config.verbosity_level >= 2:
            logging.error(f"Failed to extend patch: {e}")
        return patch_str

    # finish previous hunk
    if start1 != -1:
        extended_patch_lines.extend(
            original_lines[start1 + size1 - 1:start1 + size1 - 1 + num_lines])

    extended_patch_str = '\n'.join(extended_patch_lines)
    return extended_patch_str


def omit_deletion_hunks(patch_lines) -> str:
    temp_hunk = []
    added_patched = []
    add_hunk = False
    inside_hunk = False
    RE_HUNK_HEADER = re.compile(
        r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))?\ @@[ ]?(.*)")

    for line in patch_lines:
        if line.startswith('@@'):
            match = RE_HUNK_HEADER.match(line)
            if match:
                # finish previous hunk
                if inside_hunk and add_hunk:
                    added_patched.extend(temp_hunk)
                temp_hunk = []
                add_hunk = False
                temp_hunk.append(line)
                inside_hunk = True
        else:
            temp_hunk.append(line)
            edit_type = line[0]
            if edit_type == '+':
                add_hunk = True
    if inside_hunk and add_hunk:
        added_patched.extend(temp_hunk)

    return '\n'.join(added_patched)


def handle_patch_deletions(patch: str, original_file_content_str: str,
                           new_file_content_str: str, file_name: str) -> str:
    """
    Handle entire file or deletion patches
    """
    if not new_file_content_str:
        # logic for handling deleted files - don't show patch, just show that the file was deleted
        if settings.config.verbosity_level > 0:
            logging.info(f"Processing file: {file_name}, minimizing deletion file")
        patch = "File was deleted\n"
    else:
        patch_lines = patch.splitlines()
        patch_new = omit_deletion_hunks(patch_lines)
        if patch != patch_new:
            if settings.config.verbosity_level > 0:
                logging.info(f"Processing file: {file_name}, hunks were deleted")
            patch = patch_new
    return patch
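A hypothetical round-trip through `extend_patch` (not part of this commit), adding one context line on each side of a single-line hunk:

```
from pr_agent.algo.git_patch_processing import extend_patch

original = "a\nb\nc\nd\n"
patch = "@@ -2,1 +2,1 @@\n-b\n+B"
print(extend_patch(original, patch, num_lines=1))
# roughly: the hunk header is recomputed and one context line is pulled in on each side
# @@ -1,3 +1,3 @@
# a
# -b
# +B
# c
```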
pr_agent/algo/language_handler.py (Normal file, 107 lines)
File diff suppressed because one or more lines are too long
pr_agent/algo/pr_processing.py (Normal file, 128 lines)
@@ -0,0 +1,128 @@
from __future__ import annotations

import difflib
import logging
from typing import Any, Tuple, Union

from pr_agent.algo.git_patch_processing import extend_patch, handle_patch_deletions
from pr_agent.algo.language_handler import sort_files_by_main_languages
from pr_agent.algo.token_handler import TokenHandler
from pr_agent.config_loader import settings
from pr_agent.git_providers import GithubProvider

OUTPUT_BUFFER_TOKENS = 800
PATCH_EXTRA_LINES = 3


def get_pr_diff(git_provider: Union[GithubProvider, Any], token_handler: TokenHandler) -> str:
    """
    Returns a string with the diff of the PR.
    If needed, apply diff minimization techniques to reduce the number of tokens
    """
    files = list(git_provider.get_diff_files())

    # get pr languages
    pr_languages = sort_files_by_main_languages(git_provider.get_languages(), files)

    # generate a standard diff string, with patch extension
    patches_extended, total_tokens = pr_generate_extended_diff(pr_languages, token_handler)

    # if we are under the limit, return the full diff
    if total_tokens + OUTPUT_BUFFER_TOKENS < token_handler.limit:
        return "\n".join(patches_extended)

    # if we are over the limit, start pruning
    patches_compressed = pr_generate_compressed_diff(pr_languages, token_handler)
    return "\n".join(patches_compressed)


def pr_generate_extended_diff(pr_languages: list, token_handler: TokenHandler) -> \
        Tuple[list, int]:
    """
    Generate a standard diff string, with patch extension
    """
    total_tokens = token_handler.prompt_tokens  # initial tokens
    patches_extended = []
    for lang in pr_languages:
        for file in lang['files']:
            original_file_content_str = file.base_file
            new_file_content_str = file.head_file
            patch = file.patch

            # handle the case of a large patch that initially was not loaded
            patch = load_large_diff(file, new_file_content_str, original_file_content_str, patch)

            if not patch:
                continue

            # extend each patch with extra lines of context
            extended_patch = extend_patch(original_file_content_str, patch, num_lines=PATCH_EXTRA_LINES)
            full_extended_patch = f"## {file.filename}\n\n{extended_patch}\n"

            patch_tokens = token_handler.count_tokens(full_extended_patch)
            file.tokens = patch_tokens
            total_tokens += patch_tokens
            patches_extended.append(full_extended_patch)

    return patches_extended, total_tokens


def pr_generate_compressed_diff(top_langs: list, token_handler: TokenHandler) -> list:
    # Apply diff minimization techniques to reduce the number of tokens:
    # 0. Start from the largest diff patch to smaller ones
    # 1. Don't use extended context lines around the diff
    # 2. Minimize deleted files
    # 3. Minimize deleted hunks
    # 4. Minimize all remaining files when you reach the token limit

    patches = []

    # sort each one of the languages in top_langs by the number of tokens in the diff
    sorted_files = []
    for lang in top_langs:
        sorted_files.extend(sorted(lang['files'], key=lambda x: x.tokens, reverse=True))

    total_tokens = token_handler.prompt_tokens
    for file in sorted_files:
        original_file_content_str = file.base_file
        new_file_content_str = file.head_file
        patch = file.patch
        patch = load_large_diff(file, new_file_content_str, original_file_content_str, patch)
        if not patch:
            continue

        # removing delete-only hunks
        patch = handle_patch_deletions(patch, original_file_content_str,
                                       new_file_content_str, file.filename)
        new_patch_tokens = token_handler.count_tokens(patch)

        if total_tokens > token_handler.limit - OUTPUT_BUFFER_TOKENS // 2:
            logging.warning(f"File was fully skipped, no more tokens: {file.filename}.")
            continue  # hard stop, no more tokens
        if total_tokens + new_patch_tokens > token_handler.limit - OUTPUT_BUFFER_TOKENS:
            # Current logic is to skip the patch if it's too large
            # TODO: Option for alternative logic to remove hunks from the patch to reduce the number of tokens
            #  until we meet the requirements
            if settings.config.verbosity_level >= 2:
                logging.warning(f"Patch too large, minimizing it, {file.filename}")
            patch = "File was modified"
        if patch:
            patch_final = f"## {file.filename}\n\n{patch}\n"
            patches.append(patch_final)
            total_tokens += token_handler.count_tokens(patch_final)
            if settings.config.verbosity_level >= 2:
                logging.info(f"Tokens: {total_tokens}, last filename: {file.filename}")
    return patches


def load_large_diff(file, new_file_content_str: str, original_file_content_str: str, patch: str) -> str:
    if not patch:  # TODO: also add a condition for the file extension
        try:
            diff = difflib.unified_diff(original_file_content_str.splitlines(keepends=True),
                                        new_file_content_str.splitlines(keepends=True))
            if settings.config.verbosity_level >= 2:
                logging.warning(f"File was modified, but no patch was found. Manually creating patch: {file.filename}.")
            patch = ''.join(diff)
        except Exception:
            pass
    return patch
pr_agent/algo/token_handler.py (Normal file, 24 lines)
@@ -0,0 +1,24 @@
from jinja2 import Environment, StrictUndefined
from tiktoken import encoding_for_model

from pr_agent.algo import MAX_TOKENS
from pr_agent.config_loader import settings


class TokenHandler:
    def __init__(self, pr, vars: dict, system, user):
        self.encoder = encoding_for_model(settings.config.model)
        self.limit = MAX_TOKENS[settings.config.model]
        self.prompt_tokens = self._get_system_user_tokens(pr, self.encoder, vars, system, user)

    def _get_system_user_tokens(self, pr, encoder, vars: dict, system, user):
        environment = Environment(undefined=StrictUndefined)
        system_prompt = environment.from_string(system).render(vars)
        user_prompt = environment.from_string(user).render(vars)

        system_prompt_tokens = len(encoder.encode(system_prompt))
        user_prompt_tokens = len(encoder.encode(user_prompt))
        return system_prompt_tokens + user_prompt_tokens

    def count_tokens(self, patch: str) -> int:
        return len(self.encoder.encode(patch))
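A hypothetical sketch (not part of this commit) of counting prompt tokens, assuming `settings.config.model` is set to one of the `MAX_TOKENS` keys. Note that the Jinja templates are rendered with `StrictUndefined`, so `vars` must define every variable they reference; the `pr` argument is accepted but not used for counting:

```
from pr_agent.algo.token_handler import TokenHandler

handler = TokenHandler(pr=None,
                       vars={"title": "Add token handler"},
                       system="You are a PR reviewer. PR title: {{ title }}",
                       user="Review the PR titled {{ title }}.")
print(handler.prompt_tokens, handler.limit)   # tokens used by the prompts, model budget
print(handler.count_tokens("@@ -1,3 +1,4 @@"))  # tokens in an arbitrary patch string
```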
pr_agent/algo/utils.py (Normal file, 59 lines)
@@ -0,0 +1,59 @@
from __future__ import annotations

import textwrap


def convert_to_markdown(output_data: dict) -> str:
    markdown_text = ""

    emojis = {
        "Main theme": "🎯",
        "Description and title": "🔍",
        "Type of PR": "📌",
        "Relevant tests added": "🧪",
        "Unrelated changes": "⚠️",
        "Minimal and focused": "✨",
        "Security concerns": "🔒",
        "General PR suggestions": "💡",
        "Code suggestions": "🤖"
    }

    for key, value in output_data.items():
        if not value:
            continue
        if isinstance(value, dict):
            markdown_text += f"## {key}\n\n"
            markdown_text += convert_to_markdown(value)
        elif isinstance(value, list):
            if key.lower() == 'code suggestions':
                markdown_text += "\n"  # just looks nicer with additional line breaks
            emoji = emojis.get(key, "‣")  # use a default bullet if no emoji is found for the key
            markdown_text += f"- {emoji} **{key}:**\n\n"
            for item in value:
                if isinstance(item, dict) and key.lower() == 'code suggestions':
                    markdown_text += parse_code_suggestion(item)
                elif item:
                    markdown_text += f"  - {item}\n"
        elif value != 'n/a':
            emoji = emojis.get(key, "‣")  # use a default bullet if no emoji is found for the key
            markdown_text += f"- {emoji} **{key}:** {value}\n"
    return markdown_text


def parse_code_suggestion(code_suggestions: dict) -> str:
    markdown_text = ""
    for sub_key, sub_value in code_suggestions.items():
        if isinstance(sub_value, dict):  # "code example"
            markdown_text += f"  - **{sub_key}:**\n"
            for code_key, code_value in sub_value.items():  # 'before' and 'after' code
                code_str = f"```\n{code_value}\n```"
                code_str_indented = textwrap.indent(code_str, '        ')
                markdown_text += f"    - **{code_key}:**\n{code_str_indented}\n"
        else:
            if "suggestion number" in sub_key.lower():
                markdown_text += f"- **suggestion {sub_value}:**\n"  # prettier formatting
            else:
                markdown_text += f"  - **{sub_key}:** {sub_value}\n"
    markdown_text += "\n"
    return markdown_text
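A hypothetical round-trip (not part of this commit) showing how a reviewer-style dict becomes the markdown comment format seen in the README:

```
from pr_agent.algo.utils import convert_to_markdown

output_data = {
    "Main theme": "Adding a token handler",
    "Code suggestions": [
        {"suggestion number": 1,
         "relevant file": "sql.py",
         "suggestion content": "Use parameterized queries."},
    ],
}
print(convert_to_markdown(output_data))
# roughly:
# - 🎯 **Main theme:** Adding a token handler
#
# - 🤖 **Code suggestions:**
#
# - **suggestion 1:**
#   - **relevant file:** sql.py
#   - **suggestion content:** Use parameterized queries.
```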
pr_agent/config_loader.py (Normal file, 14 lines)
@@ -0,0 +1,14 @@
from os.path import abspath, dirname, join

from dynaconf import Dynaconf

current_dir = dirname(abspath(__file__))
settings = Dynaconf(
    envvar_prefix=False,
    settings_files=[join(current_dir, f) for f in [
        "settings/.secrets.toml",
        "settings/configuration.toml",
        "settings/pr_reviewer_prompts.toml",
        "settings/pr_questions_prompts.toml"
    ]]
)
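A hypothetical sketch (not part of this commit) of reading the merged settings; both access styles appear in the code in this commit, assuming `.secrets.toml` defines a `[github]` table and `configuration.toml` a `[config]` table:

```
from pr_agent.config_loader import settings

model = settings.config.model  # attribute access; raises AttributeError if missing
deployment = settings.get("GITHUB.DEPLOYMENT_TYPE", "user")  # dotted access with a default
```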
pr_agent/git_providers/__init__.py (Normal file, 15 lines)
@@ -0,0 +1,15 @@
from pr_agent.config_loader import settings
from pr_agent.git_providers.github_provider import GithubProvider

_GIT_PROVIDERS = {
    'github': GithubProvider
}


def get_git_provider():
    try:
        provider_id = settings.config.git_provider
    except AttributeError as e:
        raise ValueError("git_provider is a required attribute in the configuration file") from e
    if provider_id not in _GIT_PROVIDERS:
        raise ValueError(f"Unknown git provider: {provider_id}")
    return _GIT_PROVIDERS[provider_id]
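A hypothetical lookup (not part of this commit), assuming `git_provider = "github"` in `configuration.toml` and a user token in `.secrets.toml`; note that `get_git_provider()` returns the provider class, not an instance:

```
from pr_agent.git_providers import get_git_provider

provider_class = get_git_provider()
provider = provider_class(pr_url="https://github.com/owner/repo/pull/1")
print(provider.get_title())
```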
pr_agent/git_providers/github_provider.py (Normal file, 170 lines)
@@ -0,0 +1,170 @@
from collections import namedtuple
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple
from urllib.parse import urlparse

from github import AppAuthentication, File, Github

from pr_agent.config_loader import settings


@dataclass
class FilePatchInfo:
    base_file: str
    head_file: str
    patch: str
    filename: str
    tokens: int = -1


class GithubProvider:
    def __init__(self, pr_url: Optional[str] = None, installation_id: Optional[int] = None):
        self.installation_id = installation_id
        self.github_client = self._get_github_client()
        self.repo = None
        self.pr_num = None
        self.pr = None
        if pr_url:
            self.set_pr(pr_url)

    def set_pr(self, pr_url: str):
        self.repo, self.pr_num = self._parse_pr_url(pr_url)
        self.pr = self._get_pr()

    def get_diff_files(self) -> list[FilePatchInfo]:
        files = self.pr.get_files()
        diff_files = []
        for file in files:
            original_file_content_str = self._get_pr_file_content(file, self.pr.base.sha)
            new_file_content_str = self._get_pr_file_content(file, self.pr.head.sha)
            diff_files.append(FilePatchInfo(original_file_content_str, new_file_content_str, file.patch, file.filename))
        return diff_files

    def publish_comment(self, pr_comment: str):
        self.pr.create_issue_comment(pr_comment)

    def get_title(self):
        return self.pr.title

    def get_description(self):
        return self.pr.body

    def get_languages(self):
        return self._get_repo().get_languages()

    def get_main_pr_language(self) -> str:
        """
        Get the main language of the commit. Return an empty string if it cannot be determined.
        """
        main_language_str = ""
        try:
            languages = self.get_languages()
            top_language = max(languages, key=languages.get).lower()

            # validate that the specific commit uses the main language
            extension_list = []
            files = self.pr.get_files()
            for file in files:
                extension_list.append(file.filename.rsplit('.')[-1])

            # get the most common extension
            most_common_extension = max(set(extension_list), key=extension_list.count)

            # look for a match. TBD: add more languages, do this systematically
            if most_common_extension == 'py' and top_language == 'python' or \
                    most_common_extension == 'js' and top_language == 'javascript' or \
                    most_common_extension == 'ts' and top_language == 'typescript' or \
                    most_common_extension == 'go' and top_language == 'go' or \
                    most_common_extension == 'java' and top_language == 'java' or \
                    most_common_extension == 'c' and top_language == 'c' or \
                    most_common_extension == 'cpp' and top_language == 'c++' or \
                    most_common_extension == 'cs' and top_language == 'c#' or \
                    most_common_extension == 'swift' and top_language == 'swift' or \
                    most_common_extension == 'php' and top_language == 'php' or \
                    most_common_extension == 'rb' and top_language == 'ruby' or \
                    most_common_extension == 'rs' and top_language == 'rust' or \
                    most_common_extension == 'scala' and top_language == 'scala' or \
                    most_common_extension == 'kt' and top_language == 'kotlin' or \
                    most_common_extension == 'pl' and top_language == 'perl':
                main_language_str = top_language

        except Exception:
            pass

        return main_language_str

    def get_pr_branch(self):
        return self.pr.head.ref

    def get_notifications(self, since: datetime):
        deployment_type = settings.get("GITHUB.DEPLOYMENT_TYPE", "user")

        if deployment_type != 'user':
            raise ValueError("Deployment mode must be set to 'user' to get notifications")

        notifications = self.github_client.get_user().get_notifications(since=since)
        return notifications

    @staticmethod
    def _parse_pr_url(pr_url: str) -> Tuple[str, int]:
        parsed_url = urlparse(pr_url)

        if 'github.com' not in parsed_url.netloc:
            raise ValueError("The provided URL is not a valid GitHub URL")

        path_parts = parsed_url.path.strip('/').split('/')
        if 'api.github.com' in parsed_url.netloc:
            if len(path_parts) < 5 or path_parts[3] != 'pulls':
                raise ValueError("The provided URL does not appear to be a GitHub PR URL")
            repo_name = '/'.join(path_parts[1:3])
            try:
                pr_number = int(path_parts[4])
            except ValueError as e:
                raise ValueError("Unable to convert PR number to integer") from e
            return repo_name, pr_number

        if len(path_parts) < 4 or path_parts[2] != 'pull':
            raise ValueError("The provided URL does not appear to be a GitHub PR URL")

        repo_name = '/'.join(path_parts[:2])
        try:
            pr_number = int(path_parts[3])
        except ValueError as e:
            raise ValueError("Unable to convert PR number to integer") from e

        return repo_name, pr_number

    def _get_github_client(self):
        deployment_type = settings.get("GITHUB.DEPLOYMENT_TYPE", "user")

        if deployment_type == 'app':
            try:
                private_key = settings.github.private_key
                app_id = settings.github.app_id
            except AttributeError as e:
                raise ValueError("GitHub app ID and private key are required when using GitHub app deployment") from e
            if not self.installation_id:
                raise ValueError("GitHub app installation ID is required when using GitHub app deployment")
            auth = AppAuthentication(app_id=app_id, private_key=private_key,
                                     installation_id=self.installation_id)
            return Github(app_auth=auth)

        if deployment_type == 'user':
            try:
                token = settings.github.user_token
            except AttributeError as e:
                raise ValueError("GitHub token is required when using user deployment") from e
            return Github(token)

    def _get_repo(self):
        return self.github_client.get_repo(self.repo)

    def _get_pr(self):
        return self._get_repo().get_pull(self.pr_num)

    def _get_pr_file_content(self, file: FilePatchInfo, sha: str):
        try:
            file_content_str = self._get_repo().get_contents(file.filename, ref=sha).decoded_content.decode()
        except Exception:
            file_content_str = ""
        return file_content_str
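A hypothetical illustration (not part of this commit) of the two URL shapes `_parse_pr_url` accepts; being a static method, it needs no client or token:

```
from pr_agent.git_providers.github_provider import GithubProvider

# regular web URL
print(GithubProvider._parse_pr_url("https://github.com/owner/repo/pull/42"))
# -> ('owner/repo', 42)

# REST API URL, as delivered in webhook payloads
print(GithubProvider._parse_pr_url("https://api.github.com/repos/owner/repo/pulls/42"))
# -> ('owner/repo', 42)
```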
pr_agent/scripts/answer_pr_questions_from_url.py (Normal file, 16 lines)
@@ -0,0 +1,16 @@
import argparse
import asyncio
import logging
import os

from pr_agent.tools.pr_questions import PRQuestions

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Answer a question about a PR from a URL')
    parser.add_argument('--pr_url', type=str, help='The URL of the PR to review', required=True)
    parser.add_argument('--question_str', type=str, help='The question to answer', required=True)

    args = parser.parse_args()
    logging.basicConfig(level=os.environ.get("LOGLEVEL", "INFO"))
    answerer = PRQuestions(args.pr_url, args.question_str, None)
    asyncio.run(answerer.answer())
pr_agent/scripts/review_pr_from_url.py (Normal file, 14 lines)
@@ -0,0 +1,14 @@
import argparse
import asyncio
import logging
import os

from pr_agent.tools.pr_reviewer import PRReviewer

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Review a PR from a URL')
    parser.add_argument('--pr_url', type=str, help='The URL of the PR to review', required=True)
    args = parser.parse_args()
    logging.basicConfig(level=os.environ.get("LOGLEVEL", "INFO"))
    reviewer = PRReviewer(args.pr_url, None)
    asyncio.run(reviewer.review())
78
pr_agent/servers/github_app_webhook.py
Normal file
@ -0,0 +1,78 @@
import logging
import sys

import uvicorn
from fastapi import APIRouter, FastAPI, HTTPException, Request, Response

from pr_agent.agent.pr_agent import PRAgent
from pr_agent.config_loader import settings
from pr_agent.servers.utils import verify_signature

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
router = APIRouter()


@router.post("/api/v1/github_webhooks")
async def handle_github_webhooks(request: Request, response: Response):
    logging.debug("Received a github webhook")
    try:
        body = await request.json()
    except Exception as e:
        logging.error("Error parsing request body: %s", e)
        raise HTTPException(status_code=400, detail="Error parsing request body") from e
    body_bytes = await request.body()
    signature_header = request.headers.get('x-hub-signature-256', None)
    try:
        webhook_secret = settings.github.webhook_secret
    except AttributeError:
        webhook_secret = None
    if webhook_secret:
        verify_signature(body_bytes, webhook_secret, signature_header)
    logging.debug(f'Request body:\n{body}')
    return await handle_request(body)


async def handle_request(body):
    action = body.get("action", None)
    installation_id = body.get("installation", {}).get("id", None)
    agent = PRAgent(installation_id)
    if action == 'created':
        if "comment" not in body:
            return {}
        comment_body = body.get("comment", {}).get("body", None)
        if "says 'Please" in comment_body:
            # Ignore echoes of the bot's own "Please review" / "Please answer" instructions
            return {}
        if "issue" not in body or "pull_request" not in body["issue"]:
            return {}
        pull_request = body["issue"]["pull_request"]
        api_url = pull_request.get("url", None)
        await agent.handle_request(api_url, comment_body)

    elif action in ("opened", "reopened"):
        pull_request = body.get("pull_request", None)
        if not pull_request:
            return {}
        api_url = pull_request.get("url", None)
        if api_url is None:
            return {}
        await agent.handle_request(api_url, "please review")
    else:
        return {}


@router.get("/")
async def root():
    return {"status": "ok"}


def start():
    if settings.get("GITHUB.DEPLOYMENT_TYPE", "user") != "app":
        raise Exception("Please set deployment type to app in .secrets.toml file")
    app = FastAPI()
    app.include_router(router)

    uvicorn.run(app, host="0.0.0.0", port=3000)


if __name__ == '__main__':
    start()

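For orientation, a hedged sketch of the kind of issue-comment payload handle_request acts on; field names mirror GitHub's webhook format, while the installation id and URL are hypothetical placeholders:

import asyncio

from pr_agent.servers.github_app_webhook import handle_request

# Illustrative 'created' issue-comment event on a pull request.
body = {
    "action": "created",
    "installation": {"id": 123456},
    "comment": {"body": "Please answer: where is the input validated?"},
    "issue": {
        "pull_request": {"url": "https://api.github.com/repos/org/repo/pulls/1"}
    },
}
asyncio.run(handle_request(body))  # routes the comment text to PRAgent
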
73
pr_agent/servers/github_polling.py
Normal file
@ -0,0 +1,73 @@
import asyncio
import logging
import sys
from datetime import datetime, timezone

import aiohttp

from pr_agent.agent.pr_agent import PRAgent
from pr_agent.config_loader import settings

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
NOTIFICATION_URL = "https://api.github.com/notifications"


def now() -> str:
    now_utc = datetime.now(timezone.utc).isoformat()
    now_utc = now_utc.replace("+00:00", "Z")
    return now_utc


async def polling_loop():
    since = [now()]
    last_modified = [None]
    try:
        deployment_type = settings.github.deployment_type
        token = settings.github.user_token
    except AttributeError:
        deployment_type = 'none'
        token = None
    if deployment_type != 'user':
        raise ValueError("Deployment mode must be set to 'user' to get notifications")
    if not token:
        raise ValueError("User token must be set to get notifications")
    async with aiohttp.ClientSession() as session:
        while True:
            headers = {
                "Accept": "application/vnd.github.v3+json",
                "Authorization": f"Bearer {token}"
            }
            params = {
                "participating": "true"
            }
            if since[0]:
                params["since"] = since[0]
            if last_modified[0]:
                headers["If-Modified-Since"] = last_modified[0]
            async with session.get(NOTIFICATION_URL, headers=headers, params=params) as response:
                if response.status == 200:
                    if 'Last-Modified' in response.headers:
                        last_modified[0] = response.headers['Last-Modified']
                        since[0] = None
                    notifications = await response.json()
                    for notification in notifications:
                        if 'reason' in notification and notification['reason'] == 'mention':
                            if 'subject' in notification and notification['subject']['type'] == 'PullRequest':
                                pr_url = notification['subject']['url']
                                latest_comment = notification['subject']['latest_comment_url']
                                async with session.get(latest_comment, headers=headers) as comment_response:
                                    if comment_response.status == 200:
                                        comment = await comment_response.json()
                                        comment_body = comment['body'] if 'body' in comment else ''
                                        commenter_github_user = comment['user']['login'] if 'user' in comment else ''
                                        logging.info(f"Commenter: {commenter_github_user}\nComment: {comment_body}")
                                        if comment_body.strip().startswith("@"):
                                            agent = PRAgent()
                                            await agent.handle_request(pr_url, comment_body)
                elif response.status != 304:
                    print(f"Failed to fetch notifications. Status code: {response.status}")

            await asyncio.sleep(5)


if __name__ == '__main__':
    asyncio.run(polling_loop())

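For reference, an abridged sketch of the notification entry the polling loop above reacts to; only the fields it actually reads are shown, and the URLs are hypothetical placeholders:

# Illustrative GitHub notification entry (abridged).
notification = {
    "reason": "mention",
    "subject": {
        "type": "PullRequest",
        "url": "https://api.github.com/repos/org/repo/pulls/1",
        "latest_comment_url": "https://api.github.com/repos/org/repo/issues/comments/42",
    },
}
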
23
pr_agent/servers/utils.py
Normal file
@ -0,0 +1,23 @@
import hashlib
import hmac

from fastapi import HTTPException


def verify_signature(payload_body, secret_token, signature_header):
    """Verify that the payload was sent from GitHub by validating SHA256.

    Raise and return 403 if not authorized.

    Args:
        payload_body: original request body to verify (request.body())
        secret_token: GitHub app webhook token (WEBHOOK_SECRET)
        signature_header: header received from GitHub (x-hub-signature-256)
    """
    if not signature_header:
        raise HTTPException(status_code=403, detail="x-hub-signature-256 header is missing!")
    hash_object = hmac.new(secret_token.encode('utf-8'), msg=payload_body, digestmod=hashlib.sha256)
    expected_signature = "sha256=" + hash_object.hexdigest()
    if not hmac.compare_digest(expected_signature, signature_header):
        raise HTTPException(status_code=403, detail="Request signatures didn't match!")

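A minimal usage sketch for verify_signature, computing the signature header the same way GitHub does over the raw body; the secret and payload here are hypothetical:

import hashlib
import hmac

from pr_agent.servers.utils import verify_signature

secret = "my-webhook-secret"        # hypothetical shared secret
payload = b'{"action": "opened"}'   # raw request body bytes
# GitHub sends this header: an HMAC-SHA256 of the raw body under the secret.
header = "sha256=" + hmac.new(secret.encode("utf-8"), msg=payload,
                              digestmod=hashlib.sha256).hexdigest()
verify_signature(payload, secret, header)  # returns silently; raises HTTPException(403) on mismatch
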
26
pr_agent/settings/.secrets_template.toml
Normal file
@ -0,0 +1,26 @@
# QUICKSTART:
# Copy this file to .secrets.toml in the same folder.
# The minimum workable settings - set openai.key to your API key.
# Set github.deployment_type to "user" and github.user_token to your GitHub personal access token.
# This will allow you to run the CLI scripts in the scripts/ folder and the github_polling server.
#
# See README for details about GitHub App deployment.

[openai]
key = "<API_KEY>"

[github]
# The type of deployment to create. Valid values are 'app' or 'user'.
deployment_type = "user"

# ---- Set the following only for deployment type == "user"
user_token = "<TOKEN>"  # A GitHub personal access token with 'repo' scope.

# ---- Set the following only for deployment type == "app", see README for details.
private_key = """\
-----BEGIN RSA PRIVATE KEY-----
<GITHUB PRIVATE KEY>
-----END RSA PRIVATE KEY-----
"""
app_id = 123456  # The GitHub App ID, replace with your own.
webhook_secret = "<WEBHOOK SECRET>"  # Optional, may be commented out.

15
pr_agent/settings/configuration.toml
Normal file
@ -0,0 +1,15 @@
[config]
model="gpt-4-0613"
git_provider="github"
publish_review=true
verbosity_level=0 # 0,1,2

[pr_reviewer]
require_minimal_and_focused_review=true
require_tests_review=true
require_security_review=true
extended_code_suggestions=false
num_code_suggestions=4


[pr_questions]

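As a hedged sketch of how these two TOML files could come together at runtime: the project depends on dynaconf, so pr_agent.config_loader presumably builds the shared settings object along these lines (the actual loader code is not part of this excerpt, so the file names and options below are assumptions):

from dynaconf import Dynaconf

# Illustrative only: merge the checked-in configuration with local secrets.
settings = Dynaconf(
    settings_files=[
        "pr_agent/settings/configuration.toml",
        "pr_agent/settings/.secrets.toml",
    ],
)

print(settings.config.model)                      # "gpt-4-0613"
print(settings.pr_reviewer.num_code_suggestions)  # 4
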
30
pr_agent/settings/pr_questions_prompts.toml
Normal file
@ -0,0 +1,30 @@
[pr_questions_prompt]
system="""You are CodiumAI-PR-Reviewer, a language model designed to review git pull requests.
Your task is to answer questions about the new PR code (the '+' lines), and provide feedback.
Be informative, constructive, and give examples. Try to be as specific as possible, and don't avoid answering the questions.
Make sure not to repeat modifications already implemented in the new PR code (the '+' lines).
"""

user="""PR Info:
Title: '{{title}}'
Branch: '{{branch}}'
Description: '{{description}}'
{%- if language %}
Main language: {{language}}
{%- endif %}


The PR Git Diff:
```
{{diff}}
```
Note that lines in the diff body are prefixed with a symbol that represents the type of change: '-' for deletions, '+' for additions, and ' ' (a space) for unchanged lines.


The PR Questions:
```
{{ questions }}
```

Response:
"""

159
pr_agent/settings/pr_reviewer_prompts.toml
Normal file
@ -0,0 +1,159 @@
[pr_review_prompt]
system="""You are CodiumAI-PR-Reviewer, a language model designed to review git pull requests.
Your task is to provide constructive and concise feedback for the PR, and also provide meaningful code suggestions to improve the new PR code (the '+' lines).
- Provide up to {{ num_code_suggestions }} code suggestions.
- Try to focus on important suggestions like fixing code problems, issues and bugs. As a second priority, provide suggestions for meaningful code improvements, like performance, vulnerability, modularity, and best practices.
{%- if extended_code_suggestions %}
- For each suggestion, provide a short and concise code snippet to illustrate the existing code, and the improved code.
{%- endif %}
- Make sure not to provide suggestions repeating modifications already implemented in the new PR code (the '+' lines).

You must use the following JSON schema to format your answer:
```json
{
  "PR Analysis": {
    "Main theme": {
      "type": "string",
      "description": "a short explanation of the PR"
    },
    "Description and title": {
      "type": "string",
      "description": "yes\\no question: does this PR have a relevant description and title"
    },
    "Type of PR": {
      "type": "string",
      "enum": ["Bug fix", "Tests", "Bug fix with tests", "Refactoring", "Enhancement", "Documentation", "Other"]
    },
{%- if require_tests %}
    "Relevant tests added": {
      "type": "string",
      "description": "yes\\no question: does this PR have relevant tests ?"
    },
{%- endif %}
{%- if require_minimal_and_focused %}
    "Minimal and focused": {
      "type": "string",
      "description": "is this PR as minimal and focused as possible, with all code changes centered around a single coherent theme, described in the PR description and title ? explain your answer"
    }
{%- endif %}
  },
  "PR Feedback": {
    "General PR suggestions": {
      "type": "string",
      "description": "important suggestions for the contributors and maintainers of this PR, may include overall structure, primary purpose and best practices. consider using specific filenames, classes and functions names. explain yourself!"
    },
    "Code suggestions": {
      "type": "array",
      "maxItems": {{ num_code_suggestions }},
      "uniqueItems": true,
      "items": {
        "suggestion number": {
          "type": "int",
          "description": "suggestion number, starting from 1"
        },
        "relevant file": {
          "type": "string",
          "description": "the relevant file name"
        },
        "suggestion content": {
          "type": "string",
{%- if extended_code_suggestions %}
          "description": "a concrete suggestion for meaningfully improving the new PR code. Don't repeat previous suggestions. Add tags with importance measure that matches each suggestion ('important' or 'medium'). Do not make suggestions for updating or adding docstrings, renaming PR title and description, or linter like."
{%- else %}
          "description": "a concrete suggestion for meaningfully improving the new PR code. Also describe how, specifically, the suggestion can be applied to new PR code. Add tags with importance measure that matches each suggestion ('important' or 'medium'). Do not make suggestions for updating or adding docstrings, renaming PR title and description, or linter like."
{%- endif %}
        },
{%- if extended_code_suggestions %}
        "why": {
          "type": "string",
          "description": "shortly explain why this suggestion is important"
        },
        "code example": {
          "type": "object",
          "properties": {
            "before code": {
              "type": "string",
              "description": "Short and concise code snippet, to illustrate the existing code"
            },
            "after code": {
              "type": "string",
              "description": "Short and concise code snippet, to illustrate the improved code"
            }
          }
        }
{%- endif %}
      }
    },
{%- if require_security %}
    "Security concerns": {
      "type": "string",
      "description": "yes\\no question: does this PR code introduce possible security concerns or issues, like SQL injection, XSS, CSRF, and others ? explain your answer"
    }
{%- endif %}
  }
}
```

Example output:
'
{
    "PR Analysis":
    {
        "Main theme": "xxx",
        "Description and title": "Yes",
        "Type of PR": "Bug fix",
{%- if require_tests %}
        "Relevant tests added": "No",
{%- endif %}
{%- if require_minimal_and_focused %}
        "Minimal and focused": "No, because ..."
{%- endif %}
    },
    "PR Feedback":
    {
        "General PR suggestions": "..., `xxx`...",
        "Code suggestions": [
            {
                "suggestion number": 1,
                "relevant file": "xxx.py",
                "suggestion content": "xxx [important]",
{%- if extended_code_suggestions %}
                "why": "xxx",
                "code example":
                {
                    "before code": "xxx",
                    "after code": "xxx"
                }
{%- endif %}
            },
            ...
        ]
{%- if require_security %},
        "Security concerns": "No, because ..."
{%- endif %}
    }
}
'

Don't repeat the prompt in the answer, and avoid outputting the 'type' and 'description' fields.
"""

user="""PR Info:
Title: '{{title}}'
Branch: '{{branch}}'
Description: '{{description}}'
{%- if language %}
Main language: {{language}}
{%- endif %}


The PR Git Diff:
```
{{diff}}
```
Note that lines in the diff body are prefixed with a symbol that represents the type of change: '-' for deletions, '+' for additions, and ' ' (a space) for unchanged lines.

Response (should be a valid JSON, and nothing else):
```json
"""

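Because the schema above is assembled with Jinja conditionals, here is a short sketch of how the require_* flags shape the rendered prompt; it mirrors the Environment(undefined=StrictUndefined) usage in the tools below, and the template snippet itself is abridged and illustrative:

from jinja2 import Environment, StrictUndefined

# Abridged template: one conditional field from the schema above.
template = (
    '"Type of PR": {"type": "string"},\n'
    '{%- if require_tests %}\n'
    '"Relevant tests added": {"type": "string"}\n'
    '{%- endif %}'
)
environment = Environment(undefined=StrictUndefined)
print(environment.from_string(template).render(require_tests=False))
# Only the "Type of PR" line survives; with require_tests=True the extra field is emitted.
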
0
pr_agent/tools/__init__.py
Normal file

67
pr_agent/tools/pr_questions.py
Normal file
@ -0,0 +1,67 @@
import copy
import logging
from typing import Optional

from jinja2 import Environment, StrictUndefined

from pr_agent.algo.ai_handler import AiHandler
from pr_agent.algo.pr_processing import get_pr_diff
from pr_agent.algo.token_handler import TokenHandler
from pr_agent.config_loader import settings
from pr_agent.git_providers import get_git_provider


class PRQuestions:
    def __init__(self, pr_url: str, question_str: str, installation_id: Optional[int] = None):
        self.git_provider = get_git_provider()(pr_url, installation_id)
        self.main_pr_language = self.git_provider.get_main_pr_language()
        self.installation_id = installation_id
        self.ai_handler = AiHandler()
        self.question_str = question_str
        self.vars = {
            "title": self.git_provider.pr.title,
            "branch": self.git_provider.get_pr_branch(),
            "description": self.git_provider.pr.body,
            "language": self.git_provider.get_main_pr_language(),
            "diff": "",  # empty diff for initial calculation
            "questions": self.question_str,
        }
        self.token_handler = TokenHandler(self.git_provider.pr,
                                          self.vars,
                                          settings.pr_questions_prompt.system,
                                          settings.pr_questions_prompt.user)
        self.patches_diff = None
        self.prediction = None

    async def answer(self):
        logging.info('Answering a PR question...')
        self.git_provider.publish_comment("Preparing answer...")
        logging.info('Getting PR diff...')
        self.patches_diff = get_pr_diff(self.git_provider, self.token_handler)
        logging.info('Getting AI prediction...')
        self.prediction = await self._get_prediction()
        logging.info('Preparing answer...')
        pr_comment = self._prepare_pr_answer()
        if settings.config.publish_review:
            logging.info('Pushing answer...')
            self.git_provider.publish_comment(pr_comment)
        return ""

    async def _get_prediction(self):
        variables = copy.deepcopy(self.vars)
        variables["diff"] = self.patches_diff  # update diff
        environment = Environment(undefined=StrictUndefined)
        system_prompt = environment.from_string(settings.pr_questions_prompt.system).render(variables)
        user_prompt = environment.from_string(settings.pr_questions_prompt.user).render(variables)
        if settings.config.verbosity_level >= 2:
            logging.info(f"\nSystem prompt:\n{system_prompt}")
            logging.info(f"\nUser prompt:\n{user_prompt}")
        model = settings.config.model
        response, finish_reason = await self.ai_handler.chat_completion(model=model, temperature=0.2,
                                                                        system=system_prompt, user=user_prompt)
        return response

    def _prepare_pr_answer(self) -> str:
        answer_str = f"Questions: {self.question_str}\n\n"
        answer_str += f"Answer: {self.prediction.strip()}\n\n"
        return answer_str

88
pr_agent/tools/pr_reviewer.py
Normal file
@ -0,0 +1,88 @@
import copy
import json
import logging
from typing import Optional

from jinja2 import Environment, StrictUndefined

from pr_agent.algo.ai_handler import AiHandler
from pr_agent.algo.pr_processing import get_pr_diff
from pr_agent.algo.token_handler import TokenHandler
from pr_agent.algo.utils import convert_to_markdown
from pr_agent.config_loader import settings
from pr_agent.git_providers import get_git_provider


class PRReviewer:
    def __init__(self, pr_url: str, installation_id: Optional[int] = None):

        self.git_provider = get_git_provider()(pr_url, installation_id)
        self.main_language = self.git_provider.get_main_pr_language()
        self.installation_id = installation_id
        self.ai_handler = AiHandler()
        self.patches_diff = None
        self.prediction = None
        self.vars = {
            "title": self.git_provider.pr.title,
            "branch": self.git_provider.get_pr_branch(),
            "description": self.git_provider.pr.body,
            "language": self.git_provider.get_main_pr_language(),
            "diff": "",  # empty diff for initial calculation
            "require_tests": settings.pr_reviewer.require_tests_review,
            "require_security": settings.pr_reviewer.require_security_review,
            "require_minimal_and_focused": settings.pr_reviewer.require_minimal_and_focused_review,
            'extended_code_suggestions': settings.pr_reviewer.extended_code_suggestions,
            'num_code_suggestions': settings.pr_reviewer.num_code_suggestions,
        }
        self.token_handler = TokenHandler(self.git_provider.pr,
                                          self.vars,
                                          settings.pr_review_prompt.system,
                                          settings.pr_review_prompt.user)

    async def review(self):
        logging.info('Reviewing PR...')
        if settings.config.publish_review:
            self.git_provider.publish_comment("Preparing review...")
        logging.info('Getting PR diff...')
        self.patches_diff = get_pr_diff(self.git_provider, self.token_handler)
        logging.info('Getting AI prediction...')
        self.prediction = await self._get_prediction()
        logging.info('Preparing PR review...')
        pr_comment = self._prepare_pr_review()
        if settings.config.publish_review:
            logging.info('Pushing PR review...')
            self.git_provider.publish_comment(pr_comment)
        return ""

    async def _get_prediction(self):
        variables = copy.deepcopy(self.vars)
        variables["diff"] = self.patches_diff  # update diff
        environment = Environment(undefined=StrictUndefined)
        system_prompt = environment.from_string(settings.pr_review_prompt.system).render(variables)
        user_prompt = environment.from_string(settings.pr_review_prompt.user).render(variables)
        if settings.config.verbosity_level >= 2:
            logging.info(f"\nSystem prompt:\n{system_prompt}")
            logging.info(f"\nUser prompt:\n{user_prompt}")
        model = settings.config.model
        response, finish_reason = await self.ai_handler.chat_completion(model=model, temperature=0.2,
                                                                        system=system_prompt, user=user_prompt)
        try:
            json.loads(response)
        except json.decoder.JSONDecodeError:
            logging.warning("Could not decode JSON")
            response = "{}"  # fall back to an empty JSON object string so downstream strip()/loads() still work
        return response

    def _prepare_pr_review(self) -> str:
        review = self.prediction.strip()
        try:
            data = json.loads(review)
        except json.decoder.JSONDecodeError:
            logging.error("Unable to decode JSON response from AI")
            data = {}
        markdown_text = convert_to_markdown(data)
        markdown_text += "\nAdd a comment that says 'Please review' to ask for a new review after you update the PR.\n"
        markdown_text += "Add a comment that says 'Please answer <QUESTION...>' to ask a question about this PR.\n"
        if settings.config.verbosity_level >= 2:
            logging.info(f"Markdown response:\n{markdown_text}")
        return markdown_text

32
pyproject.toml
Normal file
@ -0,0 +1,32 @@
[tool.ruff]

line-length = 120

select = [
    "E",    # pycodestyle errors
    "F",    # Pyflakes
    "B",    # flake8-bugbear
    "I001", # isort basic checks
    "I002", # isort missing-required-import
]

# First commit - only fixing isort
fixable = [
    "I001", # isort basic checks
]

unfixable = [
    "B", # Avoid trying to fix flake8-bugbear (`B`) violations.
]

exclude = [
    "api/code_completions",
]

ignore = [
    "E999", "B008"
]

[tool.ruff.per-file-ignores]
"__init__.py" = ["E402"]  # Ignore `E402` (import violations) in all `__init__.py` files.
# TODO: should decide if maybe not to ignore these.

1
requirements-dev.txt
Normal file
@ -0,0 +1 @@
pytest==7.4.0

8
requirements.txt
Normal file
@ -0,0 +1,8 @@
dynaconf==3.1.12
fastapi==0.99.0
PyGithub==1.58.2
retry==0.9.2
openai==0.27.8
Jinja2==3.1.2
tiktoken==0.4.0
uvicorn==0.22.0

124
tests/unit/test_convert_to_markdown.py
Normal file
@ -0,0 +1,124 @@
# Generated by CodiumAI
from pr_agent.algo.utils import convert_to_markdown

"""
Code Analysis

Objective:
The objective of the 'convert_to_markdown' function is to convert a dictionary of data into a markdown-formatted text.
The function takes in a dictionary as input and recursively iterates through its keys and values to generate the
markdown text.

Inputs:
- A dictionary of data containing information about a pull request.

Flow:
- Initialize an empty string variable 'markdown_text'.
- Create a dictionary 'emojis' containing emojis for each key in the input dictionary.
- Iterate through the input dictionary:
  - If the value is empty, continue to the next iteration.
  - If the value is a dictionary, recursively call the 'convert_to_markdown' function with the value as input and
    append the returned markdown text to 'markdown_text'.
  - If the value is a list:
    - If the key is 'code suggestions', add an additional line break to 'markdown_text'.
    - Get the corresponding emoji for the key from the 'emojis' dictionary. If no emoji is found, use a dash.
    - Append the emoji and key to 'markdown_text'.
    - Iterate through the items in the list:
      - If the item is a dictionary and the key is 'code suggestions', call the 'parse_code_suggestion' function with
        the item as input and append the returned markdown text to 'markdown_text'.
      - If the item is not empty, append it to 'markdown_text'.
  - If the value is not 'n/a', get the corresponding emoji for the key from the 'emojis' dictionary. If no emoji is
    found, use a dash. Append the emoji, key, and value to 'markdown_text'.
- Return 'markdown_text'.

Outputs:
- A markdown-formatted string containing the information from the input dictionary.

Additional aspects:
- The function uses recursion to handle nested dictionaries.
- The 'parse_code_suggestion' function is called for items in the 'code suggestions' list.
- The function uses emojis to add visual cues to the markdown text.
"""


class TestConvertToMarkdown:
    # Tests that the function works correctly with a simple dictionary input
    def test_simple_dictionary_input(self):
        input_data = {
            'Main theme': 'Test',
            'Description and title': 'Test description',
            'Type of PR': 'Test type',
            'Relevant tests added': 'no',
            'Unrelated changes': 'n/a',  # won't be included in the output
            'Minimal and focused': 'Yes',
            'General PR suggestions': 'general suggestion...',
            'Code suggestions': [
                {
                    'Suggestion number': 1,
                    'Code example': {
                        'Before': 'Code before',
                        'After': 'Code after'
                    }
                },
                {
                    'Suggestion number': 2,
                    'Code example': {
                        'Before': 'Code before 2',
                        'After': 'Code after 2'
                    }
                }
            ]
        }
        expected_output = """\
- 🎯 **Main theme:** Test
- 🔍 **Description and title:** Test description
- 📌 **Type of PR:** Test type
- 🧪 **Relevant tests added:** no
- ✨ **Minimal and focused:** Yes
- 💡 **General PR suggestions:** general suggestion...

- 🤖 **Code suggestions:**

- **suggestion 1:**
  - **Code example:**
    - **Before:**
      ```
      Code before
      ```
    - **After:**
      ```
      Code after
      ```

- **suggestion 2:**
  - **Code example:**
    - **Before:**
      ```
      Code before 2
      ```
    - **After:**
      ```
      Code after 2
      ```
"""
        assert convert_to_markdown(input_data).strip() == expected_output.strip()

    # Tests that the function works correctly with an empty dictionary input
    def test_empty_dictionary_input(self):
        input_data = {}
        expected_output = ""
        assert convert_to_markdown(input_data).strip() == expected_output.strip()

    def test_dictionary_input_containing_only_empty_dictionaries(self):
        input_data = {
            'Main theme': {},
            'Description and title': {},
            'Type of PR': {},
            'Relevant tests added': {},
            'Unrelated changes': {},
            'Minimal and focused': {},
            'General PR suggestions': {},
            'Code suggestions': {}
        }
        expected_output = ""
        assert convert_to_markdown(input_data).strip() == expected_output.strip()

84
tests/unit/test_delete_hunks.py
Normal file
@ -0,0 +1,84 @@
# Generated by CodiumAI

from pr_agent.algo.git_patch_processing import omit_deletion_hunks

"""
Code Analysis

Objective:
The objective of the "omit_deletion_hunks" function is to remove deletion hunks from a patch file and return only the
added lines.

Inputs:
- "patch_lines": a list of strings representing the lines of a patch file.

Flow:
- Initialize empty lists "temp_hunk" and "added_patched", and boolean variables "add_hunk" and "inside_hunk".
- Compile a regular expression pattern to match hunk headers.
- Iterate through each line in "patch_lines".
- If the line starts with "@@", match the line with the hunk header pattern, finish the previous hunk if necessary,
  and append the line to "temp_hunk".
- If the line does not start with "@@", append the line to "temp_hunk", check if it is an added line, and set
  "add_hunk" to True if it is.
- If the function reaches the end of "patch_lines" and there is an unfinished hunk with added lines, append it to
  "added_patched".
- Join the lines in "added_patched" with newline characters and return the resulting string.

Outputs:
- A string representing the added lines in the patch file.

Additional aspects:
- The function only considers hunks with added lines and ignores hunks with deleted lines.
- The function assumes that the input patch file is well-formed and follows the unified diff format.
"""


class TestOmitDeletionHunks:
    # Tests that the function correctly handles a simple patch containing only additions
    def test_simple_patch_additions(self):
        patch_lines = ['@@ -1,0 +1,1 @@\n', '+added line\n']
        expected_output = '@@ -1,0 +1,1 @@\n\n+added line\n'
        assert omit_deletion_hunks(patch_lines) == expected_output

    # Tests that the function correctly omits deletion hunks and concatenates multiple hunks in a patch.
    def test_patch_multiple_hunks(self):
        patch_lines = ['@@ -1,0 +1,1 @@\n', '-deleted line', '+added line\n', '@@ -2,0 +3,1 @@\n', '-deleted line\n',
                       '-another deleted line\n']
        expected_output = '@@ -1,0 +1,1 @@\n\n-deleted line\n+added line\n'
        assert omit_deletion_hunks(patch_lines) == expected_output

    # Tests that the function correctly omits deletion lines from the patch when there are no additions or context
    # lines.
    def test_patch_only_deletions(self):
        patch_lines = ['@@ -1,1 +1,0 @@\n', '-deleted line\n']
        expected_output = ''
        assert omit_deletion_hunks(patch_lines) == expected_output

        # Additional deletion lines
        patch_lines = ['@@ -1,1 +1,0 @@\n', '-deleted line\n', '-another deleted line\n']
        expected_output = ''
        assert omit_deletion_hunks(patch_lines) == expected_output

        # Additional context lines
        patch_lines = ['@@ -1,1 +1,0 @@\n', '-deleted line\n', '-another deleted line\n', 'context line 1\n',
                       'context line 2\n', 'context line 3\n']
        expected_output = ''
        assert omit_deletion_hunks(patch_lines) == expected_output

    # Tests that the function correctly handles an empty patch
    def test_empty_patch(self):
        patch_lines = []
        expected_output = ''
        assert omit_deletion_hunks(patch_lines) == expected_output

    # Tests that the function correctly handles a patch containing only one hunk
    def test_patch_one_hunk(self):
        patch_lines = ['@@ -1,0 +1,1 @@\n', '+added line\n']
        expected_output = '@@ -1,0 +1,1 @@\n\n+added line\n'
        assert omit_deletion_hunks(patch_lines) == expected_output

    # Tests that the function correctly handles a patch containing only deletions and no additions
    def test_patch_deletions_no_additions(self):
        patch_lines = ['@@ -1,1 +1,0 @@\n', '-deleted line\n']
        expected_output = ''
        assert omit_deletion_hunks(patch_lines) == expected_output

93
tests/unit/test_extend_patch.py
Normal file
@ -0,0 +1,93 @@
# Generated by CodiumAI

from pr_agent.algo.git_patch_processing import extend_patch

"""
Code Analysis

Objective:
The objective of the 'extend_patch' function is to extend a given patch to include a specified number of surrounding
lines. This function takes in an original file string, a patch string, and the number of lines to extend the patch by,
and returns the extended patch string.

Inputs:
- original_file_str: a string representing the original file
- patch_str: a string representing the patch to be extended
- num_lines: an integer representing the number of lines to extend the patch by

Flow:
1. Split the original file string and patch string into separate lines
2. Initialize variables to keep track of the current hunk's start and size for both the original file and the patch
3. Iterate through each line in the patch string
4. If the line starts with '@@', extract the start and size values for both the original file and the patch, and
   calculate the extended start and size values
5. Append the extended hunk header to the extended patch lines list
6. Append the specified number of lines before the hunk to the extended patch lines list
7. Append the current line to the extended patch lines list
8. If the line is not a hunk header, append it to the extended patch lines list
9. Return the extended patch string

Outputs:
- extended_patch_str: a string representing the extended patch

Additional aspects:
- The function uses regular expressions to extract the start and size values from the hunk header
- The function handles cases where the start value of a hunk is less than the number of lines to extend by, by
  setting the extended start value to 1
- The function handles cases where the hunk extends beyond the end of the original file by only including lines up to
  the end of the original file in the extended patch
"""


class TestExtendPatch:
    # Tests that the function works correctly with valid input
    def test_happy_path(self):
        original_file_str = 'line1\nline2\nline3\nline4\nline5'
        patch_str = '@@ -2,2 +2,2 @@ init()\n-line2\n+new_line2\nline3'
        num_lines = 1
        expected_output = '@@ -1,4 +1,4 @@ init()\nline1\n-line2\n+new_line2\nline3\nline4'
        actual_output = extend_patch(original_file_str, patch_str, num_lines)
        assert actual_output == expected_output

    # Tests that the function returns an empty string when patch_str is empty
    def test_empty_patch(self):
        original_file_str = 'line1\nline2\nline3\nline4\nline5'
        patch_str = ''
        num_lines = 1
        expected_output = ''
        assert extend_patch(original_file_str, patch_str, num_lines) == expected_output

    # Tests that the function returns the original patch when num_lines is 0
    def test_zero_num_lines(self):
        original_file_str = 'line1\nline2\nline3\nline4\nline5'
        patch_str = '@@ -2,2 +2,2 @@ init()\n-line2\n+new_line2\nline3'
        num_lines = 0
        assert extend_patch(original_file_str, patch_str, num_lines) == patch_str

    # Tests that the function returns the original patch when patch_str contains no hunks
    def test_no_hunks(self):
        original_file_str = 'line1\nline2\nline3\nline4\nline5'
        patch_str = 'no hunks here'
        num_lines = 1
        expected_output = 'no hunks here'
        assert extend_patch(original_file_str, patch_str, num_lines) == expected_output

    # Tests that the function extends a patch with a single hunk correctly
    def test_single_hunk(self):
        original_file_str = 'line1\nline2\nline3\nline4\nline5'
        patch_str = '@@ -2,3 +2,3 @@ init()\n-line2\n+new_line2\nline3\nline4'
        num_lines = 1
        expected_output = '@@ -1,5 +1,5 @@ init()\nline1\n-line2\n+new_line2\nline3\nline4\nline5'
        actual_output = extend_patch(original_file_str, patch_str, num_lines)
        assert actual_output == expected_output

    # Tests the functionality of extending a patch with multiple hunks.
    def test_multiple_hunks(self):
        original_file_str = 'line1\nline2\nline3\nline4\nline5\nline6'
        patch_str = '@@ -2,3 +2,3 @@ init()\n-line2\n+new_line2\nline3\nline4\n@@ -4,1 +4,1 @@ init2()\n-line4\n+new_line4'  # noqa: E501
        num_lines = 1
        expected_output = '@@ -1,5 +1,5 @@ init()\nline1\n-line2\n+new_line2\nline3\nline4\nline5\n@@ -3,3 +3,3 @@ init2()\nline3\n-line4\n+new_line4\nline5'  # noqa: E501
        actual_output = extend_patch(original_file_str, patch_str, num_lines)
        assert actual_output == expected_output

84
tests/unit/test_handle_patch_deletions.py
Normal file
@ -0,0 +1,84 @@
# Generated by CodiumAI
import logging

from pr_agent.algo.git_patch_processing import handle_patch_deletions
from pr_agent.config_loader import settings

"""
Code Analysis

Objective:
The objective of the function is to handle entire file or deletion patches and return the patch after omitting the
deletion hunks.

Inputs:
- patch: a string representing the patch to be handled
- original_file_content_str: a string representing the original content of the file
- new_file_content_str: a string representing the new content of the file
- file_name: a string representing the name of the file

Flow:
- If new_file_content_str is empty, set patch to "File was deleted" and return it
- Otherwise, split patch into lines and omit the deletion hunks using the omit_deletion_hunks function
- If the resulting patch is different from the original patch, log a message and set patch to the new patch
- Return the resulting patch

Outputs:
- A string representing the patch after omitting the deletion hunks

Additional aspects:
- The function uses the settings from the configuration files to determine the verbosity level of the logging messages
- The omit_deletion_hunks function is called to remove the deletion hunks from the patch
- The function handles the case where the new_file_content_str is empty by setting the patch to "File was deleted"
"""


class TestHandlePatchDeletions:
    # Tests that handle_patch_deletions returns the original patch when new_file_content_str is not empty
    def test_handle_patch_deletions_happy_path_new_file_content_exists(self):
        patch = '--- a/file.py\n+++ b/file.py\n@@ -1,2 +1,2 @@\n-foo\n-bar\n+baz\n'
        original_file_content_str = 'foo\nbar\n'
        new_file_content_str = 'foo\nbaz\n'
        file_name = 'file.py'
        assert handle_patch_deletions(patch, original_file_content_str, new_file_content_str,
                                      file_name) == patch.rstrip()

    # Tests that handle_patch_deletions logs a message when verbosity_level is greater than 0
    def test_handle_patch_deletions_happy_path_verbosity_level_greater_than_0(self, caplog):
        patch = '--- a/file.py\n+++ b/file.py\n@@ -1,2 +1,2 @@\n-foo\n-bar\n+baz\n'
        original_file_content_str = 'foo\nbar\n'
        new_file_content_str = ''
        file_name = 'file.py'
        settings.config.verbosity_level = 1

        with caplog.at_level(logging.INFO):
            handle_patch_deletions(patch, original_file_content_str, new_file_content_str, file_name)
            assert any("Processing file" in message for message in caplog.messages)

    # Tests that handle_patch_deletions returns 'File was deleted' when new_file_content_str is empty
    def test_handle_patch_deletions_edge_case_new_file_content_empty(self):
        patch = '--- a/file.py\n+++ b/file.py\n@@ -1,2 +1,2 @@\n-foo\n-bar\n'
        original_file_content_str = 'foo\nbar\n'
        new_file_content_str = ''
        file_name = 'file.py'
        assert handle_patch_deletions(patch, original_file_content_str, new_file_content_str,
                                      file_name) == 'File was deleted\n'

    # Tests that handle_patch_deletions returns the original patch when patch and patch_new are equal
    def test_handle_patch_deletions_edge_case_patch_and_patch_new_are_equal(self):
        patch = '--- a/file.py\n+++ b/file.py\n@@ -1,2 +1,2 @@\n-foo\n-bar\n'
        original_file_content_str = 'foo\nbar\n'
        new_file_content_str = 'foo\nbar\n'
        file_name = 'file.py'
        assert handle_patch_deletions(patch, original_file_content_str, new_file_content_str,
                                      file_name).rstrip() == patch.rstrip()

    # Tests that handle_patch_deletions returns the modified patch when patch and patch_new are not equal
    def test_handle_patch_deletions_edge_case_patch_and_patch_new_are_not_equal(self):
        patch = '--- a/file.py\n+++ b/file.py\n@@ -1,2 +1,2 @@\n-foo\n-bar\n'
        original_file_content_str = 'foo\nbar\n'
        new_file_content_str = 'foo\nbaz\n'
        file_name = 'file.py'
        expected_patch = '--- a/file.py\n+++ b/file.py\n@@ -1,2 +1,2 @@\n-foo\n-bar'
        assert handle_patch_deletions(patch, original_file_content_str, new_file_content_str,
                                      file_name) == expected_patch

121
tests/unit/test_language_handler
Normal file
@ -0,0 +1,121 @@
# Generated by CodiumAI
import pytest

from pr_agent.algo.language_handler import sort_files_by_main_languages

"""
Code Analysis

Objective:
The objective of the function is to sort a list of files by their main language, putting the files that are in the
main language first and the rest of the files after. It takes in a dictionary of languages and their sizes, and a
list of files.

Inputs:
- languages: a dictionary containing the languages and their sizes
- files: a list of files

Flow:
1. Sort the languages by their size in descending order
2. Get all extensions for the languages
3. Filter out files with bad extensions
4. Sort files by their extension, putting the files that are in the main extension first and the rest of the files
   after
5. Map languages_sorted to their respective files
6. Append the files to the files_sorted list
7. Append the rest of the files to the files_sorted list under the "Other" language category
8. Return the files_sorted list

Outputs:
- files_sorted: a list of dictionaries containing the language and its respective files

Additional aspects:
- The function uses a language_extension_map dictionary to map the languages to their respective extensions
- The function uses the filter_bad_extensions function to filter out files with bad extensions
- The function uses a rest_files dictionary to store the files that do not belong to any of the main extensions
"""


class TestSortFilesByMainLanguages:
    # Tests that files are sorted by main language, with files in main language first and the rest after
    def test_happy_path_sort_files_by_main_languages(self):
        languages = {'Python': 10, 'Java': 5, 'C++': 3}
        files = [
            type('', (object,), {'filename': 'file1.py'})(),
            type('', (object,), {'filename': 'file2.java'})(),
            type('', (object,), {'filename': 'file3.cpp'})(),
            type('', (object,), {'filename': 'file4.py'})(),
            type('', (object,), {'filename': 'file5.py'})()
        ]
        expected_output = [
            {'language': 'Python', 'files': [files[0], files[3], files[4]]},
            {'language': 'Java', 'files': [files[1]]},
            {'language': 'C++', 'files': [files[2]]},
            {'language': 'Other', 'files': []}
        ]
        assert sort_files_by_main_languages(languages, files) == expected_output

    # Tests that function handles empty languages dictionary
    def test_edge_case_empty_languages(self):
        languages = {}
        files = [
            type('', (object,), {'filename': 'file1.py'})(),
            type('', (object,), {'filename': 'file2.java'})()
        ]
        expected_output = [{'language': 'Other', 'files': []}]
        assert sort_files_by_main_languages(languages, files) == expected_output

    # Tests that function handles empty files list
    def test_edge_case_empty_files(self):
        languages = {'Python': 10, 'Java': 5}
        files = []
        expected_output = [
            {'language': 'Other', 'files': []}
        ]
        assert sort_files_by_main_languages(languages, files) == expected_output

    # Tests that function handles languages with no extensions
    def test_edge_case_languages_with_no_extensions(self):
        languages = {'Python': 10, 'Java': 5, 'C++': 3}
        files = [
            type('', (object,), {'filename': 'file1.py'})(),
            type('', (object,), {'filename': 'file2.java'})(),
            type('', (object,), {'filename': 'file3.cpp'})()
        ]
        expected_output = [
            {'language': 'Python', 'files': [files[0]]},
            {'language': 'Java', 'files': [files[1]]},
            {'language': 'C++', 'files': [files[2]]},
            {'language': 'Other', 'files': []}
        ]
        assert sort_files_by_main_languages(languages, files) == expected_output

    # Tests the behavior of the function when all files have bad extensions and only one new valid file is added.
    def test_edge_case_files_with_bad_extensions_only(self):
        languages = {'Python': 10, 'Java': 5, 'C++': 3}
        files = [
            type('', (object,), {'filename': 'file1.csv'})(),
            type('', (object,), {'filename': 'file2.pdf'})(),
            type('', (object,), {'filename': 'file3.py'})()  # new valid file
        ]
        expected_output = [{'language': 'Python', 'files': [files[2]]}, {'language': 'Other', 'files': []}]
        assert sort_files_by_main_languages(languages, files) == expected_output

    # Tests general behaviour of function
    def test_general_behaviour_sort_files_by_main_languages(self):
        languages = {'Python': 10, 'Java': 5, 'C++': 3}
        files = [
            type('', (object,), {'filename': 'file1.py'})(),
            type('', (object,), {'filename': 'file2.java'})(),
            type('', (object,), {'filename': 'file3.cpp'})(),
            type('', (object,), {'filename': 'file4.py'})(),
            type('', (object,), {'filename': 'file5.py'})(),
            type('', (object,), {'filename': 'file6.py'})(),
            type('', (object,), {'filename': 'file7.java'})(),
            type('', (object,), {'filename': 'file8.cpp'})(),
            type('', (object,), {'filename': 'file9.py'})()
        ]
        expected_output = [
            {'language': 'Python', 'files': [files[0], files[3], files[4], files[5], files[8]]},
            {'language': 'Java', 'files': [files[1], files[6]]},
            {'language': 'C++', 'files': [files[2], files[7]]},
            {'language': 'Other', 'files': []}
        ]
        assert sort_files_by_main_languages(languages, files) == expected_output

88
tests/unit/test_parse_code_suggestion.py
Normal file
@ -0,0 +1,88 @@
# Generated by CodiumAI
from pr_agent.algo.utils import parse_code_suggestion

"""
Code Analysis

Objective:
The objective of the function is to convert a dictionary into a markdown format. The function takes in a dictionary as
input and recursively converts it into a markdown format. The function is specifically designed to handle dictionaries
that contain code suggestions.

Inputs:
- output_data: a dictionary containing the data to be converted into markdown format

Flow:
- Initialize an empty string variable called markdown_text
- Create a dictionary of emojis to be used in the markdown format
- Iterate through the items in the input dictionary
  - If the value is empty, skip to the next item
  - If the value is a dictionary, recursively call the function with the value as input
  - If the value is a list, iterate through the list and add each item to the markdown format
  - If the value is not 'n/a', add it to the markdown format
- If the key is 'code suggestions', call the parse_code_suggestion function to handle the list of code suggestions
- Return the markdown format as a string

Outputs:
- markdown_text: a string containing the input dictionary converted into markdown format

Additional aspects:
- The function uses the textwrap module to indent code examples in the markdown format
- The parse_code_suggestion function is called to handle the 'code suggestions' key in the input dictionary
- The function uses emojis to add visual cues to the markdown format
"""


class TestParseCodeSuggestion:
    # Tests that function returns empty string when input is an empty dictionary
    def test_empty_dict(self):
        input_data = {}
        expected_output = "\n"  # modified to expect a newline character
        assert parse_code_suggestion(input_data) == expected_output

    # Tests that function returns correct output when 'suggestion number' key has a non-integer value
    def test_non_integer_suggestion_number(self):
        input_data = {
            "Suggestion number": "one",
            "Description": "This is a suggestion"
        }
        expected_output = "- **suggestion one:**\n - **Description:** This is a suggestion\n\n"
        assert parse_code_suggestion(input_data) == expected_output

    # Tests that function returns correct output when 'before' or 'after' key has a non-string value
    def test_non_string_before_or_after(self):
        input_data = {
            "Code example": {
                "Before": 123,
                "After": ["a", "b", "c"]
            }
        }
        expected_output = " - **Code example:**\n - **Before:**\n ```\n 123\n ```\n - **After:**\n ```\n ['a', 'b', 'c']\n ```\n\n"  # noqa: E501
        assert parse_code_suggestion(input_data) == expected_output

    # Tests that function returns correct output when input dictionary does not have 'code example' key
    def test_no_code_example_key(self):
        code_suggestions = {
            'suggestion number': 1,
            'suggestion': 'Suggestion 1',
            'description': 'Description 1',
            'before': 'Before 1',
            'after': 'After 1'
        }
        expected_output = "- **suggestion 1:**\n - **suggestion:** Suggestion 1\n - **description:** Description 1\n - **before:** Before 1\n - **after:** After 1\n\n"  # noqa: E501
        assert parse_code_suggestion(code_suggestions) == expected_output

    # Tests that function returns correct output when input dictionary has 'code example' key
    def test_with_code_example_key(self):
        code_suggestions = {
            'suggestion number': 2,
            'suggestion': 'Suggestion 2',
            'description': 'Description 2',
            'code example': {
                'before': 'Before 2',
                'after': 'After 2'
            }
        }
        expected_output = "- **suggestion 2:**\n - **suggestion:** Suggestion 2\n - **description:** Description 2\n - **code example:**\n - **before:**\n ```\n Before 2\n ```\n - **after:**\n ```\n After 2\n ```\n\n"  # noqa: E501
        assert parse_code_suggestion(code_suggestions) == expected_output