1. Move deployment_type to configuration.toml
2. Lint
3. Inject GitHub app installation ID into GitHub provider using the settings mechanism.
Merge remote-tracking branch 'origin/main' into feature/gitlab_provider
2025-07-21 04:50:39 +08:00 · 2023-07-11 16:55:09 +03:00 · 2023-07-11 15:49:06 +03:00 · 2023-07-11 15:37:09 +03:00 · 2023-07-11 15:29:36 +03:00 · 2023-07-11 15:01:52 +03:00
26 changed files with 399 additions and 120 deletions
--- a/PR_COMPRESSION.md
+++ b/PR_COMPRESSION.md
@ -0,0 +1,42 @@
+# Git Patch Logic
+There are two scenarios:
+1. The PR is small enough to fit in a single prompt (including the system and user prompts)
+2. The PR is too large to fit in a single prompt (including the system and user prompts)
+
+For both scenarios, we first use the following strategy:
+#### Repo language prioritization strategy
+
+We prioritize the languages of the repo based on the following criteria:
+1. Exclude binary files and non-code files (e.g. images, PDFs, etc.)
+2. Identify the main languages used in the repo
+3. Sort the PR files by the most common languages in the repo (in descending order):
+   * ```[[file.py, file2.py],[file3.js, file4.jsx],[readme.md]]```
+   
+
+## Small PR
+In this case, we can fit the entire PR in a single prompt:
+1. Exclude binary files and non-code files (e.g. images, PDFs, etc.)
+2. Expand the surrounding context of each patch to 6 lines above and below the patch
+## Large PR
+
+### Motivation
+Pull requests can be very long and contain a lot of information with varying degrees of relevance to the pr-agent.
+We want to pack as much information as possible into a single LLM prompt, while keeping the information relevant to the pr-agent.
+
+
+
+#### PR compression strategy
+We prioritize additions over deletions:
+ - Combine all deleted files into a single list (`deleted files`)
+ - File patches are a list of hunks; remove all deletion-only hunks from each file patch
+#### Adaptive and token-aware file patch fitting
+We use [tiktoken](https://github.com/openai/tiktoken) to tokenize the patches after the modifications described above, and we use the following strategy to fit the patches into the prompt:
+1. Within each language, we sort the files by the number of tokens in the file (in descending order):
+   * ```[[file2.py, file.py],[file4.jsx, file3.js],[readme.md]]```
+2. Iterate through the patches in the order described above
+3. Add the patches to the prompt until the prompt reaches a certain buffer from the max token length
+4. If there are still patches left, add the names of the remaining files to the prompt as a list called `other modified files`, until the prompt reaches the max token length (hard stop), and skip the rest of the patches.
+5. If we haven't reached the max token length, add the `deleted files` list to the prompt until the prompt reaches the max token length (hard stop), and skip the rest.
+
+### Example
+![](./pics/git_patch_logic.png)
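The compression and fitting steps described in PR_COMPRESSION.md can be sketched in Python as below. This is a minimal illustration under stated assumptions, not pr-agent's implementation: `FilePatch`, `strip_deletion_only_hunks`, `group_by_language` and `fit_patches` are hypothetical names, `count_tokens` stands in for a tiktoken-based counter (for example `len(tiktoken.get_encoding("cl100k_base").encode(text))`), binary-file filtering and the 6-line context expansion are omitted, and the soft phase assumes patches are added in order until the first one that would cross `max_tokens - buffer`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FilePatch:
    # Hypothetical stand-in for FilePatchInfo: only the fields this sketch uses
    filename: str
    patch: str


def strip_deletion_only_hunks(patch: str) -> str:
    """Drop hunks that remove lines but add none (additions beat deletions)."""
    hunks: list[list[str]] = []
    for line in patch.splitlines():
        # Each '@@' header starts a new hunk; any preamble becomes its own group
        if line.startswith("@@") or not hunks:
            hunks.append([line])
        else:
            hunks[-1].append(line)
    kept: list[str] = []
    for hunk in hunks:
        body = hunk[1:]
        has_add = any(ln.startswith("+") for ln in body)
        has_del = any(ln.startswith("-") for ln in body)
        if has_add or not has_del:
            kept.extend(hunk)
    return "\n".join(kept)


def group_by_language(files: list[FilePatch], language_extensions: list[set[str]]) -> list[list[FilePatch]]:
    """Bucket files by repo language (most used first); unmatched files go last."""
    groups: list[list[FilePatch]] = [[] for _ in language_extensions]
    rest: list[FilePatch] = []
    for f in files:
        ext = f.filename.rsplit(".", 1)[-1].lower()
        for i, exts in enumerate(language_extensions):
            if ext in exts:
                groups[i].append(f)
                break
        else:
            rest.append(f)
    return groups + [rest]


def fit_patches(groups: list[list[FilePatch]], deleted_files: list[str],
                count_tokens: Callable[[str], int], max_tokens: int, buffer: int) -> str:
    # Within each language group, sort stripped patches by token count (descending)
    ordered: list[tuple[str, str]] = []
    for group in groups:
        stripped = [(f.filename, strip_deletion_only_hunks(f.patch)) for f in group]
        stripped.sort(key=lambda item: count_tokens(item[1]), reverse=True)
        ordered.extend(stripped)

    lines: list[str] = []
    total, i = 0, 0
    # Soft phase: add whole patches until the next one would cross max_tokens - buffer
    while i < len(ordered):
        block = f"## {ordered[i][0]}\n{ordered[i][1]}"
        cost = count_tokens(block)
        if total + cost > max_tokens - buffer:
            break
        lines.append(block)
        total += cost
        i += 1

    # Hard phase: list leftover file names, then deleted files, up to max_tokens
    remaining = [name for name, _ in ordered[i:]]
    for title, names in (("other modified files", remaining), ("deleted files", deleted_files)):
        if not names:
            continue
        for entry in [f"{title}:"] + names:
            cost = count_tokens(entry)
            if total + cost > max_tokens:
                return "\n".join(lines)  # hard stop: skip everything that is left
            lines.append(entry)
            total += cost
    return "\n".join(lines)
```

With a whitespace word count as a stand-in tokenizer, a deletion-only hunk is dropped before its file is measured, and a tight budget pushes the smallest-priority patch into `other modified files` and can leave no room for `deleted files`.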
--- a/README.md
+++ b/README.md
@ -1,28 +1,35 @@
 <div align="center">

-<img src="./pics/Icon-7.png" alt="pr-agent_icon" width="100"/>
+<div align="center">

-# pr-agent
+<img src="./pics/logo-dark.png#gh-dark-mode-only" width="250"/>
+<img src="./pics/logo-light.png#gh-light-mode-only" width="250"/>
+
+</div>

 [![GitHub license](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://github.com/Codium-ai/pr-agent/blob/main/LICENSE)
 [![Discord](https://badgen.net/badge/icon/discord?icon=discord&label&color=purple)](https://discord.com/channels/1057273017547378788/1126104260430528613)

-CodiumAI `pr-agent` is an open-source tool is powered by GPT-4 aming to help developers review PRs faster and more efficiently. It automatically analyzes the PR, and provides feedback and suggestions, and can answer questions.
+CodiumAI `pr-agent` is an open-source tool aiming to help developers review PRs faster and more efficiently. It automatically analyzes the PR, provides feedback and suggestions, and can answer free-text questions.

 </div>

+- [Live demo](#live-demo)
 - [Quickstart](#Quickstart)
- [Usage and Tools](#usage-and-tools)
+- [Usage and tools](#usage-and-tools)
 - [Configuration](#Configuration)
+- [How it works](#how-it-works)
 - [Roadmap](#roadmap)
 - [Similar projects](#similar-projects)

 ## Live demo

-Experience GPT-4 powered PR review on your public Github repository with our hosted pr-agent. To try it, mention @CodiumAI-Agent in a PR comment! The agent will generate the review in response ([see details in the Usage section](#usage-and-tools)).
+Experience GPT-4 powered PR review on your public GitHub repository with our hosted pr-agent. To try it, just mention `@CodiumAI-Agent` in any PR comment! The agent will generate a PR review in response.

 ![Review generation process](./pics/pr-agent-review-process1.gif)

+To set up your own pr-agent, see the [Quickstart](#Quickstart) section.
+
 ---

 ## Quickstart
@ -79,8 +86,8 @@ pip install -r requirements.txt
 3. Copy the secrets template file and fill in your OpenAI key and your GitHub user token:

 ```
-cp pr_agent/settings/.secrets_template.toml pr_agent/settings/.secrets
-# Edit .secrets file
+cp pr_agent/settings/.secrets_template.toml pr_agent/settings/.secrets.toml
+# Edit .secrets.toml file
 ```

 4. Run the appropriate Python scripts from the scripts folder:
@ -140,8 +147,8 @@ git clone https://github.com/Codium-ai/pr-agent.git
   - Copy your app's webhook secret to the webhook_secret field.

 ```
-cp pr_agent/settings/.secrets_template.toml pr_agent/settings/.secrets
-# Edit .secrets file
+cp pr_agent/settings/.secrets_template.toml pr_agent/settings/.secrets.toml
+# Edit .secrets.toml file
 ```

 6. Build a Docker image for the app and optionally push it to a Docker repository. We'll use Dockerhub as an example:
@ -179,7 +186,7 @@ Here is a quick overview of the different sub-tools of PR Reviewer:
  - PR description and title
  - PR type classification
  - Is the PR covered by relevant tests
-  - Is the PR minimal and focused
+  - Is this a focused PR
  - Are there security concerns
 - PR Feedback
  - General PR suggestions
@ -195,7 +202,7 @@ This is how a typical output of the PR Reviewer looks like:
 - 🔍 **Description and title:** Yes
 - 📌 **Type of PR:** Enhancement
 - 🧪 **Relevant tests added:** No
- ✨ **Minimal and focused:** Yes, the PR is focused on adding two new handlers for language extension and token counting.
+- ✨ **Focused PR:** Yes, the PR is focused on adding two new handlers for language extension and token counting.
 - 🔒 **Security concerns:** No, the PR does not introduce possible security concerns or issues.

 #### PR Feedback
@ -238,7 +245,7 @@ The different tools and sub-tools used by CodiumAI pr-agent are easily configura
 You can enable/disable the different PR Reviewer sub-sections with the following flags:

 ```
-require_minimal_and_focused_review=true
+require_focused_review=true
 require_tests_review=true
 require_security_review=true
 ```
@ -282,6 +289,12 @@ Example for extended suggestion:

 ---

+## How it works
+
+![PR-Agent Tools](./pics/pr_agent_overview.png)
+
+Check out the [PR Compression strategy](./PR_COMPRESSION.md) page for more details on how we convert a code diff to a manageable LLM prompt.
+
 ## Roadmap

 - [ ] Support open-source models, as a replacement for openai models. Note that a minimal requirement for each open-source model is to have 8k+ context, and good support for generating json as an output
--- a/pics/git_patch_logic.png
+++ b/pics/git_patch_logic.png
--- a/pics/logo-dark.png
+++ b/pics/logo-dark.png
--- a/pics/logo-light.png
+++ b/pics/logo-light.png
--- a/pics/pr_agent_overview.png
+++ b/pics/pr_agent_overview.png
--- a/pr_agent/agent/pr_agent.py
+++ b/pr_agent/agent/pr_agent.py
@ -1,17 +1,16 @@
 import re
-from typing import Optional

 from pr_agent.tools.pr_questions import PRQuestions
 from pr_agent.tools.pr_reviewer import PRReviewer


 class PRAgent:
-    def __init__(self, installation_id: Optional[int] = None):
-        self.installation_id = installation_id
+    def __init__(self):
+        pass

    async def handle_request(self, pr_url, request):
        if 'please review' in request.lower() or 'review' == request.lower().strip() or len(request) == 0:
-            reviewer = PRReviewer(pr_url, self.installation_id)
+            reviewer = PRReviewer(pr_url)
            await reviewer.review()

        else:
@ -21,5 +20,5 @@ class PRAgent:
                question = re.split(r'(?i)answer', request)[1].strip()
            else:
                question = request
-            answerer = PRQuestions(pr_url, question, self.installation_id)
+            answerer = PRQuestions(pr_url, question)
            await answerer.answer()
--- a/pr_agent/algo/language_handler.py
+++ b/pr_agent/algo/language_handler.py
@ -58,7 +58,8 @@ bad_extensions = [
    'woff2',
    'xz',
    'zip',
-    'zst'
+    'zst',
+    'snap'
 ]


@ -92,7 +93,7 @@ def sort_files_by_main_languages(languages: Dict, files: list):
    for ext in main_extensions:
        main_extensions_flat.extend(ext)

-    for extensions, lang in zip(main_extensions, languages_sorted_list):
+    for extensions, lang in zip(main_extensions, languages_sorted_list):  # noqa: B905
        tmp = []
        for file in files_filtered:
            extension_str = f".{file.filename.split('.')[-1]}"
--- a/pr_agent/algo/utils.py
+++ b/pr_agent/algo/utils.py
@ -12,7 +12,7 @@ def convert_to_markdown(output_data: dict) -> str:
        "Type of PR": "📌",
        "Relevant tests added": "🧪",
        "Unrelated changes": "⚠️",
-        "Minimal and focused": "✨",
+        "Focused PR": "✨",
        "Security concerns": "🔒",
        "General PR suggestions": "💡",
        "Code suggestions": "🤖"
--- a/pr_agent/config_loader.py
+++ b/pr_agent/config_loader.py
@ -5,6 +5,7 @@ from dynaconf import Dynaconf
 current_dir = dirname(abspath(__file__))
 settings = Dynaconf(
    envvar_prefix=False,
+    merge_enabled=True,
    settings_files=[join(current_dir, f) for f in [
         "settings/.secrets.toml",
         "settings/configuration.toml",
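`merge_enabled=True` matters in this commit because `.secrets.toml` and `configuration.toml` now both define `[github]` and `[gitlab]` tables. As we understand Dynaconf's default behavior, a later settings file replaces a top-level table instead of merging into it, so the `[github]` table from `configuration.toml` (which only holds `deployment_type`) would otherwise hide `user_token` from `.secrets.toml`. The plain-dict sketch below only illustrates the difference; `deep_merge` is a hypothetical helper, not Dynaconf's code.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Merge override into a copy of base, recursing into nested tables."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value  # scalars and new keys: the later file wins
    return merged


# Shapes loosely mirror .secrets.toml (loaded first) and configuration.toml (loaded second)
secrets = {"github": {"user_token": "<TOKEN>"}, "gitlab": {"personal_access_token": "<PAT>"}}
config = {"github": {"deployment_type": "user"}, "gitlab": {"polling_interval_seconds": 30}}

shallow = {**secrets, **config}  # table-level replacement: [github] loses user_token
merged = deep_merge(secrets, config)
```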
--- a/pr_agent/git_providers/init.py
+++ b/pr_agent/git_providers/init.py
@ -1,15 +1,17 @@
 from pr_agent.config_loader import settings
 from pr_agent.git_providers.github_provider import GithubProvider
+from pr_agent.git_providers.gitlab_provider import GitLabProvider

 _GIT_PROVIDERS = {
-    'github': GithubProvider
+    'github': GithubProvider,
+    'gitlab': GitLabProvider,
 }

 def get_git_provider():
    try:
        provider_id = settings.config.git_provider
    except AttributeError as e:
-        raise ValueError("github_provider is a required attribute in the configuration file") from e
+        raise ValueError("git_provider is a required attribute in the configuration file") from e
    if provider_id not in _GIT_PROVIDERS:
        raise ValueError(f"Unknown git provider: {provider_id}")
    return _GIT_PROVIDERS[provider_id]
--- a/pr_agent/git_providers/git_provider.py
+++ b/pr_agent/git_providers/git_provider.py
@ -0,0 +1,82 @@
+from abc import ABC, abstractmethod
+from dataclasses import dataclass
+
+
+@dataclass
+class FilePatchInfo:
+    base_file: str
+    head_file: str
+    patch: str
+    filename: str
+    tokens: int = -1
+
+
+class GitProvider(ABC):
+    @abstractmethod
+    def get_diff_files(self) -> list[FilePatchInfo]:
+        pass
+
+    @abstractmethod
+    def publish_comment(self, pr_comment: str, is_temporary: bool = False):
+        pass
+
+    @abstractmethod
+    def remove_initial_comment(self):
+        pass
+
+    @abstractmethod
+    def get_languages(self):
+        pass
+
+    @abstractmethod
+    def get_pr_branch(self):
+        pass
+
+    @abstractmethod
+    def get_user_id(self):
+        pass
+
+    @abstractmethod
+    def get_pr_description(self):
+        pass
+
+
+def get_main_pr_language(languages, files) -> str:
+    """
+    Get the main language of the commit. Return an empty string if cannot determine.
+    """
+    main_language_str = ""
+    try:
+        top_language = max(languages, key=languages.get).lower()
+
+        # validate that the specific commit uses the main language
+        extension_list = []
+        for file in files:  # GitHub yields File objects, GitLab yields path strings
+            extension_list.append(getattr(file, 'filename', file).rsplit('.')[-1])
+
+        # get the most common extension
+        most_common_extension = max(set(extension_list), key=extension_list.count)
+
+        # look for a match. TBD: add more languages, do this systematically
+        if most_common_extension == 'py' and top_language == 'python' or \
+                most_common_extension == 'js' and top_language == 'javascript' or \
+                most_common_extension == 'ts' and top_language == 'typescript' or \
+                most_common_extension == 'go' and top_language == 'go' or \
+                most_common_extension == 'java' and top_language == 'java' or \
+                most_common_extension == 'c' and top_language == 'c' or \
+                most_common_extension == 'cpp' and top_language == 'c++' or \
+                most_common_extension == 'cs' and top_language == 'c#' or \
+                most_common_extension == 'swift' and top_language == 'swift' or \
+                most_common_extension == 'php' and top_language == 'php' or \
+                most_common_extension == 'rb' and top_language == 'ruby' or \
+                most_common_extension == 'rs' and top_language == 'rust' or \
+                most_common_extension == 'scala' and top_language == 'scala' or \
+                most_common_extension == 'kt' and top_language == 'kotlin' or \
+                most_common_extension == 'pl' and top_language == 'perl':
+            main_language_str = top_language
+
+    except Exception:
+        pass
+
+    return main_language_str
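The long `or` chain in `get_main_pr_language` could be made table-driven. A hedged sketch: `EXTENSION_TO_LANGUAGE` and `detect_main_pr_language` are hypothetical names, the mapping is copied from the chain, and the function takes plain filenames, so a GitHub caller would pass `[f.filename for f in files]` while GitLab's `get_files()` already returns path strings. Ties between equally common extensions may resolve differently from `max(set(...), key=...)`.

```python
from collections import Counter

# Extension -> lowercased repo language name, taken from the or-chain above
EXTENSION_TO_LANGUAGE = {
    "py": "python", "js": "javascript", "ts": "typescript", "go": "go",
    "java": "java", "c": "c", "cpp": "c++", "cs": "c#", "swift": "swift",
    "php": "php", "rb": "ruby", "rs": "rust", "scala": "scala",
    "kt": "kotlin", "pl": "perl",
}


def detect_main_pr_language(languages: dict, filenames: list[str]) -> str:
    """Return the repo's top language if the PR's most common extension matches it, else ''."""
    if not languages or not filenames:
        return ""
    top_language = max(languages, key=languages.get).lower()
    extensions = Counter(name.rsplit(".", 1)[-1] for name in filenames)
    most_common_extension, _ = extensions.most_common(1)[0]
    if EXTENSION_TO_LANGUAGE.get(most_common_extension) == top_language:
        return top_language
    return ""
```

Adding a language then means adding one dictionary entry rather than another `or` clause.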
--- a/pr_agent/git_providers/github_provider.py
+++ b/pr_agent/git_providers/github_provider.py
@ -1,25 +1,18 @@
 import logging
-from collections import namedtuple
-from dataclasses import dataclass
 from datetime import datetime
 from typing import Optional, Tuple
 from urllib.parse import urlparse

-from github import AppAuthentication, File, Github
+from github import AppAuthentication, Github

 from pr_agent.config_loader import settings

-@dataclass
-class FilePatchInfo:
-    base_file: str
-    head_file: str
-    patch: str
-    filename: str
-    tokens: int = -1
+from .git_provider import FilePatchInfo
+

 class GithubProvider:
-    def __init__(self, pr_url: Optional[str] = None, installation_id: Optional[int] = None):
-        self.installation_id = installation_id
+    def __init__(self, pr_url: Optional[str] = None):
+        self.installation_id = settings.get("GITHUB.INSTALLATION_ID")
        self.github_client = self._get_github_client()
        self.repo = None
        self.pr_num = None
@ -32,6 +25,9 @@ class GithubProvider:
        self.repo, self.pr_num = self._parse_pr_url(pr_url)
        self.pr = self._get_pr()

+    def get_files(self):
+        return self.pr.get_files()
+
    def get_diff_files(self) -> list[FilePatchInfo]:
        files = self.pr.get_files()
        diff_files = []
@ -65,53 +61,15 @@ class GithubProvider:
        return self.pr.body

    def get_languages(self):
-        return self._get_repo().get_languages()
-
-    def get_main_pr_language(self) -> str:
-        """
-        Get the main language of the commit. Return an empty string if cannot determine.
-        """
-        main_language_str = ""
-        try:
-            languages = self.get_languages()
-            top_language = max(languages, key=languages.get).lower()
-
-            # validate that the specific commit uses the main language
-            extension_list = []
-            files = self.pr.get_files()
-            for file in files:
-                extension_list.append(file.filename.rsplit('.')[-1])
-
-            # get the most common extension
-            most_common_extension = max(set(extension_list), key=extension_list.count)
-
-            # look for a match. TBD: add more languages, do this systematically
-            if most_common_extension == 'py' and top_language == 'python' or \
-                    most_common_extension == 'js' and top_language == 'javascript' or \
-                    most_common_extension == 'ts' and top_language == 'typescript' or \
-                    most_common_extension == 'go' and top_language == 'go' or \
-                    most_common_extension == 'java' and top_language == 'java' or \
-                    most_common_extension == 'c' and top_language == 'c' or \
-                    most_common_extension == 'cpp' and top_language == 'c++' or \
-                    most_common_extension == 'cs' and top_language == 'c#' or \
-                    most_common_extension == 'swift' and top_language == 'swift' or \
-                    most_common_extension == 'php' and top_language == 'php' or \
-                    most_common_extension == 'rb' and top_language == 'ruby' or \
-                    most_common_extension == 'rs' and top_language == 'rust' or \
-                    most_common_extension == 'scala' and top_language == 'scala' or \
-                    most_common_extension == 'kt' and top_language == 'kotlin' or \
-                    most_common_extension == 'pl' and top_language == 'perl' or \
-                    most_common_extension == 'swift' and top_language == 'swift':
-                main_language_str = top_language
-
-        except Exception:
-            pass
-
-        return main_language_str
+        languages = self._get_repo().get_languages()
+        return languages

    def get_pr_branch(self):
        return self.pr.head.ref

+    def get_pr_description(self):
+        return self.pr.body
+
    def get_user_id(self):
        if not self.github_user_id:
            try:
--- a/pr_agent/git_providers/gitlab_provider.py
+++ b/pr_agent/git_providers/gitlab_provider.py
@ -0,0 +1,92 @@
+import logging
+from typing import Optional, Tuple
+from urllib.parse import urlparse
+
+import gitlab
+
+from pr_agent.config_loader import settings
+
+from .git_provider import FilePatchInfo, GitProvider
+
+
+class GitLabProvider(GitProvider):
+    def __init__(self, merge_request_url: Optional[str] = None):
+        gitlab_url = settings.get("GITLAB.URL", None)
+        if not gitlab_url:
+            raise ValueError("GitLab URL is not set in the config file")
+        gitlab_access_token = settings.get("GITLAB.PERSONAL_ACCESS_TOKEN", None)
+        if not gitlab_access_token:
+            raise ValueError("GitLab personal access token is not set in the config file")
+        self.gl = gitlab.Gitlab(
+            gitlab_url,
+            private_token=gitlab_access_token
+        )
+        self.id_project = None
+        self.id_mr = None
+        self.mr = None
+        self.temp_comments = []
+        self._set_merge_request(merge_request_url)
+
+    @property
+    def pr(self):
+        '''The GitLab terminology is merge request (MR) instead of pull request (PR)'''
+        return self.mr
+
+    def _set_merge_request(self, merge_request_url: str):
+        self.id_project, self.id_mr = self._parse_merge_request_url(merge_request_url)
+        self.mr = self._get_merge_request()
+
+    def get_diff_files(self) -> list[FilePatchInfo]:
+        diffs = self.mr.changes()['changes']
+        diff_files = [FilePatchInfo("", "", diff['diff'], diff['new_path']) for diff in diffs]
+        return diff_files
+
+    def get_files(self):
+        return [change['new_path'] for change in self.mr.changes()['changes']]
+
+    def publish_comment(self, mr_comment: str, is_temporary: bool = False):
+        comment = self.mr.notes.create({'body': mr_comment})
+        if is_temporary:
+            self.temp_comments.append(comment)
+
+    def remove_initial_comment(self):
+        try:
+            for comment in self.temp_comments:
+                comment.delete()
+        except Exception as e:
+            logging.exception(f"Failed to remove temp comments, error: {e}")
+
+    def get_title(self):
+        return self.mr.title
+
+    def get_description(self):
+        return self.mr.description
+
+    def get_languages(self):
+        languages = self.gl.projects.get(self.id_project).languages()
+        return languages
+
+    def get_pr_branch(self):
+        return self.mr.source_branch
+
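+    def get_pr_description(self):
+        return self.mr.description
+
+    def get_user_id(self):
+        # Implements the abstract GitProvider method so GitLabProvider can be instantiated;
+        # looking up the GitLab user is not supported yet.
+        raise NotImplementedError("get_user_id is not implemented for GitLab")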
+
+    def _parse_merge_request_url(self, merge_request_url: str) -> Tuple[str, int]:
+        parsed_url = urlparse(merge_request_url)
+
+        path_parts = parsed_url.path.strip('/').split('/')
+        if path_parts[-2] != 'merge_requests':
+            raise ValueError("The provided URL does not appear to be a GitLab merge request URL")
+
+        try:
+            mr_id = int(path_parts[-1])
+        except ValueError as e:
+            raise ValueError("Unable to convert merge request ID to integer") from e
+
+        # GitLab supports access by both the project's numeric ID and 'namespace/project_name'
+        return "/".join(path_parts[:2]), mr_id
+
+    def _get_merge_request(self):
+        mr = self.gl.projects.get(self.id_project).mergerequests.get(self.id_mr)
+        return mr
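`_parse_merge_request_url` keeps only the first two path segments (`path_parts[:2]`), which works for `org/repo` but would truncate a project that lives in a subgroup such as `org/team/repo`. A hedged sketch of a variant that keeps everything before the `/-/merge_requests/` part of the URL; `parse_merge_request_url` is a hypothetical standalone name, and it assumes MR URLs of the form `https://<host>/<namespace>/<project>/-/merge_requests/<iid>`. The number in the URL is the project-scoped IID, which is what `project.mergerequests.get()` expects in python-gitlab.

```python
from urllib.parse import urlparse


def parse_merge_request_url(url: str) -> tuple[str, int]:
    """Split a GitLab MR URL into (project path, MR IID), keeping nested groups."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 3 or parts[-2] != "merge_requests":
        raise ValueError("The provided URL does not appear to be a GitLab merge request URL")
    try:
        mr_iid = int(parts[-1])
    except ValueError as e:
        raise ValueError("Unable to convert merge request ID to integer") from e
    # Everything before 'merge_requests' is the project path, minus GitLab's '-' separator
    project_parts = parts[:-2]
    if project_parts[-1] == "-":
        project_parts = project_parts[:-1]
    if not project_parts:
        raise ValueError("No project path found in the merge request URL")
    return "/".join(project_parts), mr_iid
```

The returned path string can go straight into `gl.projects.get()`, which, as the comment in the provider notes, accepts either a numeric ID or a `namespace/project` path.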
--- a/pr_agent/servers/github_app_webhook.py
+++ b/pr_agent/servers/github_app_webhook.py
@ -35,7 +35,8 @@ async def handle_github_webhooks(request: Request, response: Response):
 async def handle_request(body):
    action = body.get("action", None)
    installation_id = body.get("installation", {}).get("id", None)
-    agent = PRAgent(installation_id)
+    settings.set("GITHUB.INSTALLATION_ID", installation_id)
+    agent = PRAgent()
    if action == 'created':
        if "comment" not in body:
            return {}
@ -66,8 +67,8 @@ async def root():


 def start():
-    if settings.get("GITHUB.DEPLOYMENT_TYPE", "user") != "app":
-        raise Exception("Please set deployment type to app in .secrets.toml file")
+    # Override the deployment type to app
+    settings.set("GITHUB.DEPLOYMENT_TYPE", "app")
    app = FastAPI()
    app.include_router(router)

--- a/pr_agent/servers/github_polling.py
+++ b/pr_agent/servers/github_polling.py
@ -76,7 +76,8 @@ async def polling_loop():
                                                if comment['user']['login'] == user_id:
                                                    continue
                                            comment_body = comment['body'] if 'body' in comment else ''
-                                            commenter_github_user = comment['user']['login'] if 'user' in comment else ''
+                                            commenter_github_user = comment['user']['login'] \
+                                                if 'user' in comment else ''
                                            logging.info(f"Commenter: {commenter_github_user}\nComment: {comment_body}")
                                            user_tag = "@" + user_id
                                            if user_tag not in comment_body:
--- a/pr_agent/servers/gitlab_polling.py
+++ b/pr_agent/servers/gitlab_polling.py
@ -0,0 +1,64 @@
+import asyncio
+import time
+
+import gitlab
+
+from pr_agent.agent.pr_agent import PRAgent
+from pr_agent.config_loader import settings
+
+gl = gitlab.Gitlab(
+    settings.get("GITLAB.URL"),
+    private_token=settings.get("GITLAB.PERSONAL_ACCESS_TOKEN")
+)
+
+# Set the list of projects to monitor
+projects_to_monitor = settings.get("GITLAB.PROJECTS_TO_MONITOR")
+magic_word = settings.get("GITLAB.MAGIC_WORD")
+
+# Hold the previous seen comments
+previous_comments = set()
+
+
+def check_comments():
+    print('Polling')
+    new_comments = {}
+    for project in projects_to_monitor:
+        project = gl.projects.get(project)
+        merge_requests = project.mergerequests.list(state='opened')
+        for mr in merge_requests:
+            notes = mr.notes.list(get_all=True)
+            for note in notes:
+                if note.id not in previous_comments and note.body.startswith(magic_word):
+                    new_comments[note.id] = dict(
+                        body=note.body[len(magic_word):],
+                        project=project.name,
+                        mr=mr
+                    )
+                    previous_comments.add(note.id)
+                    print(f"New comment in project {project.name}, merge request {mr.title}: {note.body}")
+
+    return new_comments
+
+
+def handle_new_comments(new_comments):
+    print('Handling new comments')
+    agent = PRAgent()
+    for comment in new_comments.values():
+        print(f"Handling comment: {comment['body']}")
+        asyncio.run(agent.handle_request(comment['mr'].web_url, comment['body']))
+
+
+def run():
+    assert settings.get('CONFIG.GIT_PROVIDER') == 'gitlab', 'This script is only for GitLab'
+    # Initial run to populate previous_comments
+    check_comments()
+
+    # Poll at the configured interval (GITLAB.POLLING_INTERVAL_SECONDS)
+    while True:
+        time.sleep(settings.get("GITLAB.POLLING_INTERVAL_SECONDS"))
+        new_comments = check_comments()
+        if new_comments:
+            handle_new_comments(new_comments)
+
+if __name__ == '__main__':
+    run()
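The core of `check_comments` is a dedup filter: a note triggers the agent only if it starts with `magic_word` and its ID has not been seen, and the initial call in `run()` marks every existing trigger comment as seen, so only comments posted after startup are handled. A minimal, network-free sketch of that filter; `Note` is a hypothetical stand-in for a python-gitlab note, and `collect_new_commands` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    # Hypothetical stand-in for a python-gitlab note; only id and body are used
    id: int
    body: str


def collect_new_commands(notes: list[Note], seen_ids: set[int], magic_word: str) -> dict[int, str]:
    """Return {note_id: text after magic_word} for unseen notes that start with magic_word."""
    new: dict[int, str] = {}
    for note in notes:
        if note.id in seen_ids or not note.body.startswith(magic_word):
            continue
        new[note.id] = note.body[len(magic_word):]
        seen_ids.add(note.id)  # never handle the same note twice
    return new


seen: set[int] = set()
# Startup pass: existing trigger comments are recorded but not acted on
startup = collect_new_commands([Note(1, "AutoReview")], seen, "AutoReview")
# Later poll: only the new trigger comment comes back
later = collect_new_commands(
    [Note(1, "AutoReview"), Note(2, "AutoReview please review"), Note(3, "LGTM")],
    seen, "AutoReview",
)
```

Because `seen_ids` lives only in memory, restarting the poller re-runs the startup pass and again ignores comments that already exist.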
--- a/pr_agent/settings/.secrets_template.toml
+++ b/pr_agent/settings/.secrets_template.toml
@ -1,5 +1,5 @@
 # QUICKSTART:
-# Copy this file to .secrets in the same folder.
+# Copy this file to .secrets.toml in the same folder.
 # The minimum workable settings - set openai.key to your API key.
 # Set github.deployment_type to "user" and github.user_token to your GitHub personal access token.
 # This will allow you to run the CLI scripts in the scripts/ folder and the github_polling server.
@ -11,9 +11,6 @@ key = "<API_KEY>"  # Acquire through https://platform.openai.com
 org = "<ORGANIZATION>"  # Optional, may be commented out.

 [github]
-# The type of deployment to create. Valid values are 'app' or 'user'.
-deployment_type = "user"
-
 # ---- Set the following only for deployment type == "user"
 user_token = "<TOKEN>"  # A GitHub personal access token with 'repo' scope.

@ -25,3 +22,8 @@ private_key = """\
 """
 app_id = 123456  # The GitHub App ID, replace with your own.
 webhook_secret = "<WEBHOOK SECRET>"  # Optional, may be commented out.
+
+[gitlab]
+# GitLab personal access token
+personal_access_token = ""
+
--- a/pr_agent/settings/configuration.toml
+++ b/pr_agent/settings/configuration.toml
@ -5,11 +5,27 @@ publish_review=true
 verbosity_level=0  # 0,1,2

 [pr_reviewer]
-require_minimal_and_focused_review=true
+require_focused_review=true
 require_tests_review=true
 require_security_review=true
 extended_code_suggestions=false
 num_code_suggestions=4

+[pr_questions]

-[pr_questions]
+[github]
+# The type of deployment to create. Valid values are 'app' or 'user'.
+deployment_type = "user"
+
+[gitlab]
+# URL of the GitLab service
+url = "https://gitlab.com"
+
+# Projects to poll; either the numeric project ID or the 'namespace/project_name' syntax can be used
+projects_to_monitor = ['org_name/repo_name']
+
+# Polling trigger: comments that start with this word are handled by the agent
+magic_word = "AutoReview"
+
+# Polling interval
+polling_interval_seconds = 30
--- a/pr_agent/settings/pr_reviewer_prompts.toml
+++ b/pr_agent/settings/pr_reviewer_prompts.toml
@ -30,10 +30,10 @@ You must use the following JSON schema to format your answer:
      "description": "yes\\no question: does this PR have relevant tests ?"
    },
 {%- endif %}
-{%- if require_minimal_and_focused %}
-    "Minimal and focused": {
+{%- if require_focused %}
+    "Focused PR": {
      "type": "string",
-      "description": "is this PR as minimal and focused as possible, with all code changes centered around a single coherent theme, described in the PR description and title ?" Make sure to explain your answer"
+      "description": "Is this a focused PR, in the sense that it has a clear and coherent title and description, and all PR code diff changes are properly derived from the title and description? Explain your response."
    }
  },
 {%- endif %}
@ -106,8 +106,8 @@ Example output:
 {%- if require_tests %}
        "Relevant tests added": "No",
 {%- endif %}
-{%- if require_minimal_and_focused %}
-        "Minimal and focused": "yes\\no, because ..."
+{%- if require_focused %}
+        "Focused PR": "yes\\no, because ..."
 {%- endif %}
    },
    "PR Feedback":
--- a/pr_agent/tools/pr_questions.py
+++ b/pr_agent/tools/pr_questions.py
@ -1,6 +1,5 @@
 import copy
 import logging
-from typing import Optional

 from jinja2 import Environment, StrictUndefined

@ -9,21 +8,23 @@ from pr_agent.algo.pr_processing import get_pr_diff
 from pr_agent.algo.token_handler import TokenHandler
 from pr_agent.config_loader import settings
 from pr_agent.git_providers import get_git_provider
+from pr_agent.git_providers.git_provider import get_main_pr_language


 class PRQuestions:
-    def __init__(self, pr_url: str, question_str: str, installation_id: Optional[int] = None):
-        self.git_provider = get_git_provider()(pr_url, installation_id)
-        self.main_pr_language = self.git_provider.get_main_pr_language()
-        self.installation_id = installation_id
+    def __init__(self, pr_url: str, question_str: str):
+        self.git_provider = get_git_provider()(pr_url)
+        self.main_pr_language = get_main_pr_language(
+            self.git_provider.get_languages(), self.git_provider.get_files()
+        )
        self.ai_handler = AiHandler()
        self.question_str = question_str
        self.vars = {
            "title": self.git_provider.pr.title,
            "branch": self.git_provider.get_pr_branch(),
-            "description": self.git_provider.pr.body,
-            "language": self.git_provider.get_main_pr_language(),
-            "diff": "", # empty diff for initial calculation
+            "description": self.git_provider.get_pr_description(),
+            "language": self.main_pr_language,
+            "diff": "",  # empty diff for initial calculation
            "questions": self.question_str,
        }
        self.token_handler = TokenHandler(self.git_provider.pr,
--- a/pr_agent/tools/pr_reviewer.py
+++ b/pr_agent/tools/pr_reviewer.py
@ -1,7 +1,6 @@
 import copy
 import json
 import logging
-from typing import Optional

 from jinja2 import Environment, StrictUndefined

@ -11,14 +10,16 @@ from pr_agent.algo.token_handler import TokenHandler
 from pr_agent.algo.utils import convert_to_markdown
 from pr_agent.config_loader import settings
 from pr_agent.git_providers import get_git_provider
+from pr_agent.git_providers.git_provider import get_main_pr_language


 class PRReviewer:
-    def __init__(self, pr_url: str, installation_id: Optional[int] = None, cli_mode=False):
+    def __init__(self, pr_url: str, cli_mode=False):

-        self.git_provider = get_git_provider()(pr_url, installation_id)
-        self.main_language = self.git_provider.get_main_pr_language()
-        self.installation_id = installation_id
+        self.git_provider = get_git_provider()(pr_url)
+        self.main_language = get_main_pr_language(
+            self.git_provider.get_languages(), self.git_provider.get_files()
+        )
        self.ai_handler = AiHandler()
        self.patches_diff = None
        self.prediction = None
@ -26,12 +27,12 @@ class PRReviewer:
        self.vars = {
            "title": self.git_provider.pr.title,
            "branch": self.git_provider.get_pr_branch(),
-            "description": self.git_provider.pr.body,
+            "description": self.git_provider.get_pr_description(),
            "language": self.main_language,
            "diff": "",  # empty diff for initial calculation
            "require_tests": settings.pr_reviewer.require_tests_review,
            "require_security": settings.pr_reviewer.require_security_review,
-            "require_minimal_and_focused": settings.pr_reviewer.require_minimal_and_focused_review,
+            "require_focused": settings.pr_reviewer.require_focused_review,
            'extended_code_suggestions': settings.pr_reviewer.extended_code_suggestions,
            'num_code_suggestions': settings.pr_reviewer.num_code_suggestions,
        }
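The reviewer now gets the main language from a module-level `get_main_pr_language(languages, files)` helper instead of a provider method. Its body is not part of this diff, so the following is only a hypothetical sketch. It assumes `languages` maps language names to byte counts (the shape GitHub's repository-languages endpoint returns) and that each file object exposes a `filename` attribute. The extension table and the `PRFile` stand-in are invented for illustration.

```python
import os
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical extension-to-language table; not taken from pr_agent.
EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".js": "javascript",
    ".jsx": "javascript",
    ".ts": "typescript",
    ".go": "go",
    ".java": "java",
}


@dataclass
class PRFile:
    filename: str  # stand-in for a git provider's file object


def get_main_pr_language(languages: Dict[str, int], files: List[PRFile]) -> str:
    """Return the repo's top language if the PR's dominant extension agrees, else ''."""
    if not languages or not files:
        return ""
    # Largest language in the repository by byte count.
    top_language = max(languages, key=languages.get).lower()
    # Most common file extension among the files touched by the PR.
    extensions = [os.path.splitext(f.filename)[1].lower() for f in files]
    most_common_ext, _ = Counter(extensions).most_common(1)[0]
    if EXTENSION_TO_LANGUAGE.get(most_common_ext) == top_language:
        return top_language
    return ""
```

Under these assumptions, returning an empty string when the two signals disagree keeps the prompt from asserting a language that the diff itself barely uses.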
--- a/requirements.txt
+++ b/requirements.txt
@@ -6,3 +6,6 @@ openai==0.27.8
 Jinja2==3.1.2
 tiktoken==0.4.0
 uvicorn==0.22.0
+python-gitlab==3.15.0
+pytest~=7.4.0
+aiohttp~=3.8.4
--- a/tests/unit/test_convert_to_markdown.py
+++ b/tests/unit/test_convert_to_markdown.py
@@ -50,7 +50,7 @@ class TestConvertToMarkdown:
            'Type of PR': 'Test type',
            'Relevant tests added': 'no',
            'Unrelated changes': 'n/a',  # won't be included in the output
-            'Minimal and focused': 'Yes',
+            'Focused PR': 'Yes',
            'General PR suggestions': 'general suggestion...',
            'Code suggestions': [
                {
@@ -74,12 +74,11 @@ class TestConvertToMarkdown:
 - 🔍 **Description and title:** Test description
 - 📌 **Type of PR:** Test type
 - 🧪 **Relevant tests added:** no
-- ✨ **Minimal and focused:** Yes
+- ✨ **Focused PR:** Yes
 - 💡 **General PR suggestions:** general suggestion...

 - 🤖 **Code suggestions:**

-- **suggestion 1:**
  - **Code example:**
    - **Before:**
        ```
@@ -90,7 +89,6 @@ class TestConvertToMarkdown:
        Code after
        ```

-- **suggestion 2:**
  - **Code example:**
    - **Before:**
        ```
@@ -116,7 +114,7 @@ class TestConvertToMarkdown:
            'Type of PR': {},
            'Relevant tests added': {},
            'Unrelated changes': {},
-            'Minimal and focused': {},
+            'Focused PR': {},
            'General PR suggestions': {},
            'Code suggestions': {}
        }
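The expected output in this test fixes how flat review fields render after the rename: each one becomes `- <emoji> **<key>:** <value>`. The following is a hypothetical sketch of only that step and not pr_agent's `convert_to_markdown`. The function name `render_review_fields` is invented. The emoji table is copied from the expected output above. Skipping empty values and keys without an emoji (such as `'Unrelated changes'`) is an assumption.

```python
from typing import Dict

# Emoji table reconstructed from the test's expected output; the real table may be larger.
EMOJIS: Dict[str, str] = {
    "Description and title": "🔍",
    "Type of PR": "📌",
    "Relevant tests added": "🧪",
    "Focused PR": "✨",
    "General PR suggestions": "💡",
}


def render_review_fields(review: dict) -> str:
    """Render scalar review fields as '- <emoji> **<key>:** <value>' lines (hypothetical helper)."""
    lines = []
    for key, value in review.items():
        # Assumption: empty values and keys without an emoji are left out.
        if not value or key not in EMOJIS:
            continue
        lines.append(f"- {EMOJIS[key]} **{key}:** {value}")
    return "\n".join(lines) + "\n" if lines else ""
```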
--- a/tests/unit/test_language_handler.py
+++ b/tests/unit/test_language_handler.py
@@ -1,15 +1,15 @@

 # Generated by CodiumAI
+
 from pr_agent.algo.language_handler import sort_files_by_main_languages

-
-import pytest
-
 """
 Code Analysis

 Objective:
-The objective of the function is to sort a list of files by their main language, putting the files that are in the main language first and the rest of the files after. It takes in a dictionary of languages and their sizes, and a list of files.
+The objective of the function is to sort a list of files by their main language, putting the files that are in the main 
+language first and the rest of the files after. It takes in a dictionary of languages and their sizes, and a list of 
+files.

 Inputs:
 - languages: a dictionary containing the languages and their sizes
@@ -33,6 +33,8 @@ Additional aspects:
 - The function uses the filter_bad_extensions function to filter out files with bad extensions
 - The function uses a rest_files dictionary to store the files that do not belong to any of the main extensions
 """
+
+
 class TestSortFilesByMainLanguages:
    # Tests that files are sorted by main language, with files in main language first and the rest after
    def test_happy_path_sort_files_by_main_languages(self):
@@ -118,4 +120,4 @@ class TestSortFilesByMainLanguages:
            {'language': 'C++', 'files': [files[2], files[7]]},
            {'language': 'Other', 'files': []}
        ]
-        assert sort_files_by_main_languages(languages, files) == expected_output
+        assert sort_files_by_main_languages(languages, files) == expected_output
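The docstring in this test describes `sort_files_by_main_languages`. It filters out bad extensions, groups the remaining files under the repo's main languages (largest first), and collects everything else under `'Other'`, returning `{'language': ..., 'files': [...]}` entries. The sketch below reconstructs that behaviour from the docstring and the expected-output shape only. The extension tables and the `PRFile` stand-in are hypothetical. Omitting languages with no matching files is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical tables; the real pr_agent mappings are not part of this diff.
LANGUAGE_EXTENSIONS: Dict[str, List[str]] = {
    "Python": [".py"],
    "JavaScript": [".js", ".jsx"],
    "C++": [".cpp", ".cc", ".h", ".hpp"],
}
BAD_EXTENSIONS = {".png", ".jpg", ".pdf", ".snap"}


@dataclass
class PRFile:
    filename: str  # stand-in for a git provider's file object


def _ext(file: PRFile) -> str:
    return os.path.splitext(file.filename)[1].lower()


def filter_bad_extensions(files: List[PRFile]) -> List[PRFile]:
    # Drop binary and other non-code files before sorting.
    return [f for f in files if _ext(f) not in BAD_EXTENSIONS]


def sort_files_by_main_languages(languages: Dict[str, int], files: List[PRFile]) -> List[dict]:
    # Repo languages ordered by size, largest first.
    main_languages = sorted(languages, key=languages.get, reverse=True)
    # Files not yet claimed by any main language, in their original order.
    rest_files = filter_bad_extensions(files)
    result = []
    for language in main_languages:
        extensions = LANGUAGE_EXTENSIONS.get(language, [])
        matched = [f for f in rest_files if _ext(f) in extensions]
        if matched:
            result.append({"language": language, "files": matched})
            rest_files = [f for f in rest_files if f not in matched]
    # Whatever is left goes last, matching the 'Other' entry in the expected output.
    result.append({"language": "Other", "files": rest_files})
    return result
```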
--- a/tests/unit/test_parse_code_suggestion.py
+++ b/tests/unit/test_parse_code_suggestion.py
@@ -47,7 +47,7 @@ class TestParseCodeSuggestion:
            "Suggestion number": "one",
            "Description": "This is a suggestion"
        }
-        expected_output = "- **suggestion one:**\n  - **Description:** This is a suggestion\n\n"
+        expected_output = "   **Description:** This is a suggestion\n\n"
        assert parse_code_suggestion(input_data) == expected_output

    # Tests that function returns correct output when 'before' or 'after' key has a non-string value
@@ -70,7 +70,7 @@ class TestParseCodeSuggestion:
            'before': 'Before 1',
            'after': 'After 1'
        }
-        expected_output = "- **suggestion 1:**\n  - **suggestion:** Suggestion 1\n  - **description:** Description 1\n  - **before:** Before 1\n  - **after:** After 1\n\n"  # noqa: E501
+        expected_output = "   **suggestion:** Suggestion 1\n   **description:** Description 1\n   **before:** Before 1\n   **after:** After 1\n\n"  # noqa: E501
        assert parse_code_suggestion(code_suggestions) == expected_output

    # Tests that function returns correct output when input dictionary has 'code example' key
@@ -84,5 +84,5 @@ class TestParseCodeSuggestion:
                'after': 'After 2'
            }
        }
-        expected_output = "- **suggestion 2:**\n  - **suggestion:** Suggestion 2\n  - **description:** Description 2\n  - **code example:**\n    - **before:**\n        ```\n        Before 2\n        ```\n    - **after:**\n        ```\n        After 2\n        ```\n\n"  # noqa: E501
+        expected_output = "   **suggestion:** Suggestion 2\n   **description:** Description 2\n  - **code example:**\n    - **before:**\n        ```\n        Before 2\n        ```\n    - **after:**\n        ```\n        After 2\n        ```\n\n"  # noqa: E501
        assert parse_code_suggestion(code_suggestions) == expected_output
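The updated expectations imply a flatter layout. Scalar fields render as `   **key:** value`, the numbered `- **suggestion N:**` header disappears, and a nested `'code example'` keeps its indented before/after fences. The sketch below reproduces the three expected strings in this test file. It is a reconstruction from the tests, not the `parse_code_suggestion` shipped in pr_agent. Treating any key containing "suggestion number" (case-insensitive) as skippable is an assumption.

```python
import textwrap


def parse_code_suggestion(code_suggestion: dict) -> str:
    """Render one code suggestion as markdown, matching the updated test expectations."""
    markdown_text = ""
    for key, value in code_suggestion.items():
        if isinstance(value, dict):
            # Nested 'code example': each snippet becomes a fenced block indented 8 spaces.
            markdown_text += f"  - **{key}:**\n"
            for code_key, code_value in value.items():
                code_block = textwrap.indent(f"```\n{code_value}\n```", " " * 8)
                markdown_text += f"    - **{code_key}:**\n{code_block}\n"
        elif "suggestion number" in key.lower():
            # Assumption: the number no longer produces a header, so the key is skipped.
            continue
        else:
            markdown_text += f"   **{key}:** {value}\n"
    # A trailing blank line separates consecutive suggestions.
    return markdown_text + "\n"
```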
Author	SHA1	Message	Date
Ori Kotek	b2d952cafa	1. Move deployment_type to configuration.toml 2. Lint 3. Inject GitHub app installation ID into GitHub provider using the settings mechanism.	2023-07-11 16:55:09 +03:00
Ori Kotek	6eacf4791d	Merge remote-tracking branch 'origin/main' into feature/gitlab_provider	2023-07-11 15:49:06 +03:00
Ori Kotek	4076f67ab8	Merge pull request #35 from ilchemla/hotfix/bad-filename-in-docs Fix secrets filename extension in README	2023-07-11 15:37:09 +03:00
Ori Kotek	c2639a2520	Merge pull request #32 from Codium-ai/tr/focused_pr Focused PR update	2023-07-11 15:29:36 +03:00
Ilan Chemla	38db65831e	Fix secrets filename extension in README	2023-07-11 15:01:52 +03:00
Hussam Lawen	e1b856f7e6	Merge pull request #34 from Codium-ai/enhancement/soft_and_hard_thresh Separate output token threshold to soft and hard instead of implicit hard = soft/2	2023-07-11 14:35:00 +03:00
mrT23	301622216f	Focused PR update	2023-07-11 08:50:28 +03:00
Ori Kotek	b63db6cef0	Merge pull request #29 from kaushnian/fix/rename-github_app Fix: Rename github_app_webhook.py to github_app.py	2023-07-09 18:16:44 +03:00
Eugene Kaushnian	8fba670bda	Rename github_app_webhook.py to github_app.py	2023-07-08 13:36:47 -04:00
salberts	ca47833c56	Merge remote-tracking branch 'refs/remotes/origin/feature/gitlab_provider' into feature/gitlab_provider	2023-07-08 17:19:54 +03:00
Albert	567475c18c	Update pr_agent/settings/.secrets_template.toml Co-authored-by: Sergii Kovalev <enasik@gmail.com>	2023-07-08 15:29:05 +03:00
salberts	fb4badd160	changes	2023-07-08 12:14:32 +03:00
salberts	9695d96799	Simplify project identification	2023-07-08 11:49:11 +03:00
salberts	0930f76cb7	Merge branch 'feature/gitlab_provider' into feature/gitlab_webhook	2023-07-08 11:47:13 +03:00
salberts	365559405f	Simplify gitlab project access	2023-07-08 11:46:41 +03:00
salberts	d4adcb3c22	Configurable polling interval	2023-07-08 10:26:41 +03:00
salberts	75167c2700	add polling	2023-07-08 08:52:11 +03:00
mrT23	78f5f58774	Merge pull request #27 from Codium-ai/logo-update update repo icons to new logos	2023-07-07 20:48:04 +03:00
Tom Brews Views	81a2e5cbe2	updte repo icons to new logos	2023-07-07 19:42:45 +03:00
salberts	e63a4f47ce	bugfixes	2023-07-07 17:06:53 +03:00
salberts	caff65613f	docs	2023-07-07 16:36:56 +03:00
salberts	ee3cac9836	bugfix	2023-07-07 16:33:25 +03:00
salberts	8b3ff7a632	bugfix	2023-07-07 16:31:28 +03:00
salberts	7d49e080fc	remove prints	2023-07-07 16:24:02 +03:00
salberts	1a94079936	style	2023-07-07 16:15:51 +03:00
salberts	7ed12c2f8e	refactor	2023-07-07 16:10:33 +03:00
Albert Achtenberg	ed8cf27b05	working example	2023-07-07 15:02:40 +03:00
mrT23	4b786b350e	Merge pull request #22 from Codium-ai/logo-improvements Logo improvements	2023-07-07 08:30:45 +03:00
Tom Brews Views	110d987514	adding space to the logo	2023-07-07 01:41:40 +03:00
Tom Brews Views	cc5e01cec5	dropping margin in favor of br	2023-07-07 01:33:36 +03:00
Tom Brews Views	620bf68d25	refactor margin	2023-07-07 01:28:20 +03:00
Tom Brews Views	86e5a30a36	margin refactor	2023-07-07 01:26:49 +03:00
Tom Brews Views	6c10f78c31	add more space to the logo	2023-07-07 01:23:47 +03:00
Tom Brews Views	46922d2842	use html instead of markup to control the width of the logo	2023-07-07 01:18:43 +03:00
Hussam.lawen	55ab198bb2	small fix in the figure	2023-07-06 22:12:56 +03:00
Hussam Lawen	0c7f048e58	Merge pull request #21 from Codium-ai/feature/skip_extensions exclude snap files	2023-07-06 20:28:20 +03:00
Hussam.lawen	efc8f755d5	exclude snap files	2023-07-06 20:22:54 +03:00
Ori Kotek	aebcb3f3c6	Merge pull request #20 from Codium-ai/bugfix/crash_protection Protect against no notifications received	2023-07-06 20:16:42 +03:00
Ori Kotek	1cedd13cf3	Merge pull request #19 from Codium-ai/enhancment/pr_modifications readme update	2023-07-06 19:55:24 +03:00
Ori Kotek	b7cd368cce	Merge pull request #16 from Codium-ai/bugfix/crash_protection Add exception protection for unexpected conditions during request handling	2023-07-06 19:54:55 +03:00
mrT23	6ef5843380	readme update	2023-07-06 19:52:44 +03:00
mrT23	c5f2abb548	Merge pull request #17 from Codium-ai/readme-horizontal-logo add horizontal logo for light and dark themes	2023-07-06 19:34:25 +03:00
Tom Brews Views	bfdff08cb8	reduce image size	2023-07-06 19:34:05 +03:00
Tom Brews Views	f1380df468	add horizontal logo for light and dark themes	2023-07-06 19:18:53 +03:00
Ori Kotek	2c4c7c485e	Merge pull request #15 from Codium-ai/bugfix/double_notifications Don't add "How to use" when running from the command line - a small c…	2023-07-06 18:36:27 +03:00
mrT23	f3df032f06	Merge pull request #14 from Codium-ai/docs/pr_compression_doc small change in "how it works" section	2023-07-06 18:34:08 +03:00
Hussam.lawen	e15559011d	small change in "how it works" section	2023-07-06 18:31:46 +03:00
Hussam Lawen	2434240f08	Merge pull request #13 from Codium-ai/docs/pr_compression_doc Docs/pr compression doc	2023-07-06 18:25:24 +03:00
Hussam.lawen	d3936122ec	Merge commit 'f1ab6ec88f4dc3e2abb90244de5a1f41d0492743' into docs/pr_compression_doc # Conflicts: # README.md	2023-07-06 18:23:19 +03:00
Hussam.lawen	c75f561701	Add how it works section	2023-07-06 18:19:06 +03:00
Hussam.lawen	d9bd73646c	update git patch logic figure	2023-07-06 17:59:02 +03:00
Hussam.lawen	13101df811	update overview figure	2023-07-06 17:49:19 +03:00
Hussam.lawen	64cb5da821	Merge commit 'deda4baa871d3dcd5b1692beea4d3c30db4f1955' into docs/pr_compression_doc	2023-07-06 17:46:58 +03:00
Hussam.lawen	f6f4d32edb	Add docs	2023-07-06 17:45:41 +03:00
Hussam.lawen	3e445c7e03	initial pr compression documentation	2023-07-06 15:26:56 +03:00