Compare commits

55 Commits

Author SHA1 Message Date
b2d952cafa 1. Move deployment_type to configuration.toml
2. Lint
3. Inject GitHub app installation ID into GitHub provider using the settings mechanism.
2023-07-11 16:55:09 +03:00
6eacf4791d Merge remote-tracking branch 'origin/main' into feature/gitlab_provider 2023-07-11 15:49:06 +03:00
4076f67ab8 Merge pull request #35 from ilchemla/hotfix/bad-filename-in-docs
Fix secrets filename extension in README
2023-07-11 15:37:09 +03:00
c2639a2520 Merge pull request #32 from Codium-ai/tr/focused_pr
Focused PR update
2023-07-11 15:29:36 +03:00
38db65831e Fix secrets filename extension in README 2023-07-11 15:01:52 +03:00
e1b856f7e6 Merge pull request #34 from Codium-ai/enhancement/soft_and_hard_thresh
Separate output token threshold to soft and hard instead of implicit hard = soft/2
2023-07-11 14:35:00 +03:00
301622216f Focused PR update 2023-07-11 08:50:28 +03:00
b63db6cef0 Merge pull request #29 from kaushnian/fix/rename-github_app
Fix: Rename github_app_webhook.py to github_app.py
2023-07-09 18:16:44 +03:00
8fba670bda Rename github_app_webhook.py to github_app.py 2023-07-08 13:36:47 -04:00
ca47833c56 Merge remote-tracking branch 'refs/remotes/origin/feature/gitlab_provider' into feature/gitlab_provider 2023-07-08 17:19:54 +03:00
567475c18c Update pr_agent/settings/.secrets_template.toml
Co-authored-by: Sergii Kovalev <enasik@gmail.com>
2023-07-08 15:29:05 +03:00
fb4badd160 changes 2023-07-08 12:14:32 +03:00
9695d96799 Simplify project identification 2023-07-08 11:49:11 +03:00
0930f76cb7 Merge branch 'feature/gitlab_provider' into feature/gitlab_webhook 2023-07-08 11:47:13 +03:00
365559405f Simplify gitlab project access 2023-07-08 11:46:41 +03:00
d4adcb3c22 Configurable polling interval 2023-07-08 10:26:41 +03:00
75167c2700 add polling 2023-07-08 08:52:11 +03:00
78f5f58774 Merge pull request #27 from Codium-ai/logo-update
update repo icons to new logos
2023-07-07 20:48:04 +03:00
81a2e5cbe2 updte repo icons to new logos 2023-07-07 19:42:45 +03:00
e63a4f47ce bugfixes 2023-07-07 17:06:53 +03:00
caff65613f docs 2023-07-07 16:36:56 +03:00
ee3cac9836 bugfix 2023-07-07 16:33:25 +03:00
8b3ff7a632 bugfix 2023-07-07 16:31:28 +03:00
7d49e080fc remove prints 2023-07-07 16:24:02 +03:00
1a94079936 style 2023-07-07 16:15:51 +03:00
7ed12c2f8e refactor 2023-07-07 16:10:33 +03:00
ed8cf27b05 working example 2023-07-07 15:02:40 +03:00
4b786b350e Merge pull request #22 from Codium-ai/logo-improvements
Logo improvements
2023-07-07 08:30:45 +03:00
110d987514 adding space to the logo 2023-07-07 01:41:40 +03:00
cc5e01cec5 dropping margin in favor of br 2023-07-07 01:33:36 +03:00
620bf68d25 refactor margin 2023-07-07 01:28:20 +03:00
86e5a30a36 margin refactor 2023-07-07 01:26:49 +03:00
6c10f78c31 add more space to the logo 2023-07-07 01:23:47 +03:00
46922d2842 use html instead of markup to control the width of the logo 2023-07-07 01:18:43 +03:00
55ab198bb2 small fix in the figure 2023-07-06 22:12:56 +03:00
0c7f048e58 Merge pull request #21 from Codium-ai/feature/skip_extensions
exclude snap files
2023-07-06 20:28:20 +03:00
efc8f755d5 exclude snap files 2023-07-06 20:22:54 +03:00
aebcb3f3c6 Merge pull request #20 from Codium-ai/bugfix/crash_protection
Protect against no notifications received
2023-07-06 20:16:42 +03:00
1cedd13cf3 Merge pull request #19 from Codium-ai/enhancment/pr_modifications
readme update
2023-07-06 19:55:24 +03:00
b7cd368cce Merge pull request #16 from Codium-ai/bugfix/crash_protection
Add exception protection for unexpected conditions during request handling
2023-07-06 19:54:55 +03:00
6ef5843380 readme update 2023-07-06 19:52:44 +03:00
c5f2abb548 Merge pull request #17 from Codium-ai/readme-horizontal-logo
add horizontal logo for light and dark themes
2023-07-06 19:34:25 +03:00
bfdff08cb8 reduce image size 2023-07-06 19:34:05 +03:00
f1380df468 add horizontal logo for light and dark themes 2023-07-06 19:18:53 +03:00
2c4c7c485e Merge pull request #15 from Codium-ai/bugfix/double_notifications
Don't add "How to use" when running from the command line - a small c…
2023-07-06 18:36:27 +03:00
f3df032f06 Merge pull request #14 from Codium-ai/docs/pr_compression_doc
small change in "how it works" section
2023-07-06 18:34:08 +03:00
e15559011d small change in "how it works" section 2023-07-06 18:31:46 +03:00
2434240f08 Merge pull request #13 from Codium-ai/docs/pr_compression_doc
Docs/pr compression doc
2023-07-06 18:25:24 +03:00
d3936122ec Merge commit 'f1ab6ec88f4dc3e2abb90244de5a1f41d0492743' into docs/pr_compression_doc
# Conflicts:
#	README.md
2023-07-06 18:23:19 +03:00
c75f561701 Add how it works section 2023-07-06 18:19:06 +03:00
d9bd73646c update git patch logic figure 2023-07-06 17:59:02 +03:00
13101df811 update overview figure 2023-07-06 17:49:19 +03:00
64cb5da821 Merge commit 'deda4baa871d3dcd5b1692beea4d3c30db4f1955' into docs/pr_compression_doc 2023-07-06 17:46:58 +03:00
f6f4d32edb Add docs 2023-07-06 17:45:41 +03:00
3e445c7e03 initial pr compression documentation 2023-07-06 15:26:56 +03:00
26 changed files with 399 additions and 120 deletions

42
PR_COMPRESSION.md Normal file
View File

@ -0,0 +1,42 @@
# Git Patch Logic
There are two scenarios:
1. The PR is small enough to fit in a single prompt (including system and user prompt)
2. The PR is too large to fit in a single prompt (including system and user prompt)
For both scenarios, we first use the following strategy:
#### Repo language prioritization strategy
We prioritize the languages of the repo based on the following criteria:
1. Exclude binary files and non-code files (e.g. images, PDFs, etc.)
2. Determine the main languages used in the repo
3. Sort the PR files by the most common languages in the repo (in descending order):
* ```[[file.py, file2.py],[file3.js, file4.jsx],[readme.md]]```
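The prioritization above can be sketched as follows. This is a minimal illustration, not the project's implementation: `EXTENSION_TO_LANGUAGE` and `group_files_by_language` are hypothetical names, and the extension table is deliberately tiny.

```python
from collections import defaultdict

# Hypothetical extension-to-language table; a real tool would need a fuller one.
EXTENSION_TO_LANGUAGE = {
    ".py": "Python",
    ".js": "JavaScript",
    ".jsx": "JavaScript",
    ".md": "Markdown",
}


def group_files_by_language(repo_languages, filenames):
    """Group filenames by language, most common repo language first.

    repo_languages maps a language name to its size in the repo (e.g. bytes).
    Files with an unknown extension are collected in a trailing group.
    """
    by_language = defaultdict(list)
    other = []
    for name in filenames:
        dot = name.rfind(".")
        language = EXTENSION_TO_LANGUAGE.get(name[dot:]) if dot != -1 else None
        if language in repo_languages:
            by_language[language].append(name)
        else:
            other.append(name)
    # Order the groups by how common each language is in the repo.
    ordered = sorted(repo_languages, key=repo_languages.get, reverse=True)
    groups = [by_language[lang] for lang in ordered if by_language[lang]]
    if other:
        groups.append(other)
    return groups
```

Within each group the input order is preserved; sorting by token count happens later, in the fitting step.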
## Small PR
In this case, we can fit the entire PR in a single prompt:
1. Exclude binary files and non-code files (e.g. images, PDFs, etc.)
2. Expand the surrounding context of each patch to 6 lines above and below the patch
## Large PR
### Motivation
Pull requests can be very long and contain a lot of information with varying degrees of relevance to the pr-agent.
We want to pack as much information as possible into a single LLM prompt, while keeping the information relevant to the pr-agent.
#### PR compression strategy
We prioritize additions over deletions:
- Combine all deleted files into a single list (`deleted files`)
- File patches are lists of hunks; remove all deletion-only hunks from each file patch
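As a rough sketch of the hunk filtering, assuming each file patch is a plain unified-diff string whose hunks start with an `@@ -a,b +c,d @@` header (the function name `drop_deletion_only_hunks` is hypothetical):

```python
import re

# Matches a unified-diff hunk header such as "@@ -10,2 +9,3 @@".
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@")


def drop_deletion_only_hunks(patch):
    """Return the patch without hunks that contain no added lines.

    Lines that appear before the first hunk header are dropped as well.
    """
    hunks = []  # each entry is [lines, has_addition]
    for line in patch.splitlines():
        if HUNK_HEADER.match(line):
            hunks.append([[line], False])
        elif hunks:
            hunks[-1][0].append(line)
            if line.startswith("+"):
                hunks[-1][1] = True
    kept = [line for lines, has_addition in hunks if has_addition for line in lines]
    return "\n".join(kept)
```

A file whose hunks are all deletion-only ends up with an empty patch, which a caller could treat the same way as a deleted file.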
#### Adaptive and token-aware file patch fitting
We use [tiktoken](https://github.com/openai/tiktoken) to tokenize the patches after the modifications described above, and we use the following strategy to fit the patches into the prompt:
1. Within each language, we sort the files by the number of tokens in the file (in descending order):
* ```[[file2.py, file.py],[file4.jsx, file3.js],[readme.md]]```
2. Iterate through the patches in the order described above
3. Add the patches to the prompt until the prompt reaches a certain buffer from the max token length
4. If there are still patches left, add the remaining patches as a list called `other modified files` to the prompt until the prompt reaches the max token length (hard stop), and skip the rest of the patches.
5. If we haven't reached the max token length, add the `deleted files` to the prompt until the prompt reaches the max token length (hard stop), and skip the rest.
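The fitting steps above might look roughly like this. It is a sketch under stated assumptions: `fit_patches`, `soft_buffer` and the whitespace token counter are hypothetical stand-ins (a tiktoken-based counter would be passed in as `count_tokens`), and a patch that does not fit under the soft limit is listed by name while iteration continues.

```python
def fit_patches(patches, deleted_files, max_tokens, soft_buffer, count_tokens=None):
    """Fit (filename, patch) pairs, given in priority order, into a token budget."""
    if count_tokens is None:
        # Stand-in tokenizer (whitespace split); not what tiktoken would return.
        def count_tokens(text):
            return len(text.split())

    soft_limit = max_tokens - soft_buffer
    used = 0
    full_patches, other_modified, included_deleted = [], [], []

    for filename, patch in patches:
        cost = count_tokens(patch)
        if used + cost <= soft_limit:
            # The full patch fits below the soft limit.
            full_patches.append((filename, patch))
            used += cost
            continue
        # Otherwise list only the filename, up to the hard limit.
        name_cost = count_tokens(filename)
        if used + name_cost <= max_tokens:
            other_modified.append(filename)
            used += name_cost

    for filename in deleted_files:
        name_cost = count_tokens(filename)
        if used + name_cost > max_tokens:
            break  # hard stop
        included_deleted.append(filename)
        used += name_cost

    return {
        "patches": full_patches,
        "other_modified_files": other_modified,
        "deleted_files": included_deleted,
        "tokens_used": used,
    }
```

Keeping two thresholds leaves room under the hard limit for the filename lists even after the full patches have filled most of the prompt.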
### Example
![](./pics/git_patch_logic.png)

View File

@ -1,28 +1,35 @@
<div align="center">
<img src="./pics/Icon-7.png" alt="pr-agent_icon" width="100"/>
<div align="center">
# pr-agent
<img src="./pics/logo-dark.png#gh-dark-mode-only" width="250"/>
<img src="./pics/logo-light.png#gh-light-mode-only" width="250"/>
</div>
[![GitHub license](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://github.com/Codium-ai/pr-agent/blob/main/LICENSE)
[![Discord](https://badgen.net/badge/icon/discord?icon=discord&label&color=purple)](https://discord.com/channels/1057273017547378788/1126104260430528613)
CodiumAI `pr-agent` is an open-source tool is powered by GPT-4 aming to help developers review PRs faster and more efficiently. It automatically analyzes the PR, and provides feedback and suggestions, and can answer questions.
CodiumAI `pr-agent` is an open-source tool aiming to help developers review PRs faster and more efficiently. It automatically analyzes the PR, provides feedback and suggestions, and can answer free-text questions.
</div>
- [Live demo](#live-demo)
- [Quickstart](#Quickstart)
- [Usage and Tools](#usage-and-tools)
- [Usage and tools](#usage-and-tools)
- [Configuration](#Configuration)
- [How it works](#how-it-works)
- [Roadmap](#roadmap)
- [Similar projects](#similar-projects)
## Live demo
Experience GPT-4 powered PR review on your public Github repository with our hosted pr-agent. To try it, mention @CodiumAI-Agent in a PR comment! The agent will generate the review in response ([see details in the Usage section](#usage-and-tools)).
Experience GPT-4 powered PR review on your public GitHub repository with our hosted pr-agent. To try it, just mention `@CodiumAI-Agent` in any PR comment! The agent will generate a PR review in response.
![Review generation process](./pics/pr-agent-review-process1.gif)
To set up your own pr-agent, see the [Quickstart](#Quickstart) section
---
## Quickstart
@ -79,8 +86,8 @@ pip install -r requirements.txt
3. Copy the secrets template file and fill in your OpenAI key and your GitHub user token:
```
cp pr_agent/settings/.secrets_template.toml pr_agent/settings/.secrets
# Edit .secrets file
cp pr_agent/settings/.secrets_template.toml pr_agent/settings/.secrets.toml
# Edit .secrets.toml file
```
4. Run the appropriate Python scripts from the scripts folder:
@ -140,8 +147,8 @@ git clone https://github.com/Codium-ai/pr-agent.git
- Copy your app's webhook secret to the webhook_secret field.
```
cp pr_agent/settings/.secrets_template.toml pr_agent/settings/.secrets
# Edit .secrets file
cp pr_agent/settings/.secrets_template.toml pr_agent/settings/.secrets.toml
# Edit .secrets.toml file
```
6. Build a Docker image for the app and optionally push it to a Docker repository. We'll use Dockerhub as an example:
@ -179,7 +186,7 @@ Here is a quick overview of the different sub-tools of PR Reviewer:
- PR description and title
- PR type classification
- Is the PR covered by relevant tests
- Is the PR minimal and focused
- Is this a focused PR
- Are there security concerns
- PR Feedback
- General PR suggestions
@ -195,7 +202,7 @@ This is how a typical output of the PR Reviewer looks like:
- 🔍 **Description and title:** Yes
- 📌 **Type of PR:** Enhancement
- 🧪 **Relevant tests added:** No
- ✨ **Minimal and focused:** Yes, the PR is focused on adding two new handlers for language extension and token counting.
- ✨ **Focused PR:** Yes, the PR is focused on adding two new handlers for language extension and token counting.
- 🔒 **Security concerns:** No, the PR does not introduce possible security concerns or issues.
#### PR Feedback
@ -238,7 +245,7 @@ The different tools and sub-tools used by CodiumAI pr-agent are easily configura
You can enable/disable the different PR Reviewer sub-sections with the following flags:
```
require_minimal_and_focused_review=true
require_focused_review=true
require_tests_review=true
require_security_review=true
```
@ -282,6 +289,12 @@ Example for extended suggestion:
---
## How it works
![PR-Agent Tools](./pics/pr_agent_overview.png)
Check out the [PR Compression strategy](./PR_COMPRESSION.md) page for more details on how we convert a code diff to a manageable LLM prompt
## Roadmap
- [ ] Support open-source models, as a replacement for openai models. Note that a minimal requirement for each open-source model is to have 8k+ context, and good support for generating json as an output

BIN
pics/git_patch_logic.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 346 KiB

BIN
pics/logo-dark.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 19 KiB

BIN
pics/logo-light.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 20 KiB

BIN
pics/pr_agent_overview.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 413 KiB

View File

@ -1,17 +1,16 @@
import re
from typing import Optional
from pr_agent.tools.pr_questions import PRQuestions
from pr_agent.tools.pr_reviewer import PRReviewer
class PRAgent:
def __init__(self, installation_id: Optional[int] = None):
self.installation_id = installation_id
def __init__(self):
pass
async def handle_request(self, pr_url, request):
if 'please review' in request.lower() or 'review' == request.lower().strip() or len(request) == 0:
reviewer = PRReviewer(pr_url, self.installation_id)
reviewer = PRReviewer(pr_url)
await reviewer.review()
else:
@ -21,5 +20,5 @@ class PRAgent:
question = re.split(r'(?i)answer', request)[1].strip()
else:
question = request
answerer = PRQuestions(pr_url, question, self.installation_id)
answerer = PRQuestions(pr_url, question)
await answerer.answer()

View File

@ -58,7 +58,8 @@ bad_extensions = [
'woff2',
'xz',
'zip',
'zst'
'zst',
'snap'
]
@ -92,7 +93,7 @@ def sort_files_by_main_languages(languages: Dict, files: list):
for ext in main_extensions:
main_extensions_flat.extend(ext)
for extensions, lang in zip(main_extensions, languages_sorted_list):
for extensions, lang in zip(main_extensions, languages_sorted_list): # noqa: B905
tmp = []
for file in files_filtered:
extension_str = f".{file.filename.split('.')[-1]}"

View File

@ -12,7 +12,7 @@ def convert_to_markdown(output_data: dict) -> str:
"Type of PR": "📌",
"Relevant tests added": "🧪",
"Unrelated changes": "⚠️",
"Minimal and focused": "✨",
"Focused PR": "✨",
"Security concerns": "🔒",
"General PR suggestions": "💡",
"Code suggestions": "🤖"

View File

@ -5,6 +5,7 @@ from dynaconf import Dynaconf
current_dir = dirname(abspath(__file__))
settings = Dynaconf(
envvar_prefix=False,
merge_enabled=True,
settings_files=[join(current_dir, f) for f in [
"settings/.secrets.toml",
"settings/configuration.toml",

View File

@ -1,15 +1,17 @@
from pr_agent.config_loader import settings
from pr_agent.git_providers.github_provider import GithubProvider
from pr_agent.git_providers.gitlab_provider import GitLabProvider
_GIT_PROVIDERS = {
'github': GithubProvider
'github': GithubProvider,
'gitlab': GitLabProvider,
}
def get_git_provider():
try:
provider_id = settings.config.git_provider
except AttributeError as e:
raise ValueError("github_provider is a required attribute in the configuration file") from e
raise ValueError("git_provider is a required attribute in the configuration file") from e
if provider_id not in _GIT_PROVIDERS:
raise ValueError(f"Unknown git provider: {provider_id}")
return _GIT_PROVIDERS[provider_id]

View File

@ -0,0 +1,82 @@
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class FilePatchInfo:
    base_file: str
    head_file: str
    patch: str
    filename: str
    tokens: int = -1


class GitProvider(ABC):
    @abstractmethod
    def get_diff_files(self) -> list[FilePatchInfo]:
        pass

    @abstractmethod
    def publish_comment(self, pr_comment: str, is_temporary: bool = False):
        pass

    @abstractmethod
    def remove_initial_comment(self):
        pass

    @abstractmethod
    def get_languages(self):
        pass

    @abstractmethod
    def get_pr_branch(self):
        pass

    @abstractmethod
    def get_user_id(self):
        pass

    @abstractmethod
    def get_pr_description(self):
        pass


def get_main_pr_language(languages, files) -> str:
    """
    Get the main language of the commit. Return an empty string if cannot determine.
    """
    main_language_str = ""
    try:
        top_language = max(languages, key=languages.get).lower()

        # validate that the specific commit uses the main language
        extension_list = []
        for file in files:
            extension_list.append(file.filename.rsplit('.')[-1])

        # get the most common extension
        most_common_extension = max(set(extension_list), key=extension_list.count)

        # look for a match. TBD: add more languages, do this systematically
        if most_common_extension == 'py' and top_language == 'python' or \
                most_common_extension == 'js' and top_language == 'javascript' or \
                most_common_extension == 'ts' and top_language == 'typescript' or \
                most_common_extension == 'go' and top_language == 'go' or \
                most_common_extension == 'java' and top_language == 'java' or \
                most_common_extension == 'c' and top_language == 'c' or \
                most_common_extension == 'cpp' and top_language == 'c++' or \
                most_common_extension == 'cs' and top_language == 'c#' or \
                most_common_extension == 'swift' and top_language == 'swift' or \
                most_common_extension == 'php' and top_language == 'php' or \
                most_common_extension == 'rb' and top_language == 'ruby' or \
                most_common_extension == 'rs' and top_language == 'rust' or \
                most_common_extension == 'scala' and top_language == 'scala' or \
                most_common_extension == 'kt' and top_language == 'kotlin' or \
                most_common_extension == 'pl' and top_language == 'perl' or \
                most_common_extension == 'swift' and top_language == 'swift':
            main_language_str = top_language

    except Exception:
        pass

    return main_language_str
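
The "TBD: do this systematically" note in `get_main_pr_language` could be addressed with a lookup table. The sketch below is one possible shape, not the code in this diff: `EXTENSION_LANGUAGE` and `main_language_from_filenames` are hypothetical names, and the function takes plain filename strings rather than file objects.

```python
from collections import Counter

# Same extension/language pairs as the chained condition, as a table.
EXTENSION_LANGUAGE = {
    'py': 'python', 'js': 'javascript', 'ts': 'typescript', 'go': 'go',
    'java': 'java', 'c': 'c', 'cpp': 'c++', 'cs': 'c#', 'swift': 'swift',
    'php': 'php', 'rb': 'ruby', 'rs': 'rust', 'scala': 'scala',
    'kt': 'kotlin', 'pl': 'perl',
}


def main_language_from_filenames(languages, filenames):
    """Return the repo's top language if the most common extension matches it."""
    if not languages or not filenames:
        return ""
    top_language = max(languages, key=languages.get).lower()
    # Take the text after the last dot as the extension.
    extensions = [name.rsplit('.', 1)[-1] for name in filenames]
    most_common_extension = Counter(extensions).most_common(1)[0][0]
    if EXTENSION_LANGUAGE.get(most_common_extension) == top_language:
        return top_language
    return ""
```

Adding a language then means adding one table entry instead of another `or` clause.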

View File

@ -1,25 +1,18 @@
import logging
from collections import namedtuple
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple
from urllib.parse import urlparse
from github import AppAuthentication, File, Github
from github import AppAuthentication, Github
from pr_agent.config_loader import settings
@dataclass
class FilePatchInfo:
base_file: str
head_file: str
patch: str
filename: str
tokens: int = -1
from .git_provider import FilePatchInfo
class GithubProvider:
def __init__(self, pr_url: Optional[str] = None, installation_id: Optional[int] = None):
self.installation_id = installation_id
def __init__(self, pr_url: Optional[str] = None):
self.installation_id = settings.get("GITHUB.INSTALLATION_ID")
self.github_client = self._get_github_client()
self.repo = None
self.pr_num = None
@ -32,6 +25,9 @@ class GithubProvider:
self.repo, self.pr_num = self._parse_pr_url(pr_url)
self.pr = self._get_pr()
def get_files(self):
return self.pr.get_files()
def get_diff_files(self) -> list[FilePatchInfo]:
files = self.pr.get_files()
diff_files = []
@ -65,53 +61,15 @@ class GithubProvider:
return self.pr.body
def get_languages(self):
return self._get_repo().get_languages()
def get_main_pr_language(self) -> str:
"""
Get the main language of the commit. Return an empty string if cannot determine.
"""
main_language_str = ""
try:
languages = self.get_languages()
top_language = max(languages, key=languages.get).lower()
# validate that the specific commit uses the main language
extension_list = []
files = self.pr.get_files()
for file in files:
extension_list.append(file.filename.rsplit('.')[-1])
# get the most common extension
most_common_extension = max(set(extension_list), key=extension_list.count)
# look for a match. TBD: add more languages, do this systematically
if most_common_extension == 'py' and top_language == 'python' or \
most_common_extension == 'js' and top_language == 'javascript' or \
most_common_extension == 'ts' and top_language == 'typescript' or \
most_common_extension == 'go' and top_language == 'go' or \
most_common_extension == 'java' and top_language == 'java' or \
most_common_extension == 'c' and top_language == 'c' or \
most_common_extension == 'cpp' and top_language == 'c++' or \
most_common_extension == 'cs' and top_language == 'c#' or \
most_common_extension == 'swift' and top_language == 'swift' or \
most_common_extension == 'php' and top_language == 'php' or \
most_common_extension == 'rb' and top_language == 'ruby' or \
most_common_extension == 'rs' and top_language == 'rust' or \
most_common_extension == 'scala' and top_language == 'scala' or \
most_common_extension == 'kt' and top_language == 'kotlin' or \
most_common_extension == 'pl' and top_language == 'perl' or \
most_common_extension == 'swift' and top_language == 'swift':
main_language_str = top_language
except Exception:
pass
return main_language_str
languages = self._get_repo().get_languages()
return languages
def get_pr_branch(self):
return self.pr.head.ref
def get_pr_description(self):
return self.pr.body
def get_user_id(self):
if not self.github_user_id:
try:

View File

@ -0,0 +1,92 @@
import logging
from typing import Optional, Tuple
from urllib.parse import urlparse

import gitlab

from pr_agent.config_loader import settings

from .git_provider import FilePatchInfo, GitProvider


class GitLabProvider(GitProvider):
    def __init__(self, merge_request_url: Optional[str] = None):
        gitlab_url = settings.get("GITLAB.URL", None)
        if not gitlab_url:
            raise ValueError("GitLab URL is not set in the config file")
        gitlab_access_token = settings.get("GITLAB.PERSONAL_ACCESS_TOKEN", None)
        if not gitlab_access_token:
            raise ValueError("GitLab personal access token is not set in the config file")
        self.gl = gitlab.Gitlab(
            gitlab_url,
            gitlab_access_token
        )

        self.id_project = None
        self.id_mr = None
        self.mr = None
        self.temp_comments = []
        self._set_merge_request(merge_request_url)

    @property
    def pr(self):
        '''The GitLab terminology is merge request (MR) instead of pull request (PR)'''
        return self.mr

    def _set_merge_request(self, merge_request_url: str):
        self.id_project, self.id_mr = self._parse_merge_request_url(merge_request_url)
        self.mr = self._get_merge_request()

    def get_diff_files(self) -> list[FilePatchInfo]:
        diffs = self.mr.changes()['changes']
        diff_files = [FilePatchInfo("", "", diff['diff'], diff['new_path']) for diff in diffs]
        return diff_files

    def get_files(self):
        return [change['new_path'] for change in self.mr.changes()['changes']]

    def publish_comment(self, mr_comment: str, is_temporary: bool = False):
        comment = self.mr.notes.create({'body': mr_comment})
        if is_temporary:
            self.temp_comments.append(comment)

    def remove_initial_comment(self):
        try:
            for comment in self.temp_comments:
                comment.delete()
        except Exception as e:
            logging.exception(f"Failed to remove temp comments, error: {e}")

    def get_title(self):
        return self.mr.title

    def get_description(self):
        return self.mr.description

    def get_languages(self):
        languages = self.gl.projects.get(self.id_project).languages()
        return languages

    def get_pr_branch(self):
        return self.mr.source_branch

    def get_pr_description(self):
        return self.mr.description

    def _parse_merge_request_url(self, merge_request_url: str) -> Tuple[int, int]:
        parsed_url = urlparse(merge_request_url)

        path_parts = parsed_url.path.strip('/').split('/')
        if path_parts[-2] != 'merge_requests':
            raise ValueError("The provided URL does not appear to be a GitLab merge request URL")

        try:
            mr_id = int(path_parts[-1])
        except ValueError as e:
            raise ValueError("Unable to convert merge request ID to integer") from e

        # Gitlab supports access by both project numeric ID as well as 'namespace/project_name'
        return "/".join(path_parts[:2]), mr_id

    def _get_merge_request(self):
        mr = self.gl.projects.get(self.id_project).mergerequests.get(self.id_mr)
        return mr
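
`_parse_merge_request_url` keeps only the first two path segments as the project path. The sketch below is a hypothetical variant, not part of this diff, that also accepts nested groups, assuming merge request URLs of the form `<host>/<group>/.../<project>/-/merge_requests/<id>` (with the `/-/` segment optional):

```python
from urllib.parse import urlparse


def parse_merge_request_url(merge_request_url):
    """Split a GitLab merge request URL into (project_path, merge_request_id)."""
    path_parts = urlparse(merge_request_url).path.strip('/').split('/')
    if len(path_parts) < 4 or path_parts[-2] != 'merge_requests':
        raise ValueError("The provided URL does not appear to be a GitLab merge request URL")

    try:
        mr_id = int(path_parts[-1])
    except ValueError as e:
        raise ValueError("Unable to convert merge request ID to integer") from e

    # Everything before 'merge_requests' is the project path, minus the '-' separator.
    project_parts = path_parts[:-2]
    if project_parts[-1] == '-':
        project_parts = project_parts[:-1]
    return "/".join(project_parts), mr_id
```

The returned project path is a string, so a `Tuple[str, int]` annotation would describe it more accurately than `Tuple[int, int]`.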

View File

@ -35,7 +35,8 @@ async def handle_github_webhooks(request: Request, response: Response):
async def handle_request(body):
action = body.get("action", None)
installation_id = body.get("installation", {}).get("id", None)
agent = PRAgent(installation_id)
settings.set("GITHUB.INSTALLATION_ID", installation_id)
agent = PRAgent()
if action == 'created':
if "comment" not in body:
return {}
@ -66,8 +67,8 @@ async def root():
def start():
if settings.get("GITHUB.DEPLOYMENT_TYPE", "user") != "app":
raise Exception("Please set deployment type to app in .secrets.toml file")
# Override the deployment type to app
settings.set("GITHUB.DEPLOYMENT_TYPE", "app")
app = FastAPI()
app.include_router(router)

View File

@ -76,7 +76,8 @@ async def polling_loop():
if comment['user']['login'] == user_id:
continue
comment_body = comment['body'] if 'body' in comment else ''
commenter_github_user = comment['user']['login'] if 'user' in comment else ''
commenter_github_user = comment['user']['login'] \
if 'user' in comment else ''
logging.info(f"Commenter: {commenter_github_user}\nComment: {comment_body}")
user_tag = "@" + user_id
if user_tag not in comment_body:

View File

@ -0,0 +1,64 @@
import asyncio
import time

import gitlab

from pr_agent.agent.pr_agent import PRAgent
from pr_agent.config_loader import settings

gl = gitlab.Gitlab(
    settings.get("GITLAB.URL"),
    private_token=settings.get("GITLAB.PERSONAL_ACCESS_TOKEN")
)

# Set the list of projects to monitor
projects_to_monitor = settings.get("GITLAB.PROJECTS_TO_MONITOR")
magic_word = settings.get("GITLAB.MAGIC_WORD")

# Hold the previous seen comments
previous_comments = set()


def check_comments():
    print('Polling')
    new_comments = {}
    for project in projects_to_monitor:
        project = gl.projects.get(project)
        merge_requests = project.mergerequests.list(state='opened')
        for mr in merge_requests:
            notes = mr.notes.list(get_all=True)
            for note in notes:
                if note.id not in previous_comments and note.body.startswith(magic_word):
                    new_comments[note.id] = dict(
                        body=note.body[len(magic_word):],
                        project=project.name,
                        mr=mr
                    )
                    previous_comments.add(note.id)
                    print(f"New comment in project {project.name}, merge request {mr.title}: {note.body}")

    return new_comments


def handle_new_comments(new_comments):
    print('Handling new comments')
    agent = PRAgent()
    for _, comment in new_comments.items():
        print(f"Handling comment: {comment['body']}")
        asyncio.run(agent.handle_request(comment['mr'].web_url, comment['body']))


def run():
    assert settings.get('CONFIG.GIT_PROVIDER') == 'gitlab', 'This script is only for GitLab'
    # Initial run to populate previous_comments
    check_comments()

    # Run the check every minute
    while True:
        time.sleep(settings.get("GITLAB.POLLING_INTERVAL_SECONDS"))
        new_comments = check_comments()
        if new_comments:
            handle_new_comments(new_comments)


if __name__ == '__main__':
    run()
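
The de-duplication in `check_comments` can be exercised without a GitLab server by pulling it into a pure function. This is a hypothetical refactoring sketch: `select_new_notes` and the `(note_id, body)` tuple shape are assumptions, and unlike the script it strips whitespace from the request text.

```python
def select_new_notes(notes, seen_ids, magic_word):
    """Return {note_id: request_text} for unseen notes that start with magic_word.

    notes is an iterable of (note_id, body) pairs; seen_ids is updated in place.
    """
    new_notes = {}
    for note_id, body in notes:
        if note_id in seen_ids or not body.startswith(magic_word):
            continue
        seen_ids.add(note_id)
        # Drop the trigger word and surrounding whitespace from the request.
        new_notes[note_id] = body[len(magic_word):].strip()
    return new_notes
```

Calling it twice with the same notes returns an empty dict the second time, which is the property the polling loop relies on.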

View File

@ -1,5 +1,5 @@
# QUICKSTART:
# Copy this file to .secrets in the same folder.
# Copy this file to .secrets.toml in the same folder.
# The minimum workable settings - set openai.key to your API key.
# Set github.deployment_type to "user" and github.user_token to your GitHub personal access token.
# This will allow you to run the CLI scripts in the scripts/ folder and the github_polling server.
@ -11,9 +11,6 @@ key = "<API_KEY>" # Acquire through https://platform.openai.com
org = "<ORGANIZATION>" # Optional, may be commented out.
[github]
# The type of deployment to create. Valid values are 'app' or 'user'.
deployment_type = "user"
# ---- Set the following only for deployment type == "user"
user_token = "<TOKEN>" # A GitHub personal access token with 'repo' scope.
@ -25,3 +22,8 @@ private_key = """\
"""
app_id = 123456 # The GitHub App ID, replace with your own.
webhook_secret = "<WEBHOOK SECRET>" # Optional, may be commented out.
[gitlab]
# Gitlab personal access token
personal_access_token = ""

View File

@ -5,11 +5,27 @@ publish_review=true
verbosity_level=0 # 0,1,2
[pr_reviewer]
require_minimal_and_focused_review=true
require_focused_review=true
require_tests_review=true
require_security_review=true
extended_code_suggestions=false
num_code_suggestions=4
[pr_questions]
[pr_questions]
[github]
# The type of deployment to create. Valid values are 'app' or 'user'.
deployment_type = "user"
[gitlab]
# URL to the gitlab service
url = "https://gitlab.com"
# Polling (either project id or namespace/project_name) syntax can be used
projects_to_monitor = ['org_name/repo_name']
# Polling trigger
magic_word = "AutoReview"
# Polling interval
polling_interval_seconds = 30

View File

@ -30,10 +30,10 @@ You must use the following JSON schema to format your answer:
"description": "yes\\no question: does this PR have relevant tests ?"
},
{%- endif %}
{%- if require_minimal_and_focused %}
"Minimal and focused": {
{%- if require_focused %}
"Focused PR": {
"type": "string",
"description": "is this PR as minimal and focused as possible, with all code changes centered around a single coherent theme, described in the PR description and title ?" Make sure to explain your answer"
"description": "Is this a focused PR, in the sense that it has a clear and coherent title and description, and all PR code diff changes are properly derived from the title and description? Explain your response."
}
},
{%- endif %}
@ -106,8 +106,8 @@ Example output:
{%- if require_tests %}
"Relevant tests added": "No",
{%- endif %}
{%- if require_minimal_and_focused %}
"Minimal and focused": "yes\\no, because ..."
{%- if require_focused %}
"Focused PR": "yes\\no, because ..."
{%- endif %}
},
"PR Feedback":

View File

@ -1,6 +1,5 @@
import copy
import logging
from typing import Optional
from jinja2 import Environment, StrictUndefined
@ -9,21 +8,23 @@ from pr_agent.algo.pr_processing import get_pr_diff
from pr_agent.algo.token_handler import TokenHandler
from pr_agent.config_loader import settings
from pr_agent.git_providers import get_git_provider
from pr_agent.git_providers.git_provider import get_main_pr_language
class PRQuestions:
def __init__(self, pr_url: str, question_str: str, installation_id: Optional[int] = None):
self.git_provider = get_git_provider()(pr_url, installation_id)
self.main_pr_language = self.git_provider.get_main_pr_language()
self.installation_id = installation_id
def __init__(self, pr_url: str, question_str: str):
self.git_provider = get_git_provider()(pr_url)
self.main_pr_language = get_main_pr_language(
self.git_provider.get_languages(), self.git_provider.get_files()
)
self.ai_handler = AiHandler()
self.question_str = question_str
self.vars = {
"title": self.git_provider.pr.title,
"branch": self.git_provider.get_pr_branch(),
"description": self.git_provider.pr.body,
"language": self.git_provider.get_main_pr_language(),
"diff": "", # empty diff for initial calculation
"description": self.git_provider.get_description(),
"language": self.main_pr_language,
"diff": "", # empty diff for initial calculation
"questions": self.question_str,
}
self.token_handler = TokenHandler(self.git_provider.pr,

View File

@ -1,7 +1,6 @@
import copy
import json
import logging
from typing import Optional
from jinja2 import Environment, StrictUndefined
@ -11,14 +10,16 @@ from pr_agent.algo.token_handler import TokenHandler
from pr_agent.algo.utils import convert_to_markdown
from pr_agent.config_loader import settings
from pr_agent.git_providers import get_git_provider
from pr_agent.git_providers.git_provider import get_main_pr_language
class PRReviewer:
def __init__(self, pr_url: str, installation_id: Optional[int] = None, cli_mode=False):
def __init__(self, pr_url: str, cli_mode=False):
self.git_provider = get_git_provider()(pr_url, installation_id)
self.main_language = self.git_provider.get_main_pr_language()
self.installation_id = installation_id
self.git_provider = get_git_provider()(pr_url)
self.main_language = get_main_pr_language(
self.git_provider.get_languages(), self.git_provider.get_files()
)
self.ai_handler = AiHandler()
self.patches_diff = None
self.prediction = None
@ -26,12 +27,12 @@ class PRReviewer:
self.vars = {
"title": self.git_provider.pr.title,
"branch": self.git_provider.get_pr_branch(),
"description": self.git_provider.pr.body,
"description": self.git_provider.get_pr_description(),
"language": self.main_language,
"diff": "", # empty diff for initial calculation
"require_tests": settings.pr_reviewer.require_tests_review,
"require_security": settings.pr_reviewer.require_security_review,
"require_minimal_and_focused": settings.pr_reviewer.require_minimal_and_focused_review,
"require_focused": settings.pr_reviewer.require_focused_review,
'extended_code_suggestions': settings.pr_reviewer.extended_code_suggestions,
'num_code_suggestions': settings.pr_reviewer.num_code_suggestions,
}

View File

@ -6,3 +6,6 @@ openai==0.27.8
Jinja2==3.1.2
tiktoken==0.4.0
uvicorn==0.22.0
python-gitlab==3.15.0
pytest~=7.4.0
aiohttp~=3.8.4

View File

@ -50,7 +50,7 @@ class TestConvertToMarkdown:
'Type of PR': 'Test type',
'Relevant tests added': 'no',
'Unrelated changes': 'n/a', # won't be included in the output
'Minimal and focused': 'Yes',
'Focused PR': 'Yes',
'General PR suggestions': 'general suggestion...',
'Code suggestions': [
{
@ -74,12 +74,11 @@ class TestConvertToMarkdown:
- 🔍 **Description and title:** Test description
- 📌 **Type of PR:** Test type
- 🧪 **Relevant tests added:** no
- ✨ **Minimal and focused:** Yes
- ✨ **Focused PR:** Yes
- 💡 **General PR suggestions:** general suggestion...
- 🤖 **Code suggestions:**
- **suggestion 1:**
- **Code example:**
- **Before:**
```
@ -90,7 +89,6 @@ class TestConvertToMarkdown:
Code after
```
- **suggestion 2:**
- **Code example:**
- **Before:**
```
@ -116,7 +114,7 @@ class TestConvertToMarkdown:
'Type of PR': {},
'Relevant tests added': {},
'Unrelated changes': {},
'Minimal and focused': {},
'Focused PR': {},
'General PR suggestions': {},
'Code suggestions': {}
}

View File

@ -1,15 +1,15 @@
# Generated by CodiumAI
from pr_agent.algo.language_handler import sort_files_by_main_languages
import pytest
"""
Code Analysis
Objective:
The objective of the function is to sort a list of files by their main language, putting the files that are in the main language first and the rest of the files after. It takes in a dictionary of languages and their sizes, and a list of files.
The objective of the function is to sort a list of files by their main language, putting the files that are in the main
language first and the rest of the files after. It takes in a dictionary of languages and their sizes, and a list of
files.
Inputs:
- languages: a dictionary containing the languages and their sizes
@ -33,6 +33,8 @@ Additional aspects:
- The function uses the filter_bad_extensions function to filter out files with bad extensions
- The function uses a rest_files dictionary to store the files that do not belong to any of the main extensions
"""
class TestSortFilesByMainLanguages:
# Tests that files are sorted by main language, with files in main language first and the rest after
def test_happy_path_sort_files_by_main_languages(self):
@ -118,4 +120,4 @@ class TestSortFilesByMainLanguages:
{'language': 'C++', 'files': [files[2], files[7]]},
{'language': 'Other', 'files': []}
]
assert sort_files_by_main_languages(languages, files) == expected_output
assert sort_files_by_main_languages(languages, files) == expected_output

View File

@ -47,7 +47,7 @@ class TestParseCodeSuggestion:
"Suggestion number": "one",
"Description": "This is a suggestion"
}
expected_output = "- **suggestion one:**\n - **Description:** This is a suggestion\n\n"
expected_output = " **Description:** This is a suggestion\n\n"
assert parse_code_suggestion(input_data) == expected_output
# Tests that function returns correct output when 'before' or 'after' key has a non-string value
@ -70,7 +70,7 @@ class TestParseCodeSuggestion:
'before': 'Before 1',
'after': 'After 1'
}
expected_output = "- **suggestion 1:**\n - **suggestion:** Suggestion 1\n - **description:** Description 1\n - **before:** Before 1\n - **after:** After 1\n\n" # noqa: E501
expected_output = " **suggestion:** Suggestion 1\n **description:** Description 1\n **before:** Before 1\n **after:** After 1\n\n" # noqa: E501
assert parse_code_suggestion(code_suggestions) == expected_output
# Tests that function returns correct output when input dictionary has 'code example' key
@ -84,5 +84,5 @@ class TestParseCodeSuggestion:
'after': 'After 2'
}
}
expected_output = "- **suggestion 2:**\n - **suggestion:** Suggestion 2\n - **description:** Description 2\n - **code example:**\n - **before:**\n ```\n Before 2\n ```\n - **after:**\n ```\n After 2\n ```\n\n" # noqa: E501
expected_output = " **suggestion:** Suggestion 2\n **description:** Description 2\n - **code example:**\n - **before:**\n ```\n Before 2\n ```\n - **after:**\n ```\n After 2\n ```\n\n" # noqa: E501
assert parse_code_suggestion(code_suggestions) == expected_output