Initial commit - PR-Agent OSS release

This commit is contained in:
Ori Kotek
2023-07-06 00:21:08 +03:00
commit 4b4d91dfe9
44 changed files with 2426 additions and 0 deletions

1
.dockerignore Normal file
View File

@ -0,0 +1 @@
venv/

4
.gitignore vendored Normal file
View File

@ -0,0 +1,4 @@
.idea/
venv/
pr_agent/settings/.secrets.toml
__pycache__

0
Dockerfile Normal file
View File

202
LICENSE Normal file
View File

@ -0,0 +1,202 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [2023] [Codium ltd]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

283
README.md Normal file
View File

@ -0,0 +1,283 @@
<div align="center">
# 🛡️ CodiumAI PR-Agent
[![GitHub license](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://github.com/Codium-ai/pr-agent/blob/main/LICENSE)
[![Discord](https://badgen.net/badge/icon/discord?icon=discord&label&color=purple)](https://discord.com/channels/1057273017547378788/1126104260430528613)
CodiumAI `PR-Agent` is an open-source tool that helps developers review PRs faster and more efficiently.
It automatically analyzes the PR, and provides feedback and suggestions, and can answer questions.
It is powered by GPT-4, and is based on the [CodiumAI](https://github.com/Codium-ai/) platform.
</div>
TBD: Add screenshot of the PR Reviewer (could be gif)
* [Quickstart](#Quickstart)
* [Configuration](#Configuration)
* [Usage and Tools](#usage-and-tools)
* [Roadmap](#roadmap)
* [Similar projects](#similar-projects)
* Additional files:
* CONTRIBUTION.md
* LICENSE
*
## Quickstart
To get started with PR-Agent quickly, you first need to acquire two tokens:
1. An OpenAI key from [here](https://platform.openai.com/), with access to GPT-4.
2. A GitHub personal access token (classic) with the repo scope.
There are several ways to use PR-Agent. Let's start with the simplest one:
---
### Method 1: Use Docker image (no installation required)
To request a review for a PR, or ask a question about a PR, you can run the appropriate
Python scripts from the scripts folder. Here's how:
1. To request a review for a PR, run the following command:
```
docker run --rm -it -e OPENAI.KEY=<your key> -e GITHUB.USER_TOKEN=<your token> codiumai/pr-agent \
python pr_agent/scripts/review_pr_from_url.py --pr_url <pr url>
```
---
2. To ask a question about a PR, run the following command:
```
docker run --rm -it -e OPENAI.KEY -e GITHUB.USER_TOKEN codiumai/pr-agent \
python pr_agent/scripts/answer_pr_questions_from_url.py --pr_url <pr url> --question "<your question>"
```
Possible questions you can ask include:
- What is the main theme of this PR?
- Is the PR ready for merge?
- What are the main changes in this PR?
- Should this PR be split into smaller parts?
- Can you compose a rhymed song about this PR.
---
### Method 2: Run from source
1. Clone this repository:
```
git clone https://github.com/Codium-ai/pr-agent.git
```
2. Install the requirements in your favorite virtual environment:
```
pip install -r requirements.txt
```
3. Copy the secrets template file and fill in your OpenAI key and your GitHub user token:
```
cp pr_agent/settings/.secrets_template.toml pr_agent/settings/.secrets
# Edit .secrets file
```
4. Run the appropriate Python scripts from the scripts folder:
```
python pr_agent/scripts/review_pr_from_url.py --pr_url <pr url>
python pr_agent/scripts/answer_pr_questions_from_url.py --pr_url <pr url> --question "<your question>"
```
---
### Method 3: Method 3: Run as a polling server; request reviews by tagging your Github user on a PR
Follow steps 1-3 of method 2.
Run the following command to start the server:
```
python pr_agent/servers/github_polling.py
```
---
### Method 4: Run as a Github App, allowing you to automate the review process on your private or public repositories.
1. Create a GitHub App from the [Github Developer Portal](https://docs.github.com/en/developers/apps/creating-a-github-app).
- Set the following permissions:
- Pull requests: Read & write
- Issue comment: Read & write
- Metadata: Read-only
- Set the following events:
- Issue comment
- Pull request
2. Generate a random secret for your app, and save it for later. For example, you can use:
```
WEBHOOK_SECRET=$(python -c "import secrets; print(secrets.token_hex(10))")
```
3. Acquire the following pieces of information from your app's settings page:
- App private key (click "Generate a private key", and save the file)
- App ID
4. Clone this repository:
```
git clone https://github.com/Codium-ai/pr-agent.git
```
5. Copy the secrets template file and fill in the following:
- Your OpenAI key.
- Set deployment_type to 'app'
- Copy your app's private key to the private_key field.
- Copy your app's ID to the app_id field.
- Copy your app's webhook secret to the webhook_secret field.
```
cp pr_agent/settings/.secrets_template.toml pr_agent/settings/.secrets
# Edit .secrets file
```
6. Build a Docker image for the app and optionally push it to a Docker repository. We'll use Dockerhub as an example:
```
docker build . -t codiumai/pr-agent:github_app --target github_app -f docker/Dockerfile
docker push codiumai/pr-agent:github_app # Push to your Docker repository
```
7. Host the app using a server, serverless function, or container environment. Alternatively, for development and
debugging, you may use tools like smee.io to forward webhooks to your local machine.
8. Go back to your app's settings, set the following:
- Webhook URL: The URL of your app's server, or the URL of the smee.io channel.
- Webhook secret: The secret you generated earlier.
9. Install the app by navigating to the "Install App" tab, and selecting your desired repositories.
---
## Usage and Tools
CodiumAI PR-Agent provides two types of interactions ("tools"): `"PR Reviewer"` and `"PR Q&A"`.
- The "PR Reviewer" tool automatically analyzes PRs, and provides different types of feedbacks.
- The "PR Q&A" tool answers free-text questions about the PR.
### PR Reviewer
Here is a quick overview of the different sub-tools of PR Reviewer:
- PR Analysis
- Summarize main theme
- PR description and title
- PR type classification
- Is the PR covered by relevant tests
- Is the PR minimal and focused
- PR Feedback
- General PR suggestions
- Code suggestions
- Security concerns
This is how a typical output of the PR Reviewer looks like:
---
#### PR Analysis
- 🎯 **Main theme:** Adding language extension handler and token handler
- 🔍 **Description and title:** Yes
- 📌 **Type of PR:** Enhancement
- 🧪 **Relevant tests added:** No
-**Minimal and focused:** Yes, the PR is focused on adding two new handlers for language extension and token counting.
#### PR Feedback
- 💡 **General PR suggestions:** The PR is generally well-structured and the code is clean. However, it would be beneficial to add some tests to ensure the new handlers work as expected. Also, consider adding docstrings to the new functions and classes to improve code readability and maintainability.
- 🤖 **Code suggestions:**
- **suggestion 1:**
- **relevant file:** pr_agent/algo/language_handler.py
- **suggestion content:** Consider using a set instead of a list for 'bad_extensions' as checking membership in a set is faster than in a list. [medium]
- **suggestion 2:**
- **relevant file:** pr_agent/algo/language_handler.py
- **suggestion content:** In the 'filter_bad_extensions' function, you are splitting the filename on '.' and taking the last element to get the extension. This might not work as expected if the filename contains multiple '.' characters. Consider using 'os.path.splitext' to get the file extension more reliably. [important]
- 🔒 **Security concerns:** No, the PR does not introduce possible security concerns or issues.
---
### PR Q&A
This tool answers free-text questions about the PR. This is how a typical output of the PR Q&A looks like:
---
**Question**: summarize for me the PR in 4 bullet points
**Answer**:
- The PR introduces a new feature to sort files by their main languages. It uses a mapping of programming languages to their file extensions to achieve this.
- It also introduces a filter to exclude files with certain extensions, deemed as 'bad extensions', from the sorting process.
- The PR modifies the `get_pr_diff` function in `pr_processing.py` to use the new sorting function. It also refactors the code to move the PR pruning logic into a separate function.
- A new `TokenHandler` class is introduced in `token_handler.py` to handle token counting operations. This class is initialized with a PR, variables, system, and user, and provides methods to get system and user tokens and to count tokens in a patch.
---
## Configuration
The different tools and sub-tools used by CodiumAI PR-Agent are easily configurable via the configuration file: `/settings/configuration.toml`.
#### Enabling/disabling sub-tools:
You can enable/disable the different PR Reviewer sub-sections with the following flags:
```
require_minimal_and_focused_review=true
require_tests_review=true
require_security_review=true
```
#### Code Suggestions configuration:
There are also configuration options to control different aspects of the `code suggestions` feature.
The number of suggestions provided can be controlled by adjusting the following parameter:
```
num_code_suggestions=4
```
You can also enable more verbose and informative mode of code suggestions:
```
extended_code_suggestions=false
```
This is a comparison of the regular and extended code suggestions modes:
---
Example for regular suggestion:
- **suggestion 1:**
- **relevant file:** sql.py
- **suggestion content:** Remove hardcoded sensitive information like username and password. Use environment variables or a secure method to store these values. [important]
---
Example for extended suggestion:
- **suggestion 1:**
- **relevant file:** sql.py
- **suggestion content:** Remove hardcoded sensitive information (username and password) [important]
- **why:** Hardcoding sensitive information is a security risk. It's better to use environment variables or a secure way to store these values.
- **code example:**
- **before code:**
```
user = "root",
password = "Mysql@123",
```
- **after code:**
```
user = os.getenv('DB_USER'),
password = os.getenv('DB_PASSWORD'),
```
---
## Roadmap
- [ ] Support open-source models, as a replacement for openai models. Note that a minimal requirement for each open-source model is to have 8k+ context, and good support for generating json as an output
- [ ] Support other Git providers, such as Gitlab and Bitbucket.
- [ ] Develop additional logics for handling large PRs, and compressing git patches
- [ ] Dedicated tools and sub-tools for specific programming languages (Python, Javascript, Java, C++, etc)
- [ ] Add additional context to the prompt. For example, repo (or relevant files) summarization, with tools such a [ctags](https://github.com/universal-ctags/ctags)
- [ ] Adding more tools. Possible directions:
- [ ] Code Quality
- [ ] Coding Style
- [ ] Performance (are there any performance issues)
- [ ] Documentation (is the PR properly documented)
- [ ] Rank the PR importance
- [ ] ...
## Similar Projects
- [CodiumAI - Meaningful tests for busy devs](https://github.com/Codium-ai/codiumai-vscode-release)
- [Aider - GPT powered coding in your terminal](https://github.com/paul-gauthier/aider)
- [GPT-Engineer](https://github.com/AntonOsika/gpt-engineer)
- [CodeReview BOT](https://github.com/anc95/ChatGPT-CodeReview)

20
docker/Dockerfile Normal file
View File

@ -0,0 +1,20 @@
FROM python:3.10 as base
WORKDIR /app
ADD requirements.txt .
RUN pip install -r requirements.txt && rm requirements.txt
ENV PYTHONPATH=/app
ADD pr_agent pr_agent
FROM base as github_app
CMD ["python", "servers/github_app.py"]
FROM base as github_polling
CMD ["python", "servers/github_polling.py"]
FROM base as test
ADD requirements-dev.txt .
RUN pip install -r requirements-dev.txt && rm requirements-dev.txt
FROM base as cli
CMD ["bash"]

Binary file not shown.

After

Width:  |  Height:  |  Size: 102 KiB

BIN
pics/pr_questions.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 137 KiB

BIN
pics/pr_reviewer.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 267 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 42 KiB

1
pr_agent/__init__.py Normal file
View File

@ -0,0 +1 @@

View File

View File

@ -0,0 +1,20 @@
import re
from typing import Optional
from pr_agent.tools.pr_questions import PRQuestions
from pr_agent.tools.pr_reviewer import PRReviewer
class PRAgent:
def __init__(self, installation_id: Optional[int] = None):
self.installation_id = installation_id
async def handle_request(self, pr_url, request):
if 'please review' in request.lower():
reviewer = PRReviewer(pr_url, self.installation_id)
await reviewer.review()
elif 'please answer' in request.lower():
question = re.split(r'(?i)please answer', request)[1].strip()
answerer = PRQuestions(pr_url, question, self.installation_id)
await answerer.answer()

10
pr_agent/algo/__init__.py Normal file
View File

@ -0,0 +1,10 @@
MAX_TOKENS = {
'gpt-3.5-turbo': 4000,
'gpt-3.5-turbo-0613': 4000,
'gpt-3.5-turbo-0301': 4000,
'gpt-3.5-turbo-16k': 16000,
'gpt-3.5-turbo-16k-0613': 16000,
'gpt-4': 8000,
'gpt-4-0613': 8000,
'gpt-4-32k': 32000,
}

View File

@ -0,0 +1,37 @@
import logging
import openai
from openai.error import APIError, Timeout, TryAgain
from retry import retry
from pr_agent.config_loader import settings
OPENAI_RETRIES=2
class AiHandler:
def __init__(self):
try:
openai.api_key = settings.openai.key
except AttributeError as e:
raise ValueError("OpenAI key is required") from e
@retry(exceptions=(APIError, Timeout, TryAgain, AttributeError),
tries=OPENAI_RETRIES, delay=2, backoff=2, jitter=(1, 3))
async def chat_completion(self, model: str, temperature: float, system: str, user: str):
try:
response = await openai.ChatCompletion.acreate(
model=model,
messages=[
{"role": "system", "content": system},
{"role": "user", "content": user}
],
temperature=temperature,
)
except (APIError, Timeout, TryAgain) as e:
logging.error("Error during OpenAI inference: ", e)
raise
if response is None or len(response.choices) == 0:
raise TryAgain
resp = response.choices[0]['message']['content']
finish_reason = response.choices[0].finish_reason
return resp, finish_reason

View File

@ -0,0 +1,107 @@
from __future__ import annotations
import logging
import re
from pr_agent.config_loader import settings
def extend_patch(original_file_str, patch_str, num_lines) -> str:
"""
Extends the patch to include 'num_lines' more surrounding lines
"""
if not patch_str or num_lines == 0:
return patch_str
original_lines = original_file_str.splitlines()
patch_lines = patch_str.splitlines()
extended_patch_lines = []
start1, size1, start2, size2 = -1, -1, -1, -1
RE_HUNK_HEADER = re.compile(
r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@[ ]?(.*)")
try:
for line in patch_lines:
if line.startswith('@@'):
match = RE_HUNK_HEADER.match(line)
if match:
# finish previous hunk
if start1 != -1:
extended_patch_lines.extend(
original_lines[start1 + size1 - 1:start1 + size1 - 1 + num_lines])
start1, size1, start2, size2 = map(int, match.groups()[:4])
section_header = match.groups()[4]
extended_start1 = max(1, start1 - num_lines)
extended_size1 = size1 + (start1 - extended_start1) + num_lines
extended_start2 = max(1, start2 - num_lines)
extended_size2 = size2 + (start2 - extended_start2) + num_lines
extended_patch_lines.append(
f'@@ -{extended_start1},{extended_size1} '
f'+{extended_start2},{extended_size2} @@ {section_header}')
extended_patch_lines.extend(
original_lines[extended_start1 - 1:start1 - 1]) # one to zero based
continue
extended_patch_lines.append(line)
except Exception as e:
if settings.config.verbosity_level >= 2:
logging.error(f"Failed to extend patch: {e}")
return patch_str
# finish previous hunk
if start1 != -1:
extended_patch_lines.extend(
original_lines[start1 + size1 - 1:start1 + size1 - 1 + num_lines])
extended_patch_str = '\n'.join(extended_patch_lines)
return extended_patch_str
def omit_deletion_hunks(patch_lines) -> str:
temp_hunk = []
added_patched = []
add_hunk = False
inside_hunk = False
RE_HUNK_HEADER = re.compile(
r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))?\ @@[ ]?(.*)")
for line in patch_lines:
if line.startswith('@@'):
match = RE_HUNK_HEADER.match(line)
if match:
# finish previous hunk
if inside_hunk and add_hunk:
added_patched.extend(temp_hunk)
temp_hunk = []
add_hunk = False
temp_hunk.append(line)
inside_hunk = True
else:
temp_hunk.append(line)
edit_type = line[0]
if edit_type == '+':
add_hunk = True
if inside_hunk and add_hunk:
added_patched.extend(temp_hunk)
return '\n'.join(added_patched)
def handle_patch_deletions(patch: str, original_file_content_str: str,
new_file_content_str: str, file_name: str) -> str:
"""
Handle entire file or deletion patches
"""
if not new_file_content_str:
# logic for handling deleted files - don't show patch, just show that the file was deleted
if settings.config.verbosity_level > 0:
logging.info(f"Processing file: {file_name}, minimizing deletion file")
patch = "File was deleted\n"
else:
patch_lines = patch.splitlines()
patch_new = omit_deletion_hunks(patch_lines)
if patch != patch_new:
if settings.config.verbosity_level > 0:
logging.info(f"Processing file: {file_name}, hunks were deleted")
patch = patch_new
return patch

File diff suppressed because one or more lines are too long

View File

@ -0,0 +1,128 @@
from __future__ import annotations
import difflib
import logging
from typing import Any, Dict, Tuple
from pr_agent.algo.git_patch_processing import extend_patch, handle_patch_deletions
from pr_agent.algo.language_handler import sort_files_by_main_languages
from pr_agent.algo.token_handler import TokenHandler
from pr_agent.config_loader import settings
from pr_agent.git_providers import GithubProvider
OUTPUT_BUFFER_TOKENS = 800
PATCH_EXTRA_LINES = 3
def get_pr_diff(git_provider: [GithubProvider, Any], token_handler: TokenHandler) -> str:
"""
Returns a string with the diff of the PR.
If needed, apply diff minimization techniques to reduce the number of tokens
"""
files = list(git_provider.get_diff_files())
# get pr languages
pr_languages = sort_files_by_main_languages(git_provider.get_languages(), files)
# generate a standard diff string, with patch extension
patches_extended, total_tokens = pr_generate_extended_diff(pr_languages, token_handler)
# if we are under the limit, return the full diff
if total_tokens + OUTPUT_BUFFER_TOKENS < token_handler.limit:
return "\n".join(patches_extended)
# if we are over the limit, start pruning
patches_compressed = pr_generate_compressed_diff(pr_languages, token_handler)
return "\n".join(patches_compressed)
def pr_generate_extended_diff(pr_languages: list, token_handler: TokenHandler) -> \
Tuple[list, int]:
"""
Generate a standard diff string, with patch extension
"""
total_tokens = token_handler.prompt_tokens # initial tokens
patches_extended = []
for lang in pr_languages:
for file in lang['files']:
original_file_content_str = file.base_file
new_file_content_str = file.head_file
patch = file.patch
# handle the case of large patch, that initially was not loaded
patch = load_large_diff(file, new_file_content_str, original_file_content_str, patch)
if not patch:
continue
# extend each patch with extra lines of context
extended_patch = extend_patch(original_file_content_str, patch, num_lines=PATCH_EXTRA_LINES)
full_extended_patch = f"## {file.filename}\n\n{extended_patch}\n"
patch_tokens = token_handler.count_tokens(full_extended_patch)
file.tokens = patch_tokens
total_tokens += patch_tokens
patches_extended.append(full_extended_patch)
return patches_extended, total_tokens
def pr_generate_compressed_diff(top_langs: list, token_handler: TokenHandler) -> list:
# Apply Diff Minimization techniques to reduce the number of tokens:
# 0. Start from the largest diff patch to smaller ones
# 1. Don't use extend context lines around diff
# 2. Minimize deleted files
# 3. Minimize deleted hunks
# 4. Minimize all remaining files when you reach token limit
patches = []
# sort each one of the languages in top_langs by the number of tokens in the diff
sorted_files = []
for lang in top_langs:
sorted_files.extend(sorted(lang['files'], key=lambda x: x.tokens, reverse=True))
total_tokens = token_handler.prompt_tokens
for file in sorted_files:
original_file_content_str = file.base_file
new_file_content_str = file.head_file
patch = file.patch
patch = load_large_diff(file, new_file_content_str, original_file_content_str, patch)
if not patch:
continue
# removing delete-only hunks
patch = handle_patch_deletions(patch, original_file_content_str,
new_file_content_str, file.filename)
new_patch_tokens = token_handler.count_tokens(patch)
if total_tokens > token_handler.limit - OUTPUT_BUFFER_TOKENS // 2:
logging.warning(f"File was fully skipped, no more tokens: {file.filename}.")
continue # Hard Stop, no more tokens
if total_tokens + new_patch_tokens > token_handler.limit - OUTPUT_BUFFER_TOKENS:
# Current logic is to skip the patch if it's too large
# TODO: Option for alternative logic to remove hunks from the patch to reduce the number of tokens
# until we meet the requirements
if settings.config.verbosity_level >= 2:
logging.warning(f"Patch too large, minimizing it, {file.filename}")
patch = "File was modified"
if patch:
patch_final = f"## {file.filename}\n\n{patch}\n"
patches.append(patch_final)
total_tokens += token_handler.count_tokens(patch_final)
if settings.config.verbosity_level >= 2:
logging.info(f"Tokens: {total_tokens}, last filename: {file.filename}")
return patches
def load_large_diff(file, new_file_content_str: str, original_file_content_str: str, patch: str) -> str:
if not patch: # to Do - also add condition for file extension
try:
diff = difflib.unified_diff(original_file_content_str.splitlines(keepends=True),
new_file_content_str.splitlines(keepends=True))
if settings.config.verbosity_level >= 2:
logging.warning(f"File was modified, but no patch was found. Manually creating patch: {file.filename}.")
patch = ''.join(diff)
except Exception:
pass
return patch

View File

@ -0,0 +1,24 @@
from jinja2 import Environment, StrictUndefined
from tiktoken import encoding_for_model
from pr_agent.algo import MAX_TOKENS
from pr_agent.config_loader import settings
class TokenHandler:
def __init__(self, pr, vars: dict, system, user):
self.encoder = encoding_for_model(settings.config.model)
self.limit = MAX_TOKENS[settings.config.model]
self.prompt_tokens = self._get_system_user_tokens(pr, self.encoder, vars, system, user)
def _get_system_user_tokens(self, pr, encoder, vars: dict, system, user):
environment = Environment(undefined=StrictUndefined)
system_prompt = environment.from_string(system).render(vars)
user_prompt = environment.from_string(user).render(vars)
system_prompt_tokens = len(encoder.encode(system_prompt))
user_prompt_tokens = len(encoder.encode(user_prompt))
return system_prompt_tokens + user_prompt_tokens
def count_tokens(self, patch: str) -> int:
return len(self.encoder.encode(patch))

59
pr_agent/algo/utils.py Normal file
View File

@ -0,0 +1,59 @@
from __future__ import annotations
import textwrap
def convert_to_markdown(output_data: dict) -> str:
markdown_text = ""
emojis = {
"Main theme": "🎯",
"Description and title": "🔍",
"Type of PR": "📌",
"Relevant tests added": "🧪",
"Unrelated changes": "⚠️",
"Minimal and focused": "",
"Security concerns": "🔒",
"General PR suggestions": "💡",
"Code suggestions": "🤖"
}
for key, value in output_data.items():
if not value:
continue
if isinstance(value, dict):
markdown_text += f"## {key}\n\n"
markdown_text += convert_to_markdown(value)
elif isinstance(value, list):
if key.lower() == 'code suggestions':
markdown_text += "\n" # just looks nicer with additional line breaks
emoji = emojis.get(key, "") # Use a dash if no emoji is found for the key
markdown_text += f"- {emoji} **{key}:**\n\n"
for item in value:
if isinstance(item, dict) and key.lower() == 'code suggestions':
markdown_text += parse_code_suggestion(item)
elif item:
markdown_text += f" - {item}\n"
elif value != 'n/a':
emoji = emojis.get(key, "") # Use a dash if no emoji is found for the key
markdown_text += f"- {emoji} **{key}:** {value}\n"
return markdown_text
def parse_code_suggestion(code_suggestions: dict) -> str:
markdown_text = ""
for sub_key, sub_value in code_suggestions.items():
if isinstance(sub_value, dict): # "code example"
markdown_text += f" - **{sub_key}:**\n"
for code_key, code_value in sub_value.items(): # 'before' and 'after' code
code_str = f"```\n{code_value}\n```"
code_str_indented = textwrap.indent(code_str, ' ')
markdown_text += f" - **{code_key}:**\n{code_str_indented}\n"
else:
if "suggestion number" in sub_key.lower():
markdown_text += f"- **suggestion {sub_value}:**\n" # prettier formatting
else:
markdown_text += f" - **{sub_key}:** {sub_value}\n"
markdown_text += "\n"
return markdown_text

14
pr_agent/config_loader.py Normal file
View File

@ -0,0 +1,14 @@
from os.path import abspath, dirname, join
from dynaconf import Dynaconf
current_dir = dirname(abspath(__file__))
settings = Dynaconf(
envvar_prefix=False,
settings_files=[join(current_dir, f) for f in [
"settings/.secrets.toml",
"settings/configuration.toml",
"settings/pr_reviewer_prompts.toml",
"settings/pr_questions_prompts.toml"
]]
)

View File

@ -0,0 +1,15 @@
from pr_agent.config_loader import settings
from pr_agent.git_providers.github_provider import GithubProvider
_GIT_PROVIDERS = {
'github': GithubProvider
}
def get_git_provider():
try:
provider_id = settings.config.git_provider
except AttributeError as e:
raise ValueError("github_provider is a required attribute in the configuration file") from e
if provider_id not in _GIT_PROVIDERS:
raise ValueError(f"Unknown git provider: {provider_id}")
return _GIT_PROVIDERS[provider_id]

View File

@ -0,0 +1,170 @@
from collections import namedtuple
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple
from urllib.parse import urlparse
from github import AppAuthentication, File, Github
from pr_agent.config_loader import settings
@dataclass
class FilePatchInfo:
base_file: str
head_file: str
patch: str
filename: str
tokens: int = -1
class GithubProvider:
def __init__(self, pr_url: Optional[str] = None, installation_id: Optional[int] = None):
self.installation_id = installation_id
self.github_client = self._get_github_client()
self.repo = None
self.pr_num = None
self.pr = None
if pr_url:
self.set_pr(pr_url)
def set_pr(self, pr_url: str):
self.repo, self.pr_num = self._parse_pr_url(pr_url)
self.pr = self._get_pr()
def get_diff_files(self) -> list[FilePatchInfo]:
files = self.pr.get_files()
diff_files = []
for file in files:
original_file_content_str = self._get_pr_file_content(file, self.pr.base.sha)
new_file_content_str = self._get_pr_file_content(file, self.pr.head.sha)
diff_files.append(FilePatchInfo(original_file_content_str, new_file_content_str, file.patch, file.filename))
return diff_files
def publish_comment(self, pr_comment: str):
self.pr.create_issue_comment(pr_comment)
def get_title(self):
return self.pr.title
def get_description(self):
return self.pr.body
def get_languages(self):
return self._get_repo().get_languages()
def get_main_pr_language(self) -> str:
"""
Get the main language of the commit. Return an empty string if cannot determine.
"""
main_language_str = ""
try:
languages = self.get_languages()
top_language = max(languages, key=languages.get).lower()
# validate that the specific commit uses the main language
extension_list = []
files = self.pr.get_files()
for file in files:
extension_list.append(file.filename.rsplit('.')[-1])
# get the most common extension
most_common_extension = max(set(extension_list), key=extension_list.count)
# look for a match. TBD: add more languages, do this systematically
if most_common_extension == 'py' and top_language == 'python' or \
most_common_extension == 'js' and top_language == 'javascript' or \
most_common_extension == 'ts' and top_language == 'typescript' or \
most_common_extension == 'go' and top_language == 'go' or \
most_common_extension == 'java' and top_language == 'java' or \
most_common_extension == 'c' and top_language == 'c' or \
most_common_extension == 'cpp' and top_language == 'c++' or \
most_common_extension == 'cs' and top_language == 'c#' or \
most_common_extension == 'swift' and top_language == 'swift' or \
most_common_extension == 'php' and top_language == 'php' or \
most_common_extension == 'rb' and top_language == 'ruby' or \
most_common_extension == 'rs' and top_language == 'rust' or \
most_common_extension == 'scala' and top_language == 'scala' or \
most_common_extension == 'kt' and top_language == 'kotlin' or \
most_common_extension == 'pl' and top_language == 'perl' or \
most_common_extension == 'swift' and top_language == 'swift':
main_language_str = top_language
except Exception:
pass
return main_language_str
def get_pr_branch(self):
return self.pr.head.ref
def get_notifications(self, since: datetime):
deployment_type = settings.get("GITHUB.DEPLOYMENT_TYPE", "user")
if deployment_type != 'user':
raise ValueError("Deployment mode must be set to 'user' to get notifications")
notifications = self.github_client.get_user().get_notifications(since=since)
return notifications
@staticmethod
def _parse_pr_url(pr_url: str) -> Tuple[str, int]:
parsed_url = urlparse(pr_url)
if 'github.com' not in parsed_url.netloc:
raise ValueError("The provided URL is not a valid GitHub URL")
path_parts = parsed_url.path.strip('/').split('/')
if 'api.github.com' in parsed_url.netloc:
if len(path_parts) < 5 or path_parts[3] != 'pulls':
raise ValueError("The provided URL does not appear to be a GitHub PR URL")
repo_name = '/'.join(path_parts[1:3])
try:
pr_number = int(path_parts[4])
except ValueError as e:
raise ValueError("Unable to convert PR number to integer") from e
return repo_name, pr_number
if len(path_parts) < 4 or path_parts[2] != 'pull':
raise ValueError("The provided URL does not appear to be a GitHub PR URL")
repo_name = '/'.join(path_parts[:2])
try:
pr_number = int(path_parts[3])
except ValueError as e:
raise ValueError("Unable to convert PR number to integer") from e
return repo_name, pr_number
def _get_github_client(self):
deployment_type = settings.get("GITHUB.DEPLOYMENT_TYPE", "user")
if deployment_type == 'app':
try:
private_key = settings.github.private_key
app_id = settings.github.app_id
except AttributeError as e:
raise ValueError("GitHub app ID and private key are required when using GitHub app deployment") from e
if not self.installation_id:
raise ValueError("GitHub app installation ID is required when using GitHub app deployment")
auth = AppAuthentication(app_id=app_id, private_key=private_key,
installation_id=self.installation_id)
return Github(app_auth=auth)
if deployment_type == 'user':
try:
token = settings.github.user_token
except AttributeError as e:
raise ValueError("GitHub token is required when using user deployment") from e
return Github(token)
def _get_repo(self):
return self.github_client.get_repo(self.repo)
def _get_pr(self):
return self._get_repo().get_pull(self.pr_num)
def _get_pr_file_content(self, file: FilePatchInfo, sha: str):
try:
file_content_str = self._get_repo().get_contents(file.filename, ref=sha).decoded_content.decode()
except Exception:
file_content_str = ""
return file_content_str

View File

@ -0,0 +1,16 @@
import argparse
import asyncio
import logging
import os
from pr_agent.tools.pr_questions import PRQuestions
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='Review a PR from a URL')
parser.add_argument('--pr_url', type=str, help='The URL of the PR to review', required=True)
parser.add_argument('--question_str', type=str, help='The question to answer', required=True)
args = parser.parse_args()
logging.basicConfig(level=os.environ.get("LOGLEVEL", "INFO"))
reviewer = PRQuestions(args.pr_url, args.question_str, None)
asyncio.run(reviewer.answer())

View File

@ -0,0 +1,14 @@
import argparse
import asyncio
import logging
import os
from pr_agent.tools.pr_reviewer import PRReviewer
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='Review a PR from a URL')
parser.add_argument('--pr_url', type=str, help='The URL of the PR to review', required=True)
args = parser.parse_args()
logging.basicConfig(level=os.environ.get("LOGLEVEL", "INFO"))
reviewer = PRReviewer(args.pr_url, None)
asyncio.run(reviewer.review())

View File

@ -0,0 +1,78 @@
import logging
import sys
import uvicorn
from fastapi import APIRouter, FastAPI, HTTPException, Request, Response
from pr_agent.agent.pr_agent import PRAgent
from pr_agent.config_loader import settings
from pr_agent.servers.utils import verify_signature
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
router = APIRouter()
@router.post("/api/v1/github_webhooks")
async def handle_github_webhooks(request: Request, response: Response):
logging.debug("Received a github webhook")
try:
body = await request.json()
except Exception as e:
logging.error("Error parsing request body", e)
raise HTTPException(status_code=400, detail="Error parsing request body") from e
body_bytes = await request.body()
signature_header = request.headers.get('x-hub-signature-256', None)
try:
webhook_secret = settings.github.webhook_secret
except AttributeError:
webhook_secret = None
if webhook_secret:
verify_signature(body_bytes, webhook_secret, signature_header)
logging.debug(f'Request body:\n{body}')
return await handle_request(body)
async def handle_request(body):
action = body.get("action", None)
installation_id = body.get("installation", {}).get("id", None)
agent = PRAgent(installation_id)
if action == 'created':
if "comment" not in body:
return {}
comment_body = body.get("comment", {}).get("body", None)
if "says 'Please" in comment_body:
return {}
if "issue" not in body and "pull_request" not in body["issue"]:
return {}
pull_request = body["issue"]["pull_request"]
api_url = pull_request.get("url", None)
await agent.handle_request(api_url, comment_body)
elif action in ["opened"] or 'reopened' in action:
pull_request = body.get("pull_request", None)
if not pull_request:
return {}
api_url = pull_request.get("url", None)
if api_url is None:
return {}
await agent.handle_request(api_url, "please review")
else:
return {}
@router.get("/")
async def root():
return {"status": "ok"}
def start():
if settings.get("GITHUB.DEPLOYMENT_TYPE", "user") != "app":
raise Exception("Please set deployment type to app in .secrets.toml file")
app = FastAPI()
app.include_router(router)
uvicorn.run(app, host="0.0.0.0", port=3000)
if __name__ == '__main__':
start()

View File

@ -0,0 +1,73 @@
import asyncio
import logging
import sys
from datetime import datetime, timezone
import aiohttp
from pr_agent.agent.pr_agent import PRAgent
from pr_agent.config_loader import settings
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
NOTIFICATION_URL = "https://api.github.com/notifications"
def now() -> str:
now_utc = datetime.now(timezone.utc).isoformat()
now_utc = now_utc.replace("+00:00", "Z")
return now_utc
async def polling_loop():
since = [now()]
last_modified = [None]
try:
deployment_type = settings.github.deployment_type
token = settings.github.user_token
except AttributeError:
deployment_type = 'none'
token = None
if deployment_type != 'user':
raise ValueError("Deployment mode must be set to 'user' to get notifications")
if not token:
raise ValueError("User token must be set to get notifications")
async with aiohttp.ClientSession() as session:
while True:
headers = {
"Accept": "application/vnd.github.v3+json",
"Authorization": f"Bearer {token}"
}
params = {
"participating": "true"
}
if since[0]:
params["since"] = since[0]
if last_modified[0]:
headers["If-Modified-Since"] = last_modified[0]
async with session.get(NOTIFICATION_URL, headers=headers, params=params) as response:
if response.status == 200:
if 'Last-Modified' in response.headers:
last_modified[0] = response.headers['Last-Modified']
since[0] = None
notifications = await response.json()
for notification in notifications:
if 'reason' in notification and notification['reason'] == 'mention':
if 'subject' in notification and notification['subject']['type'] == 'PullRequest':
pr_url = notification['subject']['url']
latest_comment = notification['subject']['latest_comment_url']
async with session.get(latest_comment, headers=headers) as comment_response:
if comment_response.status == 200:
comment = await comment_response.json()
comment_body = comment['body'] if 'body' in comment else ''
commenter_github_user = comment['user']['login'] if 'user' in comment else ''
logging.info(f"Commenter: {commenter_github_user}\nComment: {comment_body}")
if comment_body.strip().startswith("@"):
agent = PRAgent()
await agent.handle_request(pr_url, comment_body)
elif response.status != 304:
print(f"Failed to fetch notifications. Status code: {response.status}")
await asyncio.sleep(5)
if __name__ == '__main__':
asyncio.run(polling_loop())

23
pr_agent/servers/utils.py Normal file
View File

@ -0,0 +1,23 @@
import hashlib
import hmac
from fastapi import HTTPException
def verify_signature(payload_body, secret_token, signature_header):
"""Verify that the payload was sent from GitHub by validating SHA256.
Raise and return 403 if not authorized.
Args:
payload_body: original request body to verify (request.body())
secret_token: GitHub app webhook token (WEBHOOK_SECRET)
signature_header: header received from GitHub (x-hub-signature-256)
"""
if not signature_header:
raise HTTPException(status_code=403, detail="x-hub-signature-256 header is missing!")
hash_object = hmac.new(secret_token.encode('utf-8'), msg=payload_body, digestmod=hashlib.sha256)
expected_signature = "sha256=" + hash_object.hexdigest()
if not hmac.compare_digest(expected_signature, signature_header):
raise HTTPException(status_code=403, detail="Request signatures didn't match!")

View File

@ -0,0 +1,26 @@
# QUICKSTART:
# Copy this file to .secrets in the same folder.
# The minimum workable settings - set openai.key to your API key.
# Set github.deployment_type to "user" and github.user_token to your GitHub personal access token.
# This will allow you to run the CLI scripts in the scripts/ folder and the github_polling server.
#
# See README for details about GitHub App deployment.
[openai]
key = "<API_KEY>"
[github]
# The type of deployment to create. Valid values are 'app' or 'user'.
deployment_type = "user"
# ---- Set the following only for deployment type == "user"
user_token = "<TOKEN>" # A GitHub personal access token with 'repo' scope.
# ---- Set the following only for deployment type == "app", see README for details.
private_key = """\
-----BEGIN RSA PRIVATE KEY-----
<GITHUB PRIVATE KEY>
-----END RSA PRIVATE KEY-----
"""
app_id = 123456 # The GitHub App ID, replace with your own.
webhook_secret = "<WEBHOOK SECRET>" # Optional, may be commented out.

View File

@ -0,0 +1,15 @@
[config]
model="gpt-4-0613"
git_provider="github"
publish_review=true
verbosity_level=0 # 0,1,2
[pr_reviewer]
require_minimal_and_focused_review=true
require_tests_review=true
require_security_review=true
extended_code_suggestions=false
num_code_suggestions=4
[pr_questions]

View File

@ -0,0 +1,30 @@
[pr_questions_prompt]
system="""You are CodiumAI-PR-Reviewer, a language model designed to review git pull requests.
Your task is to answer questions about the new PR code (the '+' lines), and provide feedback.
Be informative, constructive, and give examples. Try to be as specific as possible, and don't avoid answering the questions.
Make sure not to repeat modifications already implemented in the new PR code (the '+' lines).
"""
user="""PR Info:
Title: '{{title}}'
Branch: '{{branch}}'
Description: '{{description}}'
{%- if language %}
Main language: {{language}}
{%- endif %}
The PR Git Diff:
```
{{diff}}
```
Note that lines in the diff body are prefixed with a symbol that represents the type of change: '-' for deletions, '+' for additions, and ' ' (a space) for unchanged lines
The PR Questions:
```
{{ questions }}
```
Response:
"""

View File

@ -0,0 +1,159 @@
[pr_review_prompt]
system="""You are CodiumAI-PR-Reviewer, a language model designed to review git pull requests.
Your task is to provide constructive and concise feedback for the PR, and also provide meaningfull code suggestions to improve the new PR code (the '+' lines).
- Provide up to {{ num_code_suggestions }} code suggestions.
- Try to focus on important suggestions like fixing code problems, issues and bugs. As a second priority, provide suggestions for meaningfull code improvements, like performance, vulnerability, modularity, and best practices.
{%- if extended_code_suggestions %}
- For each suggestion, provide a short and concise code snippet to illustrate the existing code, and the improved code.
{%- endif %}
- Make sure not to provide suggestion repeating modifications already implemented in the new PR code (the '+' lines).
You must use the following JSON schema to format your answer:
```json
{
"PR Analysis": {
"Main theme": {
"type": "string",
"description": "a short explanation of the PR"
},
"Description and title": {
"type": "string",
"description": "yes\\no question: does this PR have a relevant description and title"
},
"Type of PR": {
"type": "string",
"enum": ["Bug fix", "Tests", "Bug fix with tests", "Refactoring", "Enhancement", "Documentation", "Other"]
},
{%- if require_tests %}
"Relevant tests added": {
"type": "string",
"description": "yes\\no question: does this PR have relevant tests ?"
},
{%- endif %}
{%- if require_minimal_and_focused %}
"Minimal and focused": {
"type": "string",
"description": "is this PR as minimal and focused as possible, with all code changes centered around a single coherent theme, described in the PR description and title ?" explain your answer"
}
},
{%- endif %}
"PR Feedback": {
"General PR suggestions": {
"type": "string",
"description": "important suggestions for the contributors and maintainers of this PR, may include overall structure, primary purpose and best practices. consider using specific filenames, classes and functions names. explain yourself!"
},
"Code suggestions": {
"type": "array",
"maxItems": {{ num_code_suggestions }},
"uniqueItems": true,
"items": {
"suggestion number": {
"type": "int",
"description": "suggestion number, starting from 1"
},
"relevant file": {
"type": "string",
"description": "the relevant file name"
},
"suggestion content": {
"type": "string",
{%- if extended_code_suggestions %}
"description": "a concrete suggestion for meaningfully improving the new PR code. Don't repeat previous suggestions. Add tags with importance measure that matches each suggestion ('important' or 'medium'). Do not make suggestions for updating or adding docstrings, renaming PR title and description, or linter like.
{%- else %}
"description": "a concrete suggestion for meaningfully improving the new PR code. Also describe how, specifically, the suggestion can be applied to new PR code. Add tags with importance measure that matches each suggestion ('important' or 'medium'). Do not make suggestions for updating or adding docstrings, renaming PR title and description, or linter like.
{%- endif %}
},
{%- if extended_code_suggestions %}
"why": {
"type": "string",
"description": "shortly explain why this suggestion is important"
},
"code example": {
"type": "object",
"properties": {
"before code": {
"type": "string",
"description": "Short and concise code snippet, to illustrate the existing code"
},
"after code": {
"type": "string",
"description": "Short and concise code snippet, to illustrate the improved code"
}
}
}
{%- endif %}
}
},
{%- if require_security %}
"Security concerns": {
"type": "string",
"description": "yes\\no question: does this PR code introduce possible security concerns or issues, like SQL injection, XSS, CSRF, and others ? explain your answer"
? explain your answer"
}
{%- endif %}
}
}
```
Example output:
'
{
"PR Analysis":
{
"Main theme": "xxx",
"Description and title": "Yes",
"Type of PR": "Bug fix",
{%- if require_tests %}
"Relevant tests added": "No",
{%- endif %}
{%- if require_minimal_and_focused %}
"Minimal and focused": "No, because ..."
{%- endif %}
},
"PR Feedback":
{
"General PR suggestions": "..., `xxx`...",
"Code suggestions": [
{
"suggestion number": 1,
"relevant file": "xxx.py",
"suggestion content": "xxx [important]",
{%- if extended_code_suggestions %}
"why": "xxx",
"code example":
{
"before code": "xxx",
"after code": "xxx"
}
{%- endif %}
},
...
]
{%- if require_security %},
"Security concerns": "No, because ..."
{%- endif %}
}
}
'
Don't repeat the prompt in the answer, and avoid outputting the 'type' and 'description' fields.
"""
user="""PR Info:
Title: '{{title}}'
Branch: '{{branch}}'
Description: '{{description}}'
{%- if language %}
Main language: {{language}}
{%- endif %}
The PR Git Diff:
```
{{diff}}
```
Note that lines in the diff body are prefixed with a symbol that represents the type of change: '-' for deletions, '+' for additions, and ' ' (a space) for unchanged lines.
Response (should be a valid JSON, and nothing else):
```json
"""

View File

View File

@ -0,0 +1,67 @@
import copy
import logging
from typing import Optional
from jinja2 import Environment, StrictUndefined
from pr_agent.algo.ai_handler import AiHandler
from pr_agent.algo.pr_processing import get_pr_diff
from pr_agent.algo.token_handler import TokenHandler
from pr_agent.config_loader import settings
from pr_agent.git_providers import get_git_provider
class PRQuestions:
def __init__(self, pr_url: str, question_str: str, installation_id: Optional[int] = None):
self.git_provider = get_git_provider()(pr_url, installation_id)
self.main_pr_language = self.git_provider.get_main_pr_language()
self.installation_id = installation_id
self.ai_handler = AiHandler()
self.question_str = question_str
self.vars = {
"title": self.git_provider.pr.title,
"branch": self.git_provider.get_pr_branch(),
"description": self.git_provider.pr.body,
"language": self.git_provider.get_main_pr_language(),
"diff": "", # empty diff for initial calculation
"questions": self.question_str,
}
self.token_handler = TokenHandler(self.git_provider.pr,
self.vars,
settings.pr_questions_prompt.system,
settings.pr_questions_prompt.user)
self.patches_diff = None
self.prediction = None
async def answer(self):
logging.info('Answering a PR question...')
self.git_provider.publish_comment("Preparing answer...")
logging.info('Getting PR diff...')
self.patches_diff = get_pr_diff(self.git_provider, self.token_handler)
logging.info('Getting AI prediction...')
self.prediction = await self._get_prediction()
logging.info('Preparing answer...')
pr_comment = self._prepare_pr_answer()
if settings.config.publish_review:
logging.info('Pushing answer...')
self.git_provider.publish_comment(pr_comment)
return ""
async def _get_prediction(self):
variables = copy.deepcopy(self.vars)
variables["diff"] = self.patches_diff # update diff
environment = Environment(undefined=StrictUndefined)
system_prompt = environment.from_string(settings.pr_questions_prompt.system).render(variables)
user_prompt = environment.from_string(settings.pr_questions_prompt.user).render(variables)
if settings.config.verbosity_level >= 2:
logging.info(f"\nSystem prompt:\n{system_prompt}")
logging.info(f"\nUser prompt:\n{user_prompt}")
model = settings.config.model
response, finish_reason = await self.ai_handler.chat_completion(model=model, temperature=0.2,
system=system_prompt, user=user_prompt)
return response
def _prepare_pr_answer(self) -> str:
answer_str = f"Questions: {self.question_str}\n\n"
answer_str += f"Answer: {self.prediction.strip()}\n\n"
return answer_str

View File

@ -0,0 +1,88 @@
import copy
import json
import logging
from typing import Optional
from jinja2 import Environment, StrictUndefined
from pr_agent.algo.ai_handler import AiHandler
from pr_agent.algo.pr_processing import get_pr_diff
from pr_agent.algo.token_handler import TokenHandler
from pr_agent.algo.utils import convert_to_markdown
from pr_agent.config_loader import settings
from pr_agent.git_providers import get_git_provider
class PRReviewer:
def __init__(self, pr_url: str, installation_id: Optional[int] = None):
self.git_provider = get_git_provider()(pr_url, installation_id)
self.main_language = self.git_provider.get_main_pr_language()
self.installation_id = installation_id
self.ai_handler = AiHandler()
self.patches_diff = None
self.prediction = None
self.vars = {
"title": self.git_provider.pr.title,
"branch": self.git_provider.get_pr_branch(),
"description": self.git_provider.pr.body,
"language": self.git_provider.get_main_pr_language(),
"diff": "", # empty diff for initial calculation
"require_tests": settings.pr_reviewer.require_tests_review,
"require_security": settings.pr_reviewer.require_security_review,
"require_minimal_and_focused": settings.pr_reviewer.require_minimal_and_focused_review,
'extended_code_suggestions': settings.pr_reviewer.extended_code_suggestions,
'num_code_suggestions': settings.pr_reviewer.num_code_suggestions,
}
self.token_handler = TokenHandler(self.git_provider.pr,
self.vars,
settings.pr_review_prompt.system,
settings.pr_review_prompt.user)
async def review(self):
logging.info('Reviewing PR...')
if settings.config.publish_review:
self.git_provider.publish_comment("Preparing review...")
logging.info('Getting PR diff...')
self.patches_diff = get_pr_diff(self.git_provider, self.token_handler)
logging.info('Getting AI prediction...')
self.prediction = await self._get_prediction()
logging.info('Preparing PR review...')
pr_comment = self._prepare_pr_review()
if settings.config.publish_review:
logging.info('Pushing PR review...')
self.git_provider.publish_comment(pr_comment)
return ""
async def _get_prediction(self):
variables = copy.deepcopy(self.vars)
variables["diff"] = self.patches_diff # update diff
environment = Environment(undefined=StrictUndefined)
system_prompt = environment.from_string(settings.pr_review_prompt.system).render(variables)
user_prompt = environment.from_string(settings.pr_review_prompt.user).render(variables)
if settings.config.verbosity_level >= 2:
logging.info(f"\nSystem prompt:\n{system_prompt}")
logging.info(f"\nUser prompt:\n{user_prompt}")
model = settings.config.model
response, finish_reason = await self.ai_handler.chat_completion(model=model, temperature=0.2,
system=system_prompt, user=user_prompt)
try:
json.loads(response)
except json.decoder.JSONDecodeError:
logging.warning("Could not decode JSON")
response = {}
return response
def _prepare_pr_review(self) -> str:
review = self.prediction.strip()
try:
data = json.loads(review)
except json.decoder.JSONDecodeError:
logging.error("Unable to decode JSON response from AI")
data = {}
markdown_text = convert_to_markdown(data)
markdown_text += "\nAdd a comment that says 'Please review' to ask for a new review after you update the PR.\n"
markdown_text += "Add a comment that says 'Please answer <QUESTION...>' to ask a question about this PR.\n"
if settings.config.verbosity_level >= 2:
logging.info(f"Markdown response:\n{markdown_text}")
return markdown_text

32
pyproject.toml Normal file
View File

@ -0,0 +1,32 @@
[tool.ruff]
line-length = 120
select = [
"E", # Pyflakes
"F", # Pyflakes
"B", # flake8-bugbear
"I001", # isort basic checks
"I002", # isort missing-required-import
]
# First commit - only fixing isort
fixable = [
"I001", # isort basic checks
]
unfixable = [
"B", # Avoid trying to fix flake8-bugbear (`B`) violations.
]
exclude = [
"api/code_completions",
]
ignore = [
"E999", "B008"
]
[tool.ruff.per-file-ignores]
"__init__.py" = ["E402"] # Ignore `E402` (import violations) in all `__init__.py` files, and in `path/to/file.py`.
# TODO: should decide if maybe not to ignore these.

1
requirements-dev.txt Normal file
View File

@ -0,0 +1 @@
pytest==7.4.0

8
requirements.txt Normal file
View File

@ -0,0 +1,8 @@
dynaconf==3.1.12
fastapi==0.99.0
PyGithub==1.58.2
retry==0.9.2
openai==0.27.8
Jinja2==3.1.2
tiktoken==0.4.0
uvicorn==0.22.0

View File

@ -0,0 +1,124 @@
# Generated by CodiumAI
from pr_agent.algo.utils import convert_to_markdown
"""
Code Analysis
Objective:
The objective of the 'convert_to_markdown' function is to convert a dictionary of data into a markdown-formatted text.
The function takes in a dictionary as input and recursively iterates through its keys and values to generate the
markdown text.
Inputs:
- A dictionary of data containing information about a pull request.
Flow:
- Initialize an empty string variable 'markdown_text'.
- Create a dictionary 'emojis' containing emojis for each key in the input dictionary.
- Iterate through the input dictionary:
- If the value is empty, continue to the next iteration.
- If the value is a dictionary, recursively call the 'convert_to_markdown' function with the value as input and
append the returned markdown text to 'markdown_text'.
- If the value is a list:
- If the key is 'code suggestions', add an additional line break to 'markdown_text'.
- Get the corresponding emoji for the key from the 'emojis' dictionary. If no emoji is found, use a dash.
- Append the emoji and key to 'markdown_text'.
- Iterate through the items in the list:
- If the item is a dictionary and the key is 'code suggestions', call the 'parse_code_suggestion' function with
the item as input and append the returned markdown text to 'markdown_text'.
- If the item is not empty, append it to 'markdown_text'.
- If the value is not 'n/a', get the corresponding emoji for the key from the 'emojis' dictionary. If no emoji is
found, use a dash. Append the emoji, key, and value to 'markdown_text'.
- Return 'markdown_text'.
Outputs:
- A markdown-formatted string containing the information from the input dictionary.
Additional aspects:
- The function uses recursion to handle nested dictionaries.
- The 'parse_code_suggestion' function is called for items in the 'code suggestions' list.
- The function uses emojis to add visual cues to the markdown text.
"""
class TestConvertToMarkdown:
# Tests that the function works correctly with a simple dictionary input
def test_simple_dictionary_input(self):
input_data = {
'Main theme': 'Test',
'Description and title': 'Test description',
'Type of PR': 'Test type',
'Relevant tests added': 'no',
'Unrelated changes': 'n/a', # won't be included in the output
'Minimal and focused': 'Yes',
'General PR suggestions': 'general suggestion...',
'Code suggestions': [
{
'Suggestion number': 1,
'Code example': {
'Before': 'Code before',
'After': 'Code after'
}
},
{
'Suggestion number': 2,
'Code example': {
'Before': 'Code before 2',
'After': 'Code after 2'
}
}
]
}
expected_output = """\
- 🎯 **Main theme:** Test
- 🔍 **Description and title:** Test description
- 📌 **Type of PR:** Test type
- 🧪 **Relevant tests added:** no
- ✨ **Minimal and focused:** Yes
- 💡 **General PR suggestions:** general suggestion...
- 🤖 **Code suggestions:**
- **suggestion 1:**
- **Code example:**
- **Before:**
```
Code before
```
- **After:**
```
Code after
```
- **suggestion 2:**
- **Code example:**
- **Before:**
```
Code before 2
```
- **After:**
```
Code after 2
```
"""
assert convert_to_markdown(input_data).strip() == expected_output.strip()
# Tests that the function works correctly with an empty dictionary input
def test_empty_dictionary_input(self):
input_data = {}
expected_output = ""
assert convert_to_markdown(input_data).strip() == expected_output.strip()
def test_dictionary_input_containing_only_empty_dictionaries(self):
input_data = {
'Main theme': {},
'Description and title': {},
'Type of PR': {},
'Relevant tests added': {},
'Unrelated changes': {},
'Minimal and focused': {},
'General PR suggestions': {},
'Code suggestions': {}
}
expected_output = ""
assert convert_to_markdown(input_data).strip() == expected_output.strip()

View File

@ -0,0 +1,84 @@
# Generated by CodiumAI
from pr_agent.algo.git_patch_processing import omit_deletion_hunks
"""
Code Analysis
Objective:
The objective of the "omit_deletion_hunks" function is to remove deletion hunks from a patch file and return only the
added lines.
Inputs:
- "patch_lines": a list of strings representing the lines of a patch file.
Flow:
- Initialize empty lists "temp_hunk" and "added_patched", and boolean variables "add_hunk" and "inside_hunk".
- Compile a regular expression pattern to match hunk headers.
- Iterate through each line in "patch_lines".
- If the line starts with "@@", match the line with the hunk header pattern, finish the previous hunk if necessary,
and append the line to "temp_hunk".
- If the line does not start with "@@", append the line to "temp_hunk", check if it is an added line, and set
"add_hunk" to True if it is.
- If the function reaches the end of "patch_lines" and there is an unfinished hunk with added lines, append it to
"added_patched".
- Join the lines in "added_patched" with newline characters and return the resulting string.
Outputs:
- A string representing the added lines in the patch file.
Additional aspects:
- The function only considers hunks with added lines and ignores hunks with deleted lines.
- The function assumes that the input patch file is well-formed and follows the unified diff format.
"""
class TestOmitDeletionHunks:
# Tests that the function correctly handles a simple patch containing only additions
def test_simple_patch_additions(self):
patch_lines = ['@@ -1,0 +1,1 @@\n', '+added line\n']
expected_output = '@@ -1,0 +1,1 @@\n\n+added line\n'
assert omit_deletion_hunks(patch_lines) == expected_output
# Tests that the function correctly omits deletion hunks and concatenates multiple hunks in a patch.
def test_patch_multiple_hunks(self):
patch_lines = ['@@ -1,0 +1,1 @@\n', '-deleted line', '+added line\n', '@@ -2,0 +3,1 @@\n', '-deleted line\n',
'-another deleted line\n']
expected_output = '@@ -1,0 +1,1 @@\n\n-deleted line\n+added line\n'
assert omit_deletion_hunks(patch_lines) == expected_output
# Tests that the function correctly omits deletion lines from the patch when there are no additions or context
# lines.
def test_patch_only_deletions(self):
patch_lines = ['@@ -1,1 +1,0 @@\n', '-deleted line\n']
expected_output = ''
assert omit_deletion_hunks(patch_lines) == expected_output
# Additional deletion lines
patch_lines = ['@@ -1,1 +1,0 @@\n', '-deleted line\n', '-another deleted line\n']
expected_output = ''
assert omit_deletion_hunks(patch_lines) == expected_output
# Additional context lines
patch_lines = ['@@ -1,1 +1,0 @@\n', '-deleted line\n', '-another deleted line\n', 'context line 1\n',
'context line 2\n', 'context line 3\n']
expected_output = ''
assert omit_deletion_hunks(patch_lines) == expected_output
# Tests that the function correctly handles an empty patch
def test_empty_patch(self):
patch_lines = []
expected_output = ''
assert omit_deletion_hunks(patch_lines) == expected_output
# Tests that the function correctly handles a patch containing only one hunk
def test_patch_one_hunk(self):
patch_lines = ['@@ -1,0 +1,1 @@\n', '+added line\n']
expected_output = '@@ -1,0 +1,1 @@\n\n+added line\n'
assert omit_deletion_hunks(patch_lines) == expected_output
# Tests that the function correctly handles a patch containing only deletions and no additions
def test_patch_deletions_no_additions(self):
patch_lines = ['@@ -1,1 +1,0 @@\n', '-deleted line\n']
expected_output = ''
assert omit_deletion_hunks(patch_lines) == expected_output

View File

@ -0,0 +1,93 @@
# Generated by CodiumAI
from pr_agent.algo.git_patch_processing import extend_patch
"""
Code Analysis
Objective:
The objective of the 'extend_patch' function is to extend a given patch to include a specified number of surrounding
lines. This function takes in an original file string, a patch string, and the number of lines to extend the patch by,
and returns the extended patch string.
Inputs:
- original_file_str: a string representing the original file
- patch_str: a string representing the patch to be extended
- num_lines: an integer representing the number of lines to extend the patch by
Flow:
1. Split the original file string and patch string into separate lines
2. Initialize variables to keep track of the current hunk's start and size for both the original file and the patch
3. Iterate through each line in the patch string
4. If the line starts with '@@', extract the start and size values for both the original file and the patch, and
calculate the extended start and size values
5. Append the extended hunk header to the extended patch lines list
6. Append the specified number of lines before the hunk to the extended patch lines list
7. Append the current line to the extended patch lines list
8. If the line is not a hunk header, append it to the extended patch lines list
9. Return the extended patch string
Outputs:
- extended_patch_str: a string representing the extended patch
Additional aspects:
- The function uses regular expressions to extract the start and size values from the hunk header
- The function handles cases where the start value of a hunk is less than the number of lines to extend by by setting
the extended start value to 1
- The function handles cases where the hunk extends beyond the end of the original file by only including lines up to
the end of the original file in the extended patch
"""
class TestExtendPatch:
# Tests that the function works correctly with valid input
def test_happy_path(self):
original_file_str = 'line1\nline2\nline3\nline4\nline5'
patch_str = '@@ -2,2 +2,2 @@ init()\n-line2\n+new_line2\nline3'
num_lines = 1
expected_output = '@@ -1,4 +1,4 @@ init()\nline1\n-line2\n+new_line2\nline3\nline4'
actual_output = extend_patch(original_file_str, patch_str, num_lines)
assert actual_output == expected_output
# Tests that the function returns an empty string when patch_str is empty
def test_empty_patch(self):
original_file_str = 'line1\nline2\nline3\nline4\nline5'
patch_str = ''
num_lines = 1
expected_output = ''
assert extend_patch(original_file_str, patch_str, num_lines) == expected_output
# Tests that the function returns the original patch when num_lines is 0
def test_zero_num_lines(self):
original_file_str = 'line1\nline2\nline3\nline4\nline5'
patch_str = '@@ -2,2 +2,2 @@ init()\n-line2\n+new_line2\nline3'
num_lines = 0
assert extend_patch(original_file_str, patch_str, num_lines) == patch_str
# Tests that the function returns the original patch when patch_str contains no hunks
def test_no_hunks(self):
original_file_str = 'line1\nline2\nline3\nline4\nline5'
patch_str = 'no hunks here'
num_lines = 1
expected_output = 'no hunks here'
assert extend_patch(original_file_str, patch_str, num_lines) == expected_output
# Tests that the function extends a patch with a single hunk correctly
def test_single_hunk(self):
original_file_str = 'line1\nline2\nline3\nline4\nline5'
patch_str = '@@ -2,3 +2,3 @@ init()\n-line2\n+new_line2\nline3\nline4'
num_lines = 1
expected_output = '@@ -1,5 +1,5 @@ init()\nline1\n-line2\n+new_line2\nline3\nline4\nline5'
actual_output = extend_patch(original_file_str, patch_str, num_lines)
assert actual_output == expected_output
# Tests the functionality of extending a patch with multiple hunks.
def test_multiple_hunks(self):
original_file_str = 'line1\nline2\nline3\nline4\nline5\nline6'
patch_str = '@@ -2,3 +2,3 @@ init()\n-line2\n+new_line2\nline3\nline4\n@@ -4,1 +4,1 @@ init2()\n-line4\n+new_line4' # noqa: E501
num_lines = 1
expected_output = '@@ -1,5 +1,5 @@ init()\nline1\n-line2\n+new_line2\nline3\nline4\nline5\n@@ -3,3 +3,3 @@ init2()\nline3\n-line4\n+new_line4\nline5' # noqa: E501
actual_output = extend_patch(original_file_str, patch_str, num_lines)
assert actual_output == expected_output

View File

@ -0,0 +1,84 @@
# Generated by CodiumAI
import logging
from pr_agent.algo.git_patch_processing import handle_patch_deletions
from pr_agent.config_loader import settings
"""
Code Analysis
Objective:
The objective of the function is to handle entire file or deletion patches and return the patch after omitting the
deletion hunks.
Inputs:
- patch: a string representing the patch to be handled
- original_file_content_str: a string representing the original content of the file
- new_file_content_str: a string representing the new content of the file
- file_name: a string representing the name of the file
Flow:
- If new_file_content_str is empty, set patch to "File was deleted" and return it
- Otherwise, split patch into lines and omit the deletion hunks using the omit_deletion_hunks function
- If the resulting patch is different from the original patch, log a message and set patch to the new patch
- Return the resulting patch
Outputs:
- A string representing the patch after omitting the deletion hunks
Additional aspects:
- The function uses the settings from the configuration files to determine the verbosity level of the logging messages
- The omit_deletion_hunks function is called to remove the deletion hunks from the patch
- The function handles the case where the new_file_content_str is empty by setting the patch to "File was deleted"
"""
class TestHandlePatchDeletions:
# Tests that handle_patch_deletions returns the original patch when new_file_content_str is not empty
def test_handle_patch_deletions_happy_path_new_file_content_exists(self):
patch = '--- a/file.py\n+++ b/file.py\n@@ -1,2 +1,2 @@\n-foo\n-bar\n+baz\n'
original_file_content_str = 'foo\nbar\n'
new_file_content_str = 'foo\nbaz\n'
file_name = 'file.py'
assert handle_patch_deletions(patch, original_file_content_str, new_file_content_str,
file_name) == patch.rstrip()
# Tests that handle_patch_deletions logs a message when verbosity_level is greater than 0
def test_handle_patch_deletions_happy_path_verbosity_level_greater_than_0(self, caplog):
patch = '--- a/file.py\n+++ b/file.py\n@@ -1,2 +1,2 @@\n-foo\n-bar\n+baz\n'
original_file_content_str = 'foo\nbar\n'
new_file_content_str = ''
file_name = 'file.py'
settings.config.verbosity_level = 1
with caplog.at_level(logging.INFO):
handle_patch_deletions(patch, original_file_content_str, new_file_content_str, file_name)
assert any("Processing file" in message for message in caplog.messages)
# Tests that handle_patch_deletions returns 'File was deleted' when new_file_content_str is empty
def test_handle_patch_deletions_edge_case_new_file_content_empty(self):
patch = '--- a/file.py\n+++ b/file.py\n@@ -1,2 +1,2 @@\n-foo\n-bar\n'
original_file_content_str = 'foo\nbar\n'
new_file_content_str = ''
file_name = 'file.py'
assert handle_patch_deletions(patch, original_file_content_str, new_file_content_str,
file_name) == 'File was deleted\n'
# Tests that handle_patch_deletions returns the original patch when patch and patch_new are equal
def test_handle_patch_deletions_edge_case_patch_and_patch_new_are_equal(self):
patch = '--- a/file.py\n+++ b/file.py\n@@ -1,2 +1,2 @@\n-foo\n-bar\n'
original_file_content_str = 'foo\nbar\n'
new_file_content_str = 'foo\nbar\n'
file_name = 'file.py'
assert handle_patch_deletions(patch, original_file_content_str, new_file_content_str,
file_name).rstrip() == patch.rstrip()
# Tests that handle_patch_deletions returns the modified patch when patch and patch_new are not equal
def test_handle_patch_deletions_edge_case_patch_and_patch_new_are_not_equal(self):
patch = '--- a/file.py\n+++ b/file.py\n@@ -1,2 +1,2 @@\n-foo\n-bar\n'
original_file_content_str = 'foo\nbar\n'
new_file_content_str = 'foo\nbaz\n'
file_name = 'file.py'
expected_patch = '--- a/file.py\n+++ b/file.py\n@@ -1,2 +1,2 @@\n-foo\n-bar'
assert handle_patch_deletions(patch, original_file_content_str, new_file_content_str,
file_name) == expected_patch

View File

@ -0,0 +1,121 @@
# Generated by CodiumAI
from pr_agent.algo.language_handler import sort_files_by_main_languages
import pytest
"""
Code Analysis
Objective:
The objective of the function is to sort a list of files by their main language, putting the files that are in the main language first and the rest of the files after. It takes in a dictionary of languages and their sizes, and a list of files.
Inputs:
- languages: a dictionary containing the languages and their sizes
- files: a list of files
Flow:
1. Sort the languages by their size in descending order
2. Get all extensions for the languages
3. Filter out files with bad extensions
4. Sort files by their extension, putting the files that are in the main extension first and the rest of the files after
5. Map languages_sorted to their respective files
6. Append the files to the files_sorted list
7. Append the rest of the files to the files_sorted list under the "Other" language category
8. Return the files_sorted list
Outputs:
- files_sorted: a list of dictionaries containing the language and its respective files
Additional aspects:
- The function uses a language_extension_map dictionary to map the languages to their respective extensions
- The function uses the filter_bad_extensions function to filter out files with bad extensions
- The function uses a rest_files dictionary to store the files that do not belong to any of the main extensions
"""
class TestSortFilesByMainLanguages:
# Tests that files are sorted by main language, with files in main language first and the rest after
def test_happy_path_sort_files_by_main_languages(self):
languages = {'Python': 10, 'Java': 5, 'C++': 3}
files = [
type('', (object,), {'filename': 'file1.py'})(),
type('', (object,), {'filename': 'file2.java'})(),
type('', (object,), {'filename': 'file3.cpp'})(),
type('', (object,), {'filename': 'file4.py'})(),
type('', (object,), {'filename': 'file5.py'})()
]
expected_output = [
{'language': 'Python', 'files': [files[0], files[3], files[4]]},
{'language': 'Java', 'files': [files[1]]},
{'language': 'C++', 'files': [files[2]]},
{'language': 'Other', 'files': []}
]
assert sort_files_by_main_languages(languages, files) == expected_output
# Tests that function handles empty languages dictionary
def test_edge_case_empty_languages(self):
languages = {}
files = [
type('', (object,), {'filename': 'file1.py'})(),
type('', (object,), {'filename': 'file2.java'})()
]
expected_output = [{'language': 'Other', 'files': []}]
assert sort_files_by_main_languages(languages, files) == expected_output
# Tests that function handles empty files list
def test_edge_case_empty_files(self):
languages = {'Python': 10, 'Java': 5}
files = []
expected_output = [
{'language': 'Other', 'files': []}
]
assert sort_files_by_main_languages(languages, files) == expected_output
# Tests that function handles languages with no extensions
def test_edge_case_languages_with_no_extensions(self):
languages = {'Python': 10, 'Java': 5, 'C++': 3}
files = [
type('', (object,), {'filename': 'file1.py'})(),
type('', (object,), {'filename': 'file2.java'})(),
type('', (object,), {'filename': 'file3.cpp'})()
]
expected_output = [
{'language': 'Python', 'files': [files[0]]},
{'language': 'Java', 'files': [files[1]]},
{'language': 'C++', 'files': [files[2]]},
{'language': 'Other', 'files': []}
]
assert sort_files_by_main_languages(languages, files) == expected_output
# Tests the behavior of the function when all files have bad extensions and only one new valid file is added.
def test_edge_case_files_with_bad_extensions_only(self):
languages = {'Python': 10, 'Java': 5, 'C++': 3}
files = [
type('', (object,), {'filename': 'file1.csv'})(),
type('', (object,), {'filename': 'file2.pdf'})(),
type('', (object,), {'filename': 'file3.py'})() # new valid file
]
expected_output = [{'language': 'Python', 'files': [files[2]]}, {'language': 'Other', 'files': []}]
assert sort_files_by_main_languages(languages, files) == expected_output
# Tests general behaviour of function
def test_general_behaviour_sort_files_by_main_languages(self):
languages = {'Python': 10, 'Java': 5, 'C++': 3}
files = [
type('', (object,), {'filename': 'file1.py'})(),
type('', (object,), {'filename': 'file2.java'})(),
type('', (object,), {'filename': 'file3.cpp'})(),
type('', (object,), {'filename': 'file4.py'})(),
type('', (object,), {'filename': 'file5.py'})(),
type('', (object,), {'filename': 'file6.py'})(),
type('', (object,), {'filename': 'file7.java'})(),
type('', (object,), {'filename': 'file8.cpp'})(),
type('', (object,), {'filename': 'file9.py'})()
]
expected_output = [
{'language': 'Python', 'files': [files[0], files[3], files[4], files[5], files[8]]},
{'language': 'Java', 'files': [files[1], files[6]]},
{'language': 'C++', 'files': [files[2], files[7]]},
{'language': 'Other', 'files': []}
]
assert sort_files_by_main_languages(languages, files) == expected_output

View File

@ -0,0 +1,88 @@
# Generated by CodiumAI
from pr_agent.algo.utils import parse_code_suggestion
"""
Code Analysis
Objective:
The objective of the function is to convert a dictionary into a markdown format. The function takes in a dictionary as
input and recursively converts it into a markdown format. The function is specifically designed to handle dictionaries
that contain code suggestions.
Inputs:
- output_data: a dictionary containing the data to be converted into markdown format
Flow:
- Initialize an empty string variable called markdown_text
- Create a dictionary of emojis to be used in the markdown format
- Iterate through the items in the input dictionary
- If the value is empty, skip to the next item
- If the value is a dictionary, recursively call the function with the value as input
- If the value is a list, iterate through the list and add each item to the markdown format
- If the value is not 'n/a', add it to the markdown format
- If the key is 'code suggestions', call the parse_code_suggestion function to handle the list of code suggestions
- Return the markdown format as a string
Outputs:
- markdown_text: a string containing the input dictionary converted into markdown format
Additional aspects:
- The function uses the textwrap module to indent code examples in the markdown format
- The parse_code_suggestion function is called to handle the 'code suggestions' key in the input dictionary
- The function uses emojis to add visual cues to the markdown format
"""
class TestParseCodeSuggestion:
# Tests that function returns empty string when input is an empty dictionary
def test_empty_dict(self):
input_data = {}
expected_output = "\n" # modified to expect a newline character
assert parse_code_suggestion(input_data) == expected_output
# Tests that function returns correct output when 'suggestion number' key has a non-integer value
def test_non_integer_suggestion_number(self):
input_data = {
"Suggestion number": "one",
"Description": "This is a suggestion"
}
expected_output = "- **suggestion one:**\n - **Description:** This is a suggestion\n\n"
assert parse_code_suggestion(input_data) == expected_output
# Tests that function returns correct output when 'before' or 'after' key has a non-string value
def test_non_string_before_or_after(self):
input_data = {
"Code example": {
"Before": 123,
"After": ["a", "b", "c"]
}
}
expected_output = " - **Code example:**\n - **Before:**\n ```\n 123\n ```\n - **After:**\n ```\n ['a', 'b', 'c']\n ```\n\n" # noqa: E501
assert parse_code_suggestion(input_data) == expected_output
# Tests that function returns correct output when input dictionary does not have 'code example' key
def test_no_code_example_key(self):
code_suggestions = {
'suggestion number': 1,
'suggestion': 'Suggestion 1',
'description': 'Description 1',
'before': 'Before 1',
'after': 'After 1'
}
expected_output = "- **suggestion 1:**\n - **suggestion:** Suggestion 1\n - **description:** Description 1\n - **before:** Before 1\n - **after:** After 1\n\n" # noqa: E501
assert parse_code_suggestion(code_suggestions) == expected_output
# Tests that function returns correct output when input dictionary has 'code example' key
def test_with_code_example_key(self):
code_suggestions = {
'suggestion number': 2,
'suggestion': 'Suggestion 2',
'description': 'Description 2',
'code example': {
'before': 'Before 2',
'after': 'After 2'
}
}
expected_output = "- **suggestion 2:**\n - **suggestion:** Suggestion 2\n - **description:** Description 2\n - **code example:**\n - **before:**\n ```\n Before 2\n ```\n - **after:**\n ```\n After 2\n ```\n\n" # noqa: E501
assert parse_code_suggestion(code_suggestions) == expected_output