diff --git a/.github/workflows/pr-agent-review.yaml b/.github/workflows/pr-agent-review.yaml
index 9dcf59b8..6932b4bd 100644
--- a/.github/workflows/pr-agent-review.yaml
+++ b/.github/workflows/pr-agent-review.yaml
@@ -24,4 +24,7 @@ jobs:
       OPENAI_KEY: ${{ secrets.OPENAI_KEY }}
       OPENAI_ORG: ${{ secrets.OPENAI_ORG }} # optional
       GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+      PINECONE.API_KEY: ${{ secrets.PINECONE_API_KEY }}
+      PINECONE.ENVIRONMENT: ${{ secrets.PINECONE_ENVIRONMENT }}
+
diff --git a/INSTALL.md b/INSTALL.md
index 74368ac0..5f107b20 100644
--- a/INSTALL.md
+++ b/INSTALL.md
@@ -15,6 +15,7 @@ There are several ways to use PR-Agent:
 - [Method 5: Run as a GitHub App](INSTALL.md#method-5-run-as-a-github-app)
 - [Method 6: Deploy as a Lambda Function](INSTALL.md#method-6---deploy-as-a-lambda-function)
 - [Method 7: AWS CodeCommit](INSTALL.md#method-7---aws-codecommit-setup)
+- [Method 8: Run a GitLab webhook server](INSTALL.md#method-8---run-a-gitlab-webhook-server)
 
 ---
 
 ### Method 1: Use Docker image (no installation required)
 
@@ -23,9 +24,15 @@ To request a review for a PR, or ask a question about a PR, you can run directly
 
 1. To request a review for a PR, run the following command:
 
+For GitHub:
 ```
 docker run --rm -it -e OPENAI.KEY= -e GITHUB.USER_TOKEN= codiumai/pr-agent --pr_url review
 ```
+For GitLab:
+```
+docker run --rm -it -e OPENAI.KEY= -e CONFIG.GIT_PROVIDER=gitlab -e GITLAB.PERSONAL_ACCESS_TOKEN= codiumai/pr-agent --pr_url review
+```
+For other git providers, update CONFIG.GIT_PROVIDER accordingly, and check the `pr_agent/settings/.secrets_template.toml` file for the expected names and values of the environment variables.
 
 2. To ask a question about a PR, run the following command:
 
@@ -343,9 +350,24 @@ PYTHONPATH="/PATH/TO/PROJECTS/pr-agent" python pr_agent/cli.py \
 review
 ```
 
-### Appendix - **Debugging LLM API Calls**
-If you're testing your codium/pr-agent server, and need to see if calls were made successfully + the exact call logs, you can use the [LiteLLM Debugger tool](https://docs.litellm.ai/docs/debugging/hosted_debugging).
+---
 
-You can do this by setting `litellm_debugger=true` in configuration.toml. Your Logs will be viewable in real-time @ `admin.litellm.ai/`. Set your email in the `.secrets.toml` under 'user_email'.
+### Method 8 - Run a GitLab webhook server
 
-
\ No newline at end of file
+1. From the GitLab workspace or group, create an access token. Enable the "api" scope only.
+2. Generate a random secret for your app, and save it for later. For example, you can use:
+
+```
+WEBHOOK_SECRET=$(python -c "import secrets; print(secrets.token_hex(10))")
+```
+3. Follow the instructions to build the Docker image, set up a secrets file, and deploy on your own server from [Method 5](#method-5-run-as-a-github-app) steps 4-7.
+4. In the secrets file, fill in the following:
+   - Your OpenAI key.
+   - In the [gitlab] section, fill in personal_access_token and shared_secret. The access token can be a personal access token, or a group or project access token.
+   - Set deployment_type to 'gitlab' in [configuration.toml](./pr_agent/settings/configuration.toml)
+5. Create a webhook in GitLab. Set the URL to your app server's URL, and the secret token to the secret generated in step 2.
+In the "Trigger" section, check the ‘comments’ and ‘merge request events’ boxes.
+6. Test your installation by opening a merge request or commenting on a merge request using one of CodiumAI's commands.
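For orientation, here is a minimal sketch of the shared-secret check such a webhook server performs (assuming FastAPI, as used by `pr_agent/servers/gitlab_webhook.py`; the names below are illustrative, not the server's exact code):

```python
# Minimal sketch of the shared-secret check a GitLab webhook server performs.
# Assumes FastAPI; WEBHOOK_SECRET stands in for the shared_secret value kept
# in the [gitlab] section of .secrets.toml -- names here are illustrative.
import os

from fastapi import FastAPI, Request, status
from fastapi.responses import JSONResponse

app = FastAPI()
WEBHOOK_SECRET = os.environ.get("WEBHOOK_SECRET", "")  # generated in step 2

@app.post("/webhook")
async def gitlab_webhook(request: Request):
    # GitLab echoes the secret token configured in the webhook UI back in
    # this header; reject any request that does not match.
    if request.headers.get("X-Gitlab-Token") != WEBHOOK_SECRET:
        return JSONResponse(status_code=status.HTTP_401_UNAUTHORIZED,
                            content={"message": "unauthorized"})
    data = await request.json()
    # ... dispatch to PRAgent().handle_request(...) as the real server does
    return JSONResponse(status_code=status.HTTP_200_OK,
                        content={"message": "success"})
```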
+
+
+
diff --git a/README.md b/README.md
index f946a59c..b735dbaf 100644
--- a/README.md
+++ b/README.md
@@ -15,20 +15,22 @@ Making pull requests less painful with an AI agent
-CodiumAI `PR-Agent` is an open-source tool aiming to help developers review pull requests faster and more efficiently. It automatically analyzes the pull request and can provide several types of PR feedback:
+CodiumAI `PR-Agent` is an open-source tool aiming to help developers review pull requests faster and more efficiently. It automatically analyzes the pull request and supports several types of commands:
 
-**Auto Description (/describe)**: Automatically generating [PR description](https://github.com/Codium-ai/pr-agent/pull/229#issue-1860711415) - title, type, summary, code walkthrough and labels.
+‣ **Auto Description (`/describe`)**: Automatically generating [PR description](https://github.com/Codium-ai/pr-agent/pull/229#issue-1860711415) - title, type, summary, code walkthrough and labels. \
-**Auto Review (/review)**: [Adjustable feedback](https://github.com/Codium-ai/pr-agent/pull/229#issuecomment-1695022908) about the PR main theme, type, relevant tests, security issues, score, and various suggestions for the PR content.
+‣ **Auto Review (`/review`)**: [Adjustable feedback](https://github.com/Codium-ai/pr-agent/pull/229#issuecomment-1695022908) about the PR main theme, type, relevant tests, security issues, score, and various suggestions for the PR content. \
-**Question Answering (/ask ...)**: Answering [free-text questions](https://github.com/Codium-ai/pr-agent/pull/229#issuecomment-1695021332) about the PR.
+‣ **Question Answering (`/ask ...`)**: Answering [free-text questions](https://github.com/Codium-ai/pr-agent/pull/229#issuecomment-1695021332) about the PR. \
-**Code Suggestions (/improve)**: [Committable code suggestions](https://github.com/Codium-ai/pr-agent/pull/229#discussion_r1306919276) for improving the PR.
+‣ **Code Suggestions (`/improve`)**: [Committable code suggestions](https://github.com/Codium-ai/pr-agent/pull/229#discussion_r1306919276) for improving the PR. \
-**Update Changelog (/update_changelog)**: Automatically updating the CHANGELOG.md file with the [PR changes](https://github.com/Codium-ai/pr-agent/pull/168#discussion_r1282077645).
+‣ **Update Changelog (`/update_changelog`)**: Automatically updating the CHANGELOG.md file with the [PR changes](https://github.com/Codium-ai/pr-agent/pull/168#discussion_r1282077645).
+\
+‣ **Find similar issue (`/similar_issue`)**: Automatically retrieves and presents [similar issues](https://github.com/Alibaba-MIIL/ASL/issues/107).
 
-See the [usage guide](./Usage.md) for instructions how to run the different tools from [CLI](./Usage.md#working-from-a-local-repo-cli), or by [online usage](./Usage.md#online-usage).
+See the [usage guide](./Usage.md) for instructions on how to run the different tools from the [CLI](./Usage.md#working-from-a-local-repo-cli) or via [online usage](./Usage.md#online-usage), as well as additional details on optional commands and configurations.
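Each of these slash commands is resolved through a command-to-tool lookup table (see `command2class` in `pr_agent/agent/pr_agent.py`, further down in this diff). A rough, self-contained sketch of that dispatch, with a stub tool standing in for the real classes:

```python
# Rough sketch of how a slash command reaches its tool class; simplified
# from pr_agent/agent/pr_agent.py. The stub below is illustrative -- the
# real table maps every command ("review", "describe", ...) to a tool.
import asyncio

class PRReviewer:  # stand-in for pr_agent.tools.pr_reviewer.PRReviewer
    def __init__(self, pr_url: str, args: list = None):
        self.pr_url, self.args = pr_url, args or []

    async def run(self):
        print(f"reviewing {self.pr_url}")

command2class = {"review": PRReviewer}

async def handle_request(pr_url: str, request: str):
    command, *args = request.strip().lstrip("/").split(" ")
    await command2class[command](pr_url, args=args).run()

asyncio.run(handle_request("https://github.com/org/repo/pull/1", "/review"))
```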

Example results:

@@ -96,26 +98,28 @@ See the [usage guide](./Usage.md) for instructions how to run the different tool ## Overview `PR-Agent` offers extensive pull request functionalities across various git providers: -| | | GitHub | Gitlab | Bitbucket | CodeCommit | Azure DevOps | -|-------|---------------------------------------------|:------:|:------:|:---------:|:----------:|:----------:| -| TOOLS | Review | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | -| | Ask | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: -| | Auto-Description | :white_check_mark: | :white_check_mark: | | :white_check_mark: | :white_check_mark: | -| | Improve Code | :white_check_mark: | :white_check_mark: | | | | -| | ⮑ Extended | :white_check_mark: | :white_check_mark: | | | | -| | Reflect and Review | :white_check_mark: | | | | :white_check_mark: | -| | Update CHANGELOG.md | :white_check_mark: | | | | | +| | | GitHub | Gitlab | Bitbucket | CodeCommit | Azure DevOps | Gerrit | +|-------|---------------------------------------------|:------:|:------:|:---------:|:----------:|:----------:|:----------:| +| TOOLS | Review | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | +| | Ask | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | +| | Auto-Description | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | +| | Improve Code | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | | :white_check_mark: | +| | ⮑ Extended | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | | :white_check_mark: | +| | Reflect and Review | :white_check_mark: | | :white_check_mark: | | :white_check_mark: | :white_check_mark: | +| | Update CHANGELOG.md | :white_check_mark: | | :white_check_mark: | | | | +| | Find similar issue | :white_check_mark: | | | | | | | | | | | | | | | USAGE | CLI | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | | | App / webhook | :white_check_mark: | :white_check_mark: | | | | | | Tagging bot | :white_check_mark: | | | | | | | Actions | :white_check_mark: | | | | | +| | Web server | | | | | | :white_check_mark: | | | | | | | | | -| CORE | PR compression | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | -| | Repo language prioritization | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | -| | Adaptive and token-aware
file patch fitting | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | -| | Multiple models support | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | -| | Incremental PR Review | :white_check_mark: | | | | | +| CORE | PR compression | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | +| | Repo language prioritization | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | +| | Adaptive and token-aware
file patch fitting | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
+| | Multiple models support | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
+| | Incremental PR Review | :white_check_mark: | | | | | |
 
 Review the **[usage guide](./Usage.md)** section for detailed instructions on how to use the different tools, select the relevant git provider (GitHub, Gitlab, Bitbucket,...), and adjust the configuration file to your needs.
 
@@ -153,6 +157,7 @@ There are several ways to use PR-Agent:
 - Allowing you to automate the review process on your private or public repositories
 - [Method 6: Deploy as a Lambda Function](INSTALL.md#method-6---deploy-as-a-lambda-function)
 - [Method 7: AWS CodeCommit](INSTALL.md#method-7---aws-codecommit-setup)
+- [Method 8: Run a GitLab webhook server](INSTALL.md#method-8---run-a-gitlab-webhook-server)
 
 ## How it works
 
@@ -180,7 +185,7 @@ Here are some advantages of PR-Agent:
 - [x] Support additional models, as a replacement for OpenAI (see [here](https://github.com/Codium-ai/pr-agent/pull/172))
 - [x] Develop additional logic for handling large PRs (see [here](https://github.com/Codium-ai/pr-agent/pull/229))
 - [ ] Add additional context to the prompt. For example, repo (or relevant files) summarization, with tools such as [ctags](https://github.com/universal-ctags/ctags)
-- [ ] PR-Agent for issues, and just for pull requests
+- [x] PR-Agent for issues
 - [ ] Adding more tools. Possible directions:
   - [x] PR description
   - [x] Inline code suggestions
@@ -197,4 +202,4 @@ Here are some advantages of PR-Agent:
 - [Aider - GPT powered coding in your terminal](https://github.com/paul-gauthier/aider)
 - [openai-pr-reviewer](https://github.com/coderabbitai/openai-pr-reviewer)
 - [CodeReview BOT](https://github.com/anc95/ChatGPT-CodeReview)
-- [AI-Maintainer](https://github.com/merwanehamadi/AI-Maintainer)
\ No newline at end of file
+- [AI-Maintainer](https://github.com/merwanehamadi/AI-Maintainer)
diff --git a/Usage.md b/Usage.md
index 336de974..bc2544b8 100644
--- a/Usage.md
+++ b/Usage.md
@@ -50,12 +50,12 @@ When running from your local repo (CLI), your local configuration file will be u
 Examples for invoking the different tools via the CLI:
 
-- **Review**: `python cli.py --pr_url= /review`
-- **Describe**: `python cli.py --pr_url= /describe`
-- **Improve**: `python cli.py --pr_url= /improve`
-- **Ask**: `python cli.py --pr_url= /ask "Write me a poem about this PR"`
-- **Reflect**: `python cli.py --pr_url= /reflect`
-- **Update Changelog**: `python cli.py --pr_url= /update_changelog`
+- **Review**: `python cli.py --pr_url= review`
+- **Describe**: `python cli.py --pr_url= describe`
+- **Improve**: `python cli.py --pr_url= improve`
+- **Ask**: `python cli.py --pr_url= ask "Write me a poem about this PR"`
+- **Reflect**: `python cli.py --pr_url= reflect`
+- **Update Changelog**: `python cli.py --pr_url= update_changelog`
 
 `` is the URL of the relevant PR (for example: https://github.com/Codium-ai/pr-agent/pull/50).
 
@@ -149,15 +149,83 @@ TBD
 
 #### Changing a model
 
 See [here](pr_agent/algo/__init__.py) for the list of available models.
 
-To use Llama2 model, for example, set:
+#### Azure
+To use Azure, set:
+```
+api_key = "" # your azure api key
+api_type = "azure"
+api_version = '2023-05-15' # Check Azure documentation for the current API version
+api_base = "" # The base URL for your Azure OpenAI resource. e.g. "https://.openai.azure.com"
"https://.openai.azure.com" +deployment_id = "" # The deployment name you chose when you deployed the engine +``` +in your .secrets.toml + +and ``` [config] +model="" # the OpenAI model you've deployed on Azure (e.g. gpt-3.5-turbo) +``` +in the configuration.toml + +#### Huggingface + +**Local** +You can run Huggingface models locally through either [VLLM](https://docs.litellm.ai/docs/providers/vllm) or [Ollama](https://docs.litellm.ai/docs/providers/ollama) + +E.g. to use a new Huggingface model locally via Ollama, set: +``` +[__init__.py] +MAX_TOKENS = { + "model-name-on-ollama": +} +e.g. +MAX_TOKENS={ + ..., + "llama2": 4096 +} + + +[config] # in configuration.toml +model = "ollama/llama2" + +[ollama] # in .secrets.toml +api_base = ... # the base url for your huggingface inference endpoint +``` + +**Inference Endpoints** + +To use a new model with Huggingface Inference Endpoints, for example, set: +``` +[__init__.py] +MAX_TOKENS = { + "model-name-on-huggingface": +} +e.g. +MAX_TOKENS={ + ..., + "meta-llama/Llama-2-7b-chat-hf": 4096 +} +[config] # in configuration.toml +model = "huggingface/meta-llama/Llama-2-7b-chat-hf" + +[huggingface] # in .secrets.toml +key = ... # your huggingface api key +api_base = ... # the base url for your huggingface inference endpoint +``` +(you can obtain a Llama2 key from [here](https://replicate.com/replicate/llama-2-70b-chat/api)) + +#### Replicate + +To use Llama2 model with Replicate, for example, set: +``` +[config] # in configuration.toml model = "replicate/llama-2-70b-chat:2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1" -[replicate] +[replicate] # in .secrets.toml key = ... ``` (you can obtain a Llama2 key from [here](https://replicate.com/replicate/llama-2-70b-chat/api)) + Also review the [AiHandler](pr_agent/algo/ai_handler.py) file for instruction how to set keys for other models. #### Extra instructions @@ -179,4 +247,26 @@ And use the following settings (you have to replace the values) in .secrets.toml [azure_devops] org = "https://dev.azure.com/YOUR_ORGANIZATION/" pat = "YOUR_PAT_TOKEN" -``` \ No newline at end of file +``` + +#### Similar issue tool + +[Example usage](https://github.com/Alibaba-MIIL/ASL/issues/107) + + + +To enable usage of the '**similar issue**' tool, you need to set the following keys in `.secrets.toml` (or in the relevant environment variables): +``` +[pinecone] +api_key = "..." +environment = "..." +``` +These parameters can be obtained by registering to [Pinecone](https://app.pinecone.io/?sessionType=signup/). + +- To invoke the 'similar issue' tool from **CLI**, run: +`python3 cli.py --issue_url=... similar_issue` + +- To invoke the 'similar' issue tool via online usage, [comment](https://github.com/Codium-ai/pr-agent/issues/178#issuecomment-1716934893) on a PR: +`/similar_issue` + +- You can also enable the 'similar issue' tool to run automatically when a new issue is opened, by adding it to the [pr_commands list in the github_app section](https://github.com/Codium-ai/pr-agent/blob/main/pr_agent/settings/configuration.toml#L66) diff --git a/docker/Dockerfile b/docker/Dockerfile index 4336cacc..951f846c 100644 --- a/docker/Dockerfile +++ b/docker/Dockerfile @@ -18,6 +18,10 @@ FROM base as github_polling ADD pr_agent pr_agent CMD ["python", "pr_agent/servers/github_polling.py"] +FROM base as gitlab_webhook +ADD pr_agent pr_agent +CMD ["python", "pr_agent/servers/gitlab_webhook.py"] + FROM base as test ADD requirements-dev.txt . 
RUN pip install -r requirements-dev.txt && rm requirements-dev.txt diff --git a/pics/debugger.png b/pics/debugger.png deleted file mode 100644 index 7d8f201f..00000000 Binary files a/pics/debugger.png and /dev/null differ diff --git a/pics/similar_issue_tool.png b/pics/similar_issue_tool.png new file mode 100644 index 00000000..4ec51c81 Binary files /dev/null and b/pics/similar_issue_tool.png differ diff --git a/pr_agent/agent/pr_agent.py b/pr_agent/agent/pr_agent.py index 70121f3c..07c34c51 100644 --- a/pr_agent/agent/pr_agent.py +++ b/pr_agent/agent/pr_agent.py @@ -9,6 +9,7 @@ from pr_agent.git_providers import get_git_provider from pr_agent.tools.pr_code_suggestions import PRCodeSuggestions from pr_agent.tools.pr_description import PRDescription from pr_agent.tools.pr_information_from_user import PRInformationFromUser +from pr_agent.tools.pr_similar_issue import PRSimilarIssue from pr_agent.tools.pr_questions import PRQuestions from pr_agent.tools.pr_reviewer import PRReviewer from pr_agent.tools.pr_update_changelog import PRUpdateChangelog @@ -30,6 +31,7 @@ command2class = { "update_changelog": PRUpdateChangelog, "config": PRConfig, "settings": PRConfig, + "similar_issue": PRSimilarIssue, } commands = list(command2class.keys()) diff --git a/pr_agent/algo/__init__.py b/pr_agent/algo/__init__.py index 798fc6c5..56511cd0 100644 --- a/pr_agent/algo/__init__.py +++ b/pr_agent/algo/__init__.py @@ -1,4 +1,5 @@ MAX_TOKENS = { + 'text-embedding-ada-002': 8000, 'gpt-3.5-turbo': 4000, 'gpt-3.5-turbo-0613': 4000, 'gpt-3.5-turbo-0301': 4000, @@ -11,4 +12,5 @@ MAX_TOKENS = { 'claude-2': 100000, 'command-nightly': 4096, 'replicate/llama-2-70b-chat:2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1': 4096, + 'meta-llama/Llama-2-7b-chat-hf': 4096 } diff --git a/pr_agent/algo/ai_handler.py b/pr_agent/algo/ai_handler.py index fcc5f04c..819ba25b 100644 --- a/pr_agent/algo/ai_handler.py +++ b/pr_agent/algo/ai_handler.py @@ -1,13 +1,12 @@ import logging +import os import litellm import openai from litellm import acompletion from openai.error import APIError, RateLimitError, Timeout, TryAgain from retry import retry - from pr_agent.config_loader import get_settings - OPENAI_RETRIES = 5 @@ -26,7 +25,11 @@ class AiHandler: try: openai.api_key = get_settings().openai.key litellm.openai_key = get_settings().openai.key - litellm.debugger = get_settings().config.litellm_debugger + if get_settings().get("litellm.use_client"): + litellm_token = get_settings().get("litellm.LITELLM_TOKEN") + assert litellm_token, "LITELLM_TOKEN is required" + os.environ["LITELLM_TOKEN"] = litellm_token + litellm.use_client = True self.azure = False if get_settings().get("OPENAI.ORG", None): litellm.organization = get_settings().openai.org @@ -48,6 +51,8 @@ class AiHandler: litellm.replicate_key = get_settings().replicate.key if get_settings().get("HUGGINGFACE.KEY", None): litellm.huggingface_key = get_settings().huggingface.key + if get_settings().get("HUGGINGFACE.API_BASE", None): + litellm.api_base = get_settings().huggingface.api_base except AttributeError as e: raise ValueError("OpenAI key is required") from e diff --git a/pr_agent/algo/token_handler.py b/pr_agent/algo/token_handler.py index f018a92b..d7eff9d7 100644 --- a/pr_agent/algo/token_handler.py +++ b/pr_agent/algo/token_handler.py @@ -21,7 +21,7 @@ class TokenHandler: method. """ - def __init__(self, pr, vars: dict, system, user): + def __init__(self, pr=None, vars: dict = {}, system="", user=""): """ Initializes the TokenHandler object. 
@@ -32,7 +32,8 @@ class TokenHandler:
         - user: The user string.
         """
         self.encoder = get_token_encoder()
-        self.prompt_tokens = self._get_system_user_tokens(pr, self.encoder, vars, system, user)
+        if pr is not None:
+            self.prompt_tokens = self._get_system_user_tokens(pr, self.encoder, vars, system, user)
 
     def _get_system_user_tokens(self, pr, encoder, vars: dict, system, user):
         """
diff --git a/pr_agent/algo/utils.py b/pr_agent/algo/utils.py
index 1259a46e..1dfc6dc1 100644
--- a/pr_agent/algo/utils.py
+++ b/pr_agent/algo/utils.py
@@ -20,7 +20,7 @@ def get_setting(key: str) -> Any:
     except Exception:
         return global_settings.get(key, None)
 
-def convert_to_markdown(output_data: dict) -> str:
+def convert_to_markdown(output_data: dict, gfm_supported: bool=True) -> str:
     """
     Convert a dictionary of data into markdown format.
     Args:
@@ -49,11 +49,14 @@ def convert_to_markdown(output_data: dict) -> str:
             continue
         if isinstance(value, dict):
             markdown_text += f"## {key}\n\n"
-            markdown_text += convert_to_markdown(value)
+            markdown_text += convert_to_markdown(value, gfm_supported)
         elif isinstance(value, list):
             emoji = emojis.get(key, "")
             if key.lower() == 'code feedback':
-                markdown_text += f"\n\n- **<details><summary> { emoji } Code feedback:</summary>**\n\n"
+                if gfm_supported:
+                    markdown_text += f"\n\n- **<details><summary> { emoji } Code feedback:</summary>**\n\n"
+                else:
+                    markdown_text += f"\n\n- **{emoji} Code feedback:**\n\n"
             else:
                 markdown_text += f"- {emoji} **{key}:**\n\n"
             for item in value:
@@ -62,7 +65,10 @@
             elif item:
                 markdown_text += f"  - {item}\n"
         if key.lower() == 'code feedback':
-            markdown_text += "</details>\n\n"
+            if gfm_supported:
+                markdown_text += "</details>\n\n"
+            else:
+                markdown_text += "\n\n"
         elif value != 'n/a':
             emoji = emojis.get(key, "")
             markdown_text += f"- {emoji} **{key}:** {value}\n"
@@ -168,7 +174,7 @@ def fix_json_escape_char(json_message=None):
     Raises:
         None
 
-    """
+    """
     try:
         result = json.loads(json_message)
     except Exception as e:
@@ -195,7 +201,7 @@ def convert_str_to_datetime(date_str):
     Example:
         >>> convert_str_to_datetime('Mon, 01 Jan 2022 12:00:00 UTC')
         datetime.datetime(2022, 1, 1, 12, 0, 0)
-    """
+    """
     datetime_format = '%a, %d %b %Y %H:%M:%S %Z'
     return datetime.strptime(date_str, datetime_format)
 
diff --git a/pr_agent/cli.py b/pr_agent/cli.py
index 01c1a7ec..07c37f5e 100644
--- a/pr_agent/cli.py
+++ b/pr_agent/cli.py
@@ -17,6 +17,7 @@ For example:
 - cli.py --pr_url=... improve
 - cli.py --pr_url=... ask "write me a poem about this PR"
 - cli.py --pr_url=... reflect
+- cli.py --issue_url=... similar_issue
 
 Supported commands:
 -review / review_pr - Add a review that includes a summary of the PR and specific suggestions for improvement.
@@ -37,14 +38,22 @@ Configuration:
 To edit any configuration parameter from 'configuration.toml', just add -config_path=.
 For example: 'python cli.py --pr_url=... review --pr_reviewer.extra_instructions="focus on the file: ..."'
 """)
-    parser.add_argument('--pr_url', type=str, help='The URL of the PR to review', required=True)
+    parser.add_argument('--pr_url', type=str, help='The URL of the PR to review', default=None)
+    parser.add_argument('--issue_url', type=str, help='The URL of the Issue to review', default=None)
     parser.add_argument('command', type=str, help='The', choices=commands, default='review')
     parser.add_argument('rest', nargs=argparse.REMAINDER, default=[])
     args = parser.parse_args(inargs)
+    if not args.pr_url and not args.issue_url:
+        parser.print_help()
+        return
+
     logging.basicConfig(level=os.environ.get("LOGLEVEL", "INFO"))
     command = args.command.lower()
     get_settings().set("CONFIG.CLI_MODE", True)
-    result = asyncio.run(PRAgent().handle_request(args.pr_url, command + " " + " ".join(args.rest)))
+    if args.issue_url:
+        result = asyncio.run(PRAgent().handle_request(args.issue_url, command + " " + " ".join(args.rest)))
+    else:
+        result = asyncio.run(PRAgent().handle_request(args.pr_url, command + " " + " ".join(args.rest)))
     if not result:
         parser.print_help()
 
diff --git a/pr_agent/git_providers/__init__.py b/pr_agent/git_providers/__init__.py
index 376d09f5..968f0dfc 100644
--- a/pr_agent/git_providers/__init__.py
+++ b/pr_agent/git_providers/__init__.py
@@ -5,6 +5,8 @@ from pr_agent.git_providers.github_provider import GithubProvider
 from pr_agent.git_providers.gitlab_provider import GitLabProvider
 from pr_agent.git_providers.local_git_provider import LocalGitProvider
 from pr_agent.git_providers.azuredevops_provider import AzureDevopsProvider
+from pr_agent.git_providers.gerrit_provider import GerritProvider
+
 
 _GIT_PROVIDERS = {
     'github': GithubProvider,
@@ -12,7 +14,8 @@ _GIT_PROVIDERS = {
     'bitbucket': BitbucketProvider,
     'azure': AzureDevopsProvider,
     'codecommit': CodeCommitProvider,
-    'local' : LocalGitProvider
+    'local' : LocalGitProvider,
+    'gerrit': GerritProvider,
 }
 
 def get_git_provider():
diff --git a/pr_agent/git_providers/azuredevops_provider.py b/pr_agent/git_providers/azuredevops_provider.py
index 71ae0947..8a7693ce 100644
--- a/pr_agent/git_providers/azuredevops_provider.py
+++ b/pr_agent/git_providers/azuredevops_provider.py
@@ -38,7 +38,8 @@ class AzureDevopsProvider:
         self.set_pr(pr_url)
 
     def is_supported(self, capability: str) -> bool:
-        if capability in ['get_issue_comments', 'create_inline_comment', 'publish_inline_comments', 'get_labels', 'remove_initial_comment']:
+        if capability in ['get_issue_comments', 'create_inline_comment', 'publish_inline_comments', 'get_labels',
+                          'remove_initial_comment', 'gfm_markdown']:
             return False
         return True
 
diff --git a/pr_agent/git_providers/bitbucket_provider.py b/pr_agent/git_providers/bitbucket_provider.py
index 0cd860fa..56b9f711 100644
--- a/pr_agent/git_providers/bitbucket_provider.py
+++ b/pr_agent/git_providers/bitbucket_provider.py
@@ -7,6 +7,7 @@ import requests
 from atlassian.bitbucket import Cloud
 from starlette_context import context
 
+from ..algo.pr_processing import clip_tokens, find_line_number_of_relevant_line_in_file
 from ..config_loader import get_settings
 from .git_provider import FilePatchInfo, GitProvider
 
@@ -35,9 +36,8 @@ class BitbucketProvider(GitProvider):
         self.incremental = incremental
         if pr_url:
             self.set_pr(pr_url)
-            self.bitbucket_comment_api_url = self.pr._BitbucketBase__data["links"][
-                "comments"
-            ]["href"]
+            self.bitbucket_comment_api_url = self.pr._BitbucketBase__data["links"]["comments"]["href"]
+            self.bitbucket_pull_request_api_url = self.pr._BitbucketBase__data["links"]['self']['href']
 
     def get_repo_settings(self):
         try:
@@ -101,12 +101,7 @@ class BitbucketProvider(GitProvider):
             return False
 
     def is_supported(self, capability: str) -> bool:
-        if capability in [
-            "get_issue_comments",
-            "create_inline_comment",
-            "publish_inline_comments",
-            "get_labels",
-        ]:
+        if capability in ['get_issue_comments', 'publish_inline_comments', 'get_labels', 'gfm_markdown']:
             return False
         return True
 
@@ -151,17 +146,30 @@ class BitbucketProvider(GitProvider):
         except Exception as e:
             logging.exception(f"Failed to remove temp comments, error: {e}")
 
-    def publish_inline_comment(
-        self, comment: str, from_line: int, to_line: int, file: str
-    ):
-        payload = json.dumps(
-            {
-                "content": {
-                    "raw": comment,
-                },
-                "inline": {"to": from_line, "path": file},
-            }
-        )
+
+    # function to create an inline comment
+    def create_inline_comment(self, body: str, relevant_file: str, relevant_line_in_file: str):
+        position, absolute_position = find_line_number_of_relevant_line_in_file(self.get_diff_files(), relevant_file.strip('`'), relevant_line_in_file)
+        if position == -1:
+            if get_settings().config.verbosity_level >= 2:
+                logging.info(f"Could not find position for {relevant_file} {relevant_line_in_file}")
+            subject_type = "FILE"
+        else:
+            subject_type = "LINE"
+        path = relevant_file.strip()
+        return dict(body=body, path=path, position=absolute_position) if subject_type == "LINE" else {}
+
+
+    def publish_inline_comment(self, comment: str, from_line: int, file: str):
+        payload = json.dumps( {
+            "content": {
+                "raw": comment,
+            },
+            "inline": {
+                "to": from_line,
+                "path": file
+            },
+        })
         response = requests.request(
             "POST", self.bitbucket_comment_api_url, data=payload, headers=self.headers
         )
@@ -169,9 +177,7 @@ class BitbucketProvider(GitProvider):
 
     def publish_inline_comments(self, comments: list[dict]):
         for comment in comments:
-            self.publish_inline_comment(
-                comment["body"], comment["start_line"], comment["line"], comment["path"]
-            )
+            self.publish_inline_comment(comment['body'], comment['start_line'], comment['path'])
 
     def get_title(self):
         return self.pr.title
 
@@ -238,16 +244,22 @@ class BitbucketProvider(GitProvider):
     def get_commit_messages(self):
         return "" # not implemented yet
 
+
+    # update the PR description via the Bitbucket API
+    def publish_description(self, pr_title: str, description: str):
+        payload = json.dumps({
+            "description": description,
+            "title": pr_title
 
-    def publish_description(self, pr_title: str, pr_body: str):
-        pass
-    def create_inline_comment(
-        self, body: str, relevant_file: str, relevant_line_in_file: str
-    ):
-        pass
+        })
 
-    def publish_labels(self, labels):
-        pass
+        response = requests.request("PUT", self.bitbucket_pull_request_api_url, headers=self.headers, data=payload)
+        return response
 
+    # bitbucket does not support labels
+    def publish_labels(self, pr_types: list):
+        pass
+
+    # bitbucket does not support labels
     def get_labels(self):
         pass
 
diff --git a/pr_agent/git_providers/codecommit_client.py b/pr_agent/git_providers/codecommit_client.py
index 6200340d..5f18c90d 100644
--- a/pr_agent/git_providers/codecommit_client.py
+++ b/pr_agent/git_providers/codecommit_client.py
@@ -54,11 +54,16 @@ class CodeCommitClient:
     def __init__(self):
         self.boto_client = None
 
+    def is_supported(self, capability: str) -> bool:
+        if capability in ["gfm_markdown"]:
+            return False
+        return True
+
     def _connect_boto_client(self):
         try:
             self.boto_client = boto3.client("codecommit")
         except Exception as e:
-            raise ValueError(f"Failed to connect to AWS CodeCommit: {e}")
+            raise ValueError(f"Failed to connect to AWS CodeCommit: {e}") from e
 
     def get_differences(self, repo_name: int, destination_commit: str, source_commit: str):
         """
@@ -90,7 +95,11 @@ class CodeCommitClient:
             ):
                 differences.extend(page.get("differences", []))
         except botocore.exceptions.ClientError as e:
-            raise ValueError(f"Failed to retrieve differences from CodeCommit PR #{self.pr_num}") from e
+            if e.response["Error"]["Code"] == 'RepositoryDoesNotExistException':
+                raise ValueError(f"CodeCommit cannot retrieve differences: Repository does not exist: {repo_name}") from e
+            raise ValueError(f"CodeCommit cannot retrieve differences for {source_commit}..{destination_commit}") from e
+        except Exception as e:
+            raise ValueError(f"CodeCommit cannot retrieve differences for {source_commit}..{destination_commit}") from e
 
         output = []
         for json in differences:
@@ -122,6 +131,8 @@ class CodeCommitClient:
         try:
             response = self.boto_client.get_file(repositoryName=repo_name, commitSpecifier=sha_hash, filePath=file_path)
         except botocore.exceptions.ClientError as e:
+            if e.response["Error"]["Code"] == 'RepositoryDoesNotExistException':
+                raise ValueError(f"CodeCommit cannot retrieve PR: Repository does not exist: {repo_name}") from e
             # if the file does not exist, but is flagged as optional, then return an empty string
             if optional and e.response["Error"]["Code"] == 'FileDoesNotExistException':
                 return ""
@@ -133,11 +144,12 @@ class CodeCommitClient:
 
         return response.get("fileContent", "")
 
-    def get_pr(self, pr_number: int):
+    def get_pr(self, repo_name: str, pr_number: int):
         """
         Get information about a CodeCommit PR.
Args: + - repo_name: Name of the repository - pr_number: The PR number you are requesting Returns: @@ -155,6 +167,8 @@ class CodeCommitClient: except botocore.exceptions.ClientError as e: if e.response["Error"]["Code"] == 'PullRequestDoesNotExistException': raise ValueError(f"CodeCommit cannot retrieve PR: PR number does not exist: {pr_number}") from e + if e.response["Error"]["Code"] == 'RepositoryDoesNotExistException': + raise ValueError(f"CodeCommit cannot retrieve PR: Repository does not exist: {repo_name}") from e raise ValueError(f"CodeCommit cannot retrieve PR: {pr_number}: boto client error") from e except Exception as e: raise ValueError(f"CodeCommit cannot retrieve PR: {pr_number}") from e @@ -201,7 +215,7 @@ class CodeCommitClient: except Exception as e: raise ValueError(f"Error calling publish_description") from e - def publish_comment(self, repo_name: str, pr_number: int, destination_commit: str, source_commit: str, comment: str): + def publish_comment(self, repo_name: str, pr_number: int, destination_commit: str, source_commit: str, comment: str, annotation_file: str = None, annotation_line: int = None): """ Publish a comment to a pull request @@ -210,7 +224,13 @@ class CodeCommitClient: - pr_number: number of the pull request - destination_commit: The commit hash you want to merge into (the "before" hash) (usually on the main or master branch) - source_commit: The commit hash of the code you are adding (the "after" branch) - - pr_comment: comment + - comment: The comment you want to publish + - annotation_file: The file you want to annotate (optional) + - annotation_line: The line number you want to annotate (optional) + + Comment annotations for CodeCommit are different than GitHub. + CodeCommit only designates the starting line number for the comment. + It does not support the ending line number to highlight a range of lines. 
Returns: - None @@ -223,13 +243,30 @@ class CodeCommitClient: self._connect_boto_client() try: - self.boto_client.post_comment_for_pull_request( - pullRequestId=str(pr_number), - repositoryName=repo_name, - beforeCommitId=destination_commit, - afterCommitId=source_commit, - content=comment, - ) + # If the comment has code annotations, + # then set the file path and line number in the location dictionary + if annotation_file and annotation_line: + self.boto_client.post_comment_for_pull_request( + pullRequestId=str(pr_number), + repositoryName=repo_name, + beforeCommitId=destination_commit, + afterCommitId=source_commit, + content=comment, + location={ + "filePath": annotation_file, + "filePosition": annotation_line, + "relativeFileVersion": "AFTER", + }, + ) + else: + # The comment does not have code annotations + self.boto_client.post_comment_for_pull_request( + pullRequestId=str(pr_number), + repositoryName=repo_name, + beforeCommitId=destination_commit, + afterCommitId=source_commit, + content=comment, + ) except botocore.exceptions.ClientError as e: if e.response["Error"]["Code"] == 'RepositoryDoesNotExistException': raise ValueError(f"Repository does not exist: {repo_name}") from e diff --git a/pr_agent/git_providers/codecommit_provider.py b/pr_agent/git_providers/codecommit_provider.py index d43409c3..5361f665 100644 --- a/pr_agent/git_providers/codecommit_provider.py +++ b/pr_agent/git_providers/codecommit_provider.py @@ -74,6 +74,7 @@ class CodeCommitProvider(GitProvider): "create_inline_comment", "publish_inline_comments", "get_labels", + "gfm_markdown" ]: return False return True @@ -180,10 +181,37 @@ class CodeCommitProvider(GitProvider): comment=pr_comment, ) except Exception as e: - raise ValueError(f"CodeCommit Cannot post comment for PR: {self.pr_num}") from e + raise ValueError(f"CodeCommit Cannot publish comment for PR: {self.pr_num}") from e def publish_code_suggestions(self, code_suggestions: list) -> bool: - return [""] # not implemented yet + counter = 1 + for suggestion in code_suggestions: + # Verify that each suggestion has the required keys + if not all(key in suggestion for key in ["body", "relevant_file", "relevant_lines_start"]): + logging.warning(f"Skipping code suggestion #{counter}: Each suggestion must have 'body', 'relevant_file', 'relevant_lines_start' keys") + continue + + # Publish the code suggestion to CodeCommit + try: + logging.debug(f"Code Suggestion #{counter} in file: {suggestion['relevant_file']}: {suggestion['relevant_lines_start']}") + self.codecommit_client.publish_comment( + repo_name=self.repo_name, + pr_number=self.pr_num, + destination_commit=self.pr.destination_commit, + source_commit=self.pr.source_commit, + comment=suggestion["body"], + annotation_file=suggestion["relevant_file"], + annotation_line=suggestion["relevant_lines_start"], + ) + except Exception as e: + raise ValueError(f"CodeCommit Cannot publish code suggestions for PR: {self.pr_num}") from e + + counter += 1 + + # The calling function passes in a list of code suggestions, and this function publishes each suggestion one at a time. + # If we were to return False here, the calling function will attempt to publish the same list of code suggestions again, one at a time. + # Since this function publishes the suggestions one at a time anyway, we always return True here to avoid the retry. 
+        return True
 
     def publish_labels(self, labels):
         return [""] # not implemented yet
 
@@ -195,6 +223,7 @@
         return "" # not implemented yet
 
     def publish_inline_comment(self, body: str, relevant_file: str, relevant_line_in_file: str):
+        # https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/codecommit/client/post_comment_for_compared_commit.html
         raise NotImplementedError("CodeCommit provider does not support publishing inline comments yet")
 
     def create_inline_comment(self, body: str, relevant_file: str, relevant_line_in_file: str):
@@ -255,9 +284,11 @@
         return self.codecommit_client.get_file(self.repo_name, settings_filename, self.pr.source_commit, optional=True)
 
     def add_eyes_reaction(self, issue_comment_id: int) -> Optional[int]:
+        logging.info("CodeCommit provider does not support eyes reaction yet")
         return True
 
     def remove_reaction(self, issue_comment_id: int, reaction_id: int) -> bool:
+        logging.info("CodeCommit provider does not support removing reactions yet")
         return True
 
     @staticmethod
@@ -315,16 +346,16 @@
         return re.match(r"^[a-z]{2}-(gov-)?[a-z]+-\d\.console\.aws\.amazon\.com$", hostname) is not None
 
     def _get_pr(self):
-        response = self.codecommit_client.get_pr(self.pr_num)
+        response = self.codecommit_client.get_pr(self.repo_name, self.pr_num)
         if len(response.targets) == 0:
             raise ValueError(f"No files found in CodeCommit PR: {self.pr_num}")
 
-        # TODO: implement support for multiple commits in one CodeCommit PR
-        # for now, we are only using the first commit in the PR
+        # TODO: implement support for multiple targets in one CodeCommit PR
+        # for now, we are only using the first target in the PR
         if len(response.targets) > 1:
             logging.warning(
-                "Multiple commits in one PR is not supported for CodeCommit yet. Continuing, using the first commit only..."
+                "Multiple targets in one PR are not supported for CodeCommit yet. Continuing, using the first target only..."
            )
 
        # Return our object that mimics PullRequest class from the PyGithub library
diff --git a/pr_agent/git_providers/gerrit_provider.py b/pr_agent/git_providers/gerrit_provider.py
new file mode 100644
index 00000000..dd56803a
--- /dev/null
+++ b/pr_agent/git_providers/gerrit_provider.py
@@ -0,0 +1,403 @@
+import json
+import logging
+import os
+import pathlib
+import shutil
+import subprocess
+import uuid
+from collections import Counter, namedtuple
+from pathlib import Path
+from tempfile import mkdtemp, NamedTemporaryFile
+
+import requests
+import urllib3.util
+from git import Repo
+
+from pr_agent.config_loader import get_settings
+from pr_agent.git_providers.git_provider import GitProvider, FilePatchInfo, \
+    EDIT_TYPE
+from pr_agent.git_providers.local_git_provider import PullRequestMimic
+
+logger = logging.getLogger(__name__)
+
+
+def _call(*command, **kwargs) -> (int, str, str):
+    res = subprocess.run(
+        command,
+        stdout=subprocess.PIPE,
+        stderr=subprocess.PIPE,
+        check=True,
+        **kwargs,
+    )
+    return res.stdout.decode()
+
+
+def clone(url, directory):
+    logger.info("Cloning %s to %s", url, directory)
+    stdout = _call('git', 'clone', "--depth", "1", url, directory)
+    logger.info(stdout)
+
+
+def fetch(url, refspec, cwd):
+    logger.info("Fetching %s %s", url, refspec)
+    stdout = _call(
+        'git', 'fetch', '--depth', '2', url, refspec,
+        cwd=cwd
+    )
+    logger.info(stdout)
+
+
+def checkout(cwd):
+    logger.info("Checking out")
+    stdout = _call('git', 'checkout', "FETCH_HEAD", cwd=cwd)
+    logger.info(stdout)
+
+
+def show(*args, cwd=None):
+    logger.info("Show")
+    return _call('git', 'show', *args, cwd=cwd)
+
+
+def diff(*args, cwd=None):
+    logger.info("Diff")
+    patch = _call('git', 'diff', *args, cwd=cwd)
+    if not patch:
+        logger.warning("No changes found")
+        return
+    return patch
+
+
+def reset_local_changes(cwd):
+    logger.info("Reset local changes")
+    _call('git', 'checkout', "--force", cwd=cwd)
+
+
+def add_comment(url: urllib3.util.Url, refspec, message):
+    *_, patchset, changenum = refspec.rsplit("/")
+    message = "'" + message.replace("'", "'\"'\"'") + "'"
+    return _call(
+        "ssh",
+        "-p", str(url.port),
+        f"{url.auth}@{url.host}",
+        "gerrit", "review",
+        "--message", message,
+        # "--code-review", score,
+        f"{patchset},{changenum}",
+    )
+
+
+def list_comments(url: urllib3.util.Url, refspec):
+    *_, patchset, _ = refspec.rsplit("/")
+    stdout = _call(
+        "ssh",
+        "-p", str(url.port),
+        f"{url.auth}@{url.host}",
+        "gerrit", "query",
+        "--comments",
+        "--current-patch-set", patchset,
+        "--format", "JSON",
+    )
+    change_set, *_ = stdout.splitlines()
+    return json.loads(change_set)["currentPatchSet"]["comments"]
+
+
+def prepare_repo(url: urllib3.util.Url, project, refspec):
+    repo_url = (f"{url.scheme}://{url.auth}@{url.host}:{url.port}/{project}")
+
+    directory = pathlib.Path(mkdtemp())
+    clone(repo_url, directory),
+    fetch(repo_url, refspec, cwd=directory)
+    checkout(cwd=directory)
+    return directory
+
+
+def adopt_to_gerrit_message(message):
+    lines = message.splitlines()
+    buf = []
+    for line in lines:
+        # remove markdown formatting
+        line = (line.replace("*", "")
+                .replace("``", "`")
+                .replace("<details>", "")
+                .replace("</details>", "")
+                .replace("<summary>", "")
+                .replace("</summary>", ""))
+
+        line = line.strip()
+        if line.startswith('#'):
+            buf.append("\n" +
+                       line.replace('#', '').removesuffix(":").strip() +
+                       ":")
+            continue
+        elif line.startswith('-'):
+            buf.append(line.removeprefix('-').strip())
+            continue
+        else:
+            buf.append(line)
+    return "\n".join(buf).strip()
+
+
+def add_suggestion(src_filename, context: str, start, end: int):
+    with (
+        NamedTemporaryFile("w", delete=False) as tmp,
+        open(src_filename, "r") as src
+    ):
+        lines = src.readlines()
+        tmp.writelines(lines[:start - 1])
+        if context:
+            tmp.write(context)
+        tmp.writelines(lines[end:])
+
+    shutil.copy(tmp.name, src_filename)
+    os.remove(tmp.name)
+
+
+def upload_patch(patch, path):
+    patch_server_endpoint = get_settings().get(
+        'gerrit.patch_server_endpoint')
+    patch_server_token = get_settings().get(
+        'gerrit.patch_server_token')
+
+    response = requests.post(
+        patch_server_endpoint,
+        json={
+            "content": patch,
+            "path": path,
+        },
+        headers={
+            "Content-Type": "application/json",
+            "Authorization": f"Bearer {patch_server_token}",
+        }
+    )
+    response.raise_for_status()
+    patch_server_endpoint = patch_server_endpoint.rstrip("/")
+    return patch_server_endpoint + "/" + path
+
+
+class GerritProvider(GitProvider):
+
+    def __init__(self, key: str, incremental=False):
+        self.project, self.refspec = key.split(':')
+        assert self.project, "Project name is required"
+        assert self.refspec, "Refspec is required"
+        base_url = get_settings().get('gerrit.url')
+        assert base_url, "Gerrit URL is required"
+        user = get_settings().get('gerrit.user')
+        assert user, "Gerrit user is required"
+
+        parsed = urllib3.util.parse_url(base_url)
+        self.parsed_url = urllib3.util.parse_url(
+            f"{parsed.scheme}://{user}@{parsed.host}:{parsed.port}"
+        )
+
+        self.repo_path = prepare_repo(
+            self.parsed_url, self.project, self.refspec
+        )
+        self.repo = Repo(self.repo_path)
+        assert self.repo
+
+        self.pr = PullRequestMimic(self.get_pr_title(), self.get_diff_files())
+
+    def get_pr_title(self):
+        """
+        Substitutes the branch-name as the PR-mimic title.
+ """ + return self.repo.branches[0].name + + def get_issue_comments(self): + comments = list_comments(self.parsed_url, self.refspec) + Comments = namedtuple('Comments', ['reversed']) + Comment = namedtuple('Comment', ['body']) + return Comments([Comment(c['message']) for c in reversed(comments)]) + + def get_labels(self): + raise NotImplementedError( + 'Getting labels is not implemented for the gerrit provider') + + def add_eyes_reaction(self, issue_comment_id: int): + raise NotImplementedError( + 'Adding reactions is not implemented for the gerrit provider') + + def remove_reaction(self, issue_comment_id: int, reaction_id: int): + raise NotImplementedError( + 'Removing reactions is not implemented for the gerrit provider') + + def get_commit_messages(self): + return [self.repo.head.commit.message] + + def get_repo_settings(self): + try: + with open(self.repo_path / ".pr_agent.toml", 'rb') as f: + contents = f.read() + return contents + except OSError: + return b"" + + def get_diff_files(self) -> list[FilePatchInfo]: + diffs = self.repo.head.commit.diff( + self.repo.head.commit.parents[0], # previous commit + create_patch=True, + R=True + ) + + diff_files = [] + for diff_item in diffs: + if diff_item.a_blob is not None: + original_file_content_str = ( + diff_item.a_blob.data_stream.read().decode('utf-8') + ) + else: + original_file_content_str = "" # empty file + if diff_item.b_blob is not None: + new_file_content_str = diff_item.b_blob.data_stream.read(). \ + decode('utf-8') + else: + new_file_content_str = "" # empty file + edit_type = EDIT_TYPE.MODIFIED + if diff_item.new_file: + edit_type = EDIT_TYPE.ADDED + elif diff_item.deleted_file: + edit_type = EDIT_TYPE.DELETED + elif diff_item.renamed_file: + edit_type = EDIT_TYPE.RENAMED + diff_files.append( + FilePatchInfo( + original_file_content_str, + new_file_content_str, + diff_item.diff.decode('utf-8'), + diff_item.b_path, + edit_type=edit_type, + old_filename=None + if diff_item.a_path == diff_item.b_path + else diff_item.a_path + ) + ) + self.diff_files = diff_files + return diff_files + + def get_files(self): + diff_index = self.repo.head.commit.diff( + self.repo.head.commit.parents[0], # previous commit + R=True + ) + # Get the list of changed files + diff_files = [item.a_path for item in diff_index] + return diff_files + + def get_languages(self): + """ + Calculate percentage of languages in repository. Used for hunk + prioritisation. 
+ """ + # Get all files in repository + filepaths = [Path(item.path) for item in + self.repo.tree().traverse() if item.type == 'blob'] + # Identify language by file extension and count + lang_count = Counter( + ext.lstrip('.') for filepath in filepaths for ext in + [filepath.suffix.lower()]) + # Convert counts to percentages + total_files = len(filepaths) + lang_percentage = {lang: count / total_files * 100 for lang, count + in lang_count.items()} + return lang_percentage + + def get_pr_description_full(self): + return self.repo.head.commit.message + + def get_user_id(self): + return self.repo.head.commit.author.email + + def is_supported(self, capability: str) -> bool: + if capability in [ + # 'get_issue_comments', + 'create_inline_comment', + 'publish_inline_comments', + 'get_labels', + 'gfm_markdown' + ]: + return False + return True + + def split_suggestion(self, msg) -> tuple[str, str]: + is_code_context = False + description = [] + context = [] + for line in msg.splitlines(): + if line.startswith('```suggestion'): + is_code_context = True + continue + if line.startswith('```'): + is_code_context = False + continue + if is_code_context: + context.append(line) + else: + description.append( + line.replace('*', '') + ) + + return ( + '\n'.join(description), + '\n'.join(context) + '\n' if context else '' + ) + + def publish_code_suggestions(self, code_suggestions: list): + msg = [] + for suggestion in code_suggestions: + description, code = self.split_suggestion(suggestion['body']) + add_suggestion( + pathlib.Path(self.repo_path) / suggestion["relevant_file"], + code, + suggestion["relevant_lines_start"], + suggestion["relevant_lines_end"], + ) + patch = diff(cwd=self.repo_path) + patch_id = uuid.uuid4().hex[0:4] + path = "/".join(["codium-ai", self.refspec, patch_id]) + full_path = upload_patch(patch, path) + reset_local_changes(self.repo_path) + msg.append(f'* {description}\n{full_path}') + + if msg: + add_comment(self.parsed_url, self.refspec, "\n".join(msg)) + return True + + def publish_comment(self, pr_comment: str, is_temporary: bool = False): + if not is_temporary: + msg = adopt_to_gerrit_message(pr_comment) + add_comment(self.parsed_url, self.refspec, msg) + + def publish_description(self, pr_title: str, pr_body: str): + msg = adopt_to_gerrit_message(pr_body) + add_comment(self.parsed_url, self.refspec, pr_title + '\n' + msg) + + def publish_inline_comments(self, comments: list[dict]): + raise NotImplementedError( + 'Publishing inline comments is not implemented for the gerrit ' + 'provider') + + def publish_inline_comment(self, body: str, relevant_file: str, + relevant_line_in_file: str): + raise NotImplementedError( + 'Publishing inline comments is not implemented for the gerrit ' + 'provider') + + def create_inline_comment(self, body: str, relevant_file: str, + relevant_line_in_file: str): + raise NotImplementedError( + 'Creating inline comments is not implemented for the gerrit ' + 'provider') + + def publish_labels(self, labels): + # Not applicable to the local git provider, + # but required by the interface + pass + + def remove_initial_comment(self): + # remove repo, cloned in previous steps + # shutil.rmtree(self.repo_path) + pass + + def get_pr_branch(self): + return self.repo.head diff --git a/pr_agent/git_providers/github_provider.py b/pr_agent/git_providers/github_provider.py index 057bc15a..0521716b 100644 --- a/pr_agent/git_providers/github_provider.py +++ b/pr_agent/git_providers/github_provider.py @@ -32,7 +32,7 @@ class GithubProvider(GitProvider): 
self.diff_files = None self.git_files = None self.incremental = incremental - if pr_url: + if pr_url and 'pull' in pr_url: self.set_pr(pr_url) self.last_commit_id = list(self.pr.get_commits())[-1] @@ -309,6 +309,35 @@ class GithubProvider(GitProvider): return repo_name, pr_number + @staticmethod + def _parse_issue_url(issue_url: str) -> Tuple[str, int]: + parsed_url = urlparse(issue_url) + + if 'github.com' not in parsed_url.netloc: + raise ValueError("The provided URL is not a valid GitHub URL") + + path_parts = parsed_url.path.strip('/').split('/') + if 'api.github.com' in parsed_url.netloc: + if len(path_parts) < 5 or path_parts[3] != 'issues': + raise ValueError("The provided URL does not appear to be a GitHub ISSUE URL") + repo_name = '/'.join(path_parts[1:3]) + try: + issue_number = int(path_parts[4]) + except ValueError as e: + raise ValueError("Unable to convert issue number to integer") from e + return repo_name, issue_number + + if len(path_parts) < 4 or path_parts[2] != 'issues': + raise ValueError("The provided URL does not appear to be a GitHub PR issue") + + repo_name = '/'.join(path_parts[:2]) + try: + issue_number = int(path_parts[3]) + except ValueError as e: + raise ValueError("Unable to convert issue number to integer") from e + + return repo_name, issue_number + def _get_github_client(self): deployment_type = get_settings().get("GITHUB.DEPLOYMENT_TYPE", "user") diff --git a/pr_agent/git_providers/gitlab_provider.py b/pr_agent/git_providers/gitlab_provider.py index 2deae177..a1d0b334 100644 --- a/pr_agent/git_providers/gitlab_provider.py +++ b/pr_agent/git_providers/gitlab_provider.py @@ -43,7 +43,7 @@ class GitLabProvider(GitProvider): self.incremental = incremental def is_supported(self, capability: str) -> bool: - if capability in ['get_issue_comments', 'create_inline_comment', 'publish_inline_comments']: + if capability in ['get_issue_comments', 'create_inline_comment', 'publish_inline_comments', 'gfm_markdown']: return False return True diff --git a/pr_agent/git_providers/local_git_provider.py b/pr_agent/git_providers/local_git_provider.py index e6ee1456..ac750371 100644 --- a/pr_agent/git_providers/local_git_provider.py +++ b/pr_agent/git_providers/local_git_provider.py @@ -56,7 +56,8 @@ class LocalGitProvider(GitProvider): raise KeyError(f'Branch: {self.target_branch_name} does not exist') def is_supported(self, capability: str) -> bool: - if capability in ['get_issue_comments', 'create_inline_comment', 'publish_inline_comments', 'get_labels']: + if capability in ['get_issue_comments', 'create_inline_comment', 'publish_inline_comments', 'get_labels', + 'gfm_markdown']: return False return True diff --git a/pr_agent/servers/gerrit_server.py b/pr_agent/servers/gerrit_server.py new file mode 100644 index 00000000..04232ea9 --- /dev/null +++ b/pr_agent/servers/gerrit_server.py @@ -0,0 +1,78 @@ +import copy +import logging +import sys +from enum import Enum +from json import JSONDecodeError + +import uvicorn +from fastapi import APIRouter, FastAPI, HTTPException +from pydantic import BaseModel +from starlette.middleware import Middleware +from starlette_context import context +from starlette_context.middleware import RawContextMiddleware + +from pr_agent.agent.pr_agent import PRAgent +from pr_agent.config_loader import global_settings, get_settings + +logging.basicConfig(stream=sys.stdout, level=logging.DEBUG) +router = APIRouter() + + +class Action(str, Enum): + review = "review" + describe = "describe" + ask = "ask" + improve = "improve" + reflect = "reflect" + 
answer = "answer" + + +class Item(BaseModel): + refspec: str + project: str + msg: str + + +@router.post("/api/v1/gerrit/{action}") +async def handle_gerrit_request(action: Action, item: Item): + logging.debug("Received a Gerrit request") + context["settings"] = copy.deepcopy(global_settings) + + if action == Action.ask: + if not item.msg: + return HTTPException( + status_code=400, + detail="msg is required for ask command" + ) + await PRAgent().handle_request( + f"{item.project}:{item.refspec}", + f"/{item.msg.strip()}" + ) + + +async def get_body(request): + try: + body = await request.json() + except JSONDecodeError as e: + logging.error("Error parsing request body", e) + return {} + return body + + +@router.get("/") +async def root(): + return {"status": "ok"} + + +def start(): + # to prevent adding help messages with the output + get_settings().set("CONFIG.CLI_MODE", True) + middleware = [Middleware(RawContextMiddleware)] + app = FastAPI(middleware=middleware) + app.include_router(router) + + uvicorn.run(app, host="0.0.0.0", port=3000) + + +if __name__ == '__main__': + start() diff --git a/pr_agent/servers/github_action_runner.py b/pr_agent/servers/github_action_runner.py index fbf4f89c..7dbea972 100644 --- a/pr_agent/servers/github_action_runner.py +++ b/pr_agent/servers/github_action_runner.py @@ -12,8 +12,8 @@ async def run_action(): # Get environment variables GITHUB_EVENT_NAME = os.environ.get('GITHUB_EVENT_NAME') GITHUB_EVENT_PATH = os.environ.get('GITHUB_EVENT_PATH') - OPENAI_KEY = os.environ.get('OPENAI_KEY') - OPENAI_ORG = os.environ.get('OPENAI_ORG') + OPENAI_KEY = os.environ.get('OPENAI_KEY') or os.environ.get('OPENAI.KEY') + OPENAI_ORG = os.environ.get('OPENAI_ORG') or os.environ.get('OPENAI.ORG') GITHUB_TOKEN = os.environ.get('GITHUB_TOKEN') get_settings().set("CONFIG.PUBLISH_OUTPUT_PROGRESS", False) @@ -61,12 +61,21 @@ async def run_action(): if action in ["created", "edited"]: comment_body = event_payload.get("comment", {}).get("body") if comment_body: - pr_url = event_payload.get("issue", {}).get("pull_request", {}).get("url") - if pr_url: + is_pr = False + # check if issue is pull request + if event_payload.get("issue", {}).get("pull_request"): + url = event_payload.get("issue", {}).get("pull_request", {}).get("url") + is_pr = True + else: + url = event_payload.get("issue", {}).get("url") + if url: body = comment_body.strip().lower() comment_id = event_payload.get("comment", {}).get("id") - provider = get_git_provider()(pr_url=pr_url) - await PRAgent().handle_request(pr_url, body, notify=lambda: provider.add_eyes_reaction(comment_id)) + provider = get_git_provider()(pr_url=url) + if is_pr: + await PRAgent().handle_request(url, body, notify=lambda: provider.add_eyes_reaction(comment_id)) + else: + await PRAgent().handle_request(url, body) if __name__ == '__main__': diff --git a/pr_agent/servers/github_app.py b/pr_agent/servers/github_app.py index 10584e54..c9f25124 100644 --- a/pr_agent/servers/github_app.py +++ b/pr_agent/servers/github_app.py @@ -98,6 +98,7 @@ async def handle_request(body: Dict[str, Any], event: str): api_url = body["comment"]["pull_request_url"] else: return {} + logging.info(body) logging.info(f"Handling comment because of event={event} and action={action}") comment_id = body.get("comment", {}).get("id") provider = get_git_provider()(pr_url=api_url) @@ -129,6 +130,7 @@ async def handle_request(body: Dict[str, Any], event: str): args = split_command[1:] other_args = update_settings_from_args(args) new_command = ' '.join([command] + other_args) + 
logging.info(body) logging.info(f"Performing command: {new_command}") await agent.handle_request(api_url, new_command) diff --git a/pr_agent/servers/gitlab_webhook.py b/pr_agent/servers/gitlab_webhook.py index c9b623f7..8321cd60 100644 --- a/pr_agent/servers/gitlab_webhook.py +++ b/pr_agent/servers/gitlab_webhook.py @@ -1,21 +1,51 @@ +import copy +import json import logging +import sys import uvicorn from fastapi import APIRouter, FastAPI, Request, status from fastapi.encoders import jsonable_encoder from fastapi.responses import JSONResponse from starlette.background import BackgroundTasks +from starlette.middleware import Middleware +from starlette_context import context +from starlette_context.middleware import RawContextMiddleware from pr_agent.agent.pr_agent import PRAgent -from pr_agent.config_loader import get_settings +from pr_agent.config_loader import get_settings, global_settings +from pr_agent.secret_providers import get_secret_provider -app = FastAPI() +logging.basicConfig(stream=sys.stdout, level=logging.INFO) router = APIRouter() +secret_provider = get_secret_provider() if get_settings().get("CONFIG.SECRET_PROVIDER") else None + @router.post("/webhook") async def gitlab_webhook(background_tasks: BackgroundTasks, request: Request): + if request.headers.get("X-Gitlab-Token") and secret_provider: + request_token = request.headers.get("X-Gitlab-Token") + secret = secret_provider.get_secret(request_token) + try: + secret_dict = json.loads(secret) + gitlab_token = secret_dict["gitlab_token"] + context["settings"] = copy.deepcopy(global_settings) + context["settings"].gitlab.personal_access_token = gitlab_token + except Exception as e: + logging.error(f"Failed to validate secret {request_token}: {e}") + return JSONResponse(status_code=status.HTTP_401_UNAUTHORIZED, content=jsonable_encoder({"message": "unauthorized"})) + elif get_settings().get("GITLAB.SHARED_SECRET"): + secret = get_settings().get("GITLAB.SHARED_SECRET") + if request.headers.get("X-Gitlab-Token") != secret: + return JSONResponse(status_code=status.HTTP_401_UNAUTHORIZED, content=jsonable_encoder({"message": "unauthorized"})) + else: + return JSONResponse(status_code=status.HTTP_401_UNAUTHORIZED, content=jsonable_encoder({"message": "unauthorized"})) + gitlab_token = get_settings().get("GITLAB.PERSONAL_ACCESS_TOKEN", None) + if not gitlab_token: + return JSONResponse(status_code=status.HTTP_401_UNAUTHORIZED, content=jsonable_encoder({"message": "unauthorized"})) data = await request.json() + logging.info(json.dumps(data)) if data.get('object_kind') == 'merge_request' and data['object_attributes'].get('action') in ['open', 'reopen']: logging.info(f"A merge request has been opened: {data['object_attributes'].get('title')}") url = data['object_attributes'].get('url') @@ -28,16 +58,18 @@ async def gitlab_webhook(background_tasks: BackgroundTasks, request: Request): background_tasks.add_task(PRAgent().handle_request, url, body) return JSONResponse(status_code=status.HTTP_200_OK, content=jsonable_encoder({"message": "success"})) + +@router.get("/") +async def root(): + return {"status": "ok"} + def start(): gitlab_url = get_settings().get("GITLAB.URL", None) if not gitlab_url: raise ValueError("GITLAB.URL is not set") - gitlab_token = get_settings().get("GITLAB.PERSONAL_ACCESS_TOKEN", None) - if not gitlab_token: - raise ValueError("GITLAB.PERSONAL_ACCESS_TOKEN is not set") get_settings().config.git_provider = "gitlab" - - app = FastAPI() + middleware = [Middleware(RawContextMiddleware)] + app = 
FastAPI(middleware=middleware) app.include_router(router) uvicorn.run(app, host="0.0.0.0", port=3000) diff --git a/pr_agent/settings/.secrets_template.toml b/pr_agent/settings/.secrets_template.toml index 0ac75519..0271a2c3 100644 --- a/pr_agent/settings/.secrets_template.toml +++ b/pr_agent/settings/.secrets_template.toml @@ -16,6 +16,10 @@ key = "" # Acquire through https://platform.openai.com #deployment_id = "" # The deployment name you chose when you deployed the engine #fallback_deployments = [] # For each fallback model specified in configuration.toml in the [config] section, specify the appropriate deployment_id +[pinecone] +api_key = "..." +environment = "gcp-starter" + [anthropic] key = "" # Optional, uncomment if you want to use Anthropic. Acquire through https://www.anthropic.com/ @@ -24,6 +28,14 @@ key = "" # Optional, uncomment if you want to use Cohere. Acquire through https: [replicate] key = "" # Optional, uncomment if you want to use Replicate. Acquire through https://replicate.com/ + +[huggingface] +key = "" # Optional, uncomment if you want to use Huggingface Inference API. Acquire through https://huggingface.co/docs/api-inference/quicktour +api_base = "" # the base url for your huggingface inference endpoint + +[ollama] +api_base = "" # the base url for your local ollama endpoint + [github] # ---- Set the following only for deployment type == "user" user_token = "" # A GitHub personal access token with 'repo' scope. @@ -43,5 +55,12 @@ webhook_secret = "" # Optional, may be commented out. personal_access_token = "" [bitbucket] -# Bitbucket personal bearer token +# For Bitbucket personal/repository bearer token bearer_token = "" + +# For Bitbucket app +app_key = "" +base_url = "" + +[litellm] +LITELLM_TOKEN = "" # see https://docs.litellm.ai/docs/debugging/hosted_debugging for details and instructions on how to get a token diff --git a/pr_agent/settings/configuration.toml b/pr_agent/settings/configuration.toml index e9c59f31..de4edc6c 100644 --- a/pr_agent/settings/configuration.toml +++ b/pr_agent/settings/configuration.toml @@ -10,8 +10,8 @@ use_repo_settings_file=true ai_timeout=180 max_description_tokens = 500 max_commits_tokens = 500 -litellm_debugger=false secret_provider="google_cloud_storage" +cli_mode=false [pr_reviewer] # /review # require_focused_review=false @@ -87,4 +87,27 @@ polling_interval_seconds = 30 [local] # LocalGitProvider settings - uncomment to use paths other than default # description_path= "path/to/description.md" -# review_path= "path/to/review.md" \ No newline at end of file +# review_path= "path/to/review.md" + +[gerrit] +# endpoint to the gerrit service +# url = "ssh://gerrit.example.com:29418" +# user for gerrit authentication +# user = "ai-reviewer" +# patch server where patches will be saved +# patch_server_endpoint = "http://127.0.0.1:5000/patch" +# token to authenticate in the patch server +# patch_server_token = "" + +[litellm] +#use_client = false + +[pr_similar_issue] +skip_comments = false +force_update_dataset = false +max_issues_to_scan = 500 + +[pinecone] +# fill these values in .secrets.toml +#api_key = ... +# environment = "gcp-starter" \ No newline at end of file diff --git a/pr_agent/tools/pr_description.py b/pr_agent/tools/pr_description.py index 6daf60e8..60376731 100644 --- a/pr_agent/tools/pr_description.py +++ b/pr_agent/tools/pr_description.py @@ -208,6 +208,7 @@ class PRDescription: - title: a string containing the PR title. - pr_body: a string containing the PR description body in a markdown format. 
- markdown_text: a string containing the AI prediction data in a markdown format. used for publishing a comment + - user_description: a string containing the user description """ # Iterate over the dictionary items and append the key and value to 'markdown_text' in a markdown format @@ -244,7 +245,11 @@ class PRDescription: if idx < len(self.data) - 1: pr_body += "\n___\n" + markdown_text = f"## Title\n\n{title}\n\n___\n{pr_body}" + description = data['PR Description'] + if get_settings().config.verbosity_level >= 2: logging.info(f"title:\n{title}\n{pr_body}") - return title, pr_body + + return title, pr_body, pr_types, markdown_text, description \ No newline at end of file diff --git a/pr_agent/tools/pr_reviewer.py b/pr_agent/tools/pr_reviewer.py index a89c27a3..7f790d3b 100644 --- a/pr_agent/tools/pr_reviewer.py +++ b/pr_agent/tools/pr_reviewer.py @@ -214,7 +214,7 @@ class PRReviewer: "⏮️ Review for commits since previous PR-Agent review": f"Starting from commit {last_commit_url}"}}) data.move_to_end('Incremental PR Review', last=False) - markdown_text = convert_to_markdown(data) + markdown_text = convert_to_markdown(data, self.git_provider.is_supported("gfm_markdown")) user = self.git_provider.get_user_id() # Add help text if not in CLI mode @@ -266,7 +266,7 @@ class PRReviewer: self.git_provider.publish_inline_comment(content, relevant_file, relevant_line_in_file) if comments: - self.git_provider.publish_inline_comments(comments) + self.git_provider.publish_inline_comments(comments) def _get_user_answers(self) -> Tuple[str, str]: """ diff --git a/pr_agent/tools/pr_similar_issue.py b/pr_agent/tools/pr_similar_issue.py new file mode 100644 index 00000000..d7b6a799 --- /dev/null +++ b/pr_agent/tools/pr_similar_issue.py @@ -0,0 +1,276 @@ +import copy +import json +import logging +from enum import Enum +from typing import List, Tuple +import pinecone +import openai +import pandas as pd +from pydantic import BaseModel, Field + +from pr_agent.algo import MAX_TOKENS +from pr_agent.algo.token_handler import TokenHandler +from pr_agent.config_loader import get_settings +from pr_agent.git_providers import get_git_provider +from pinecone_datasets import Dataset, DatasetMetadata + +MODEL = "text-embedding-ada-002" + + +class PRSimilarIssue: + def __init__(self, issue_url: str, args: list = None): + if get_settings().config.git_provider != "github": + raise Exception("Only github is supported for similar issue tool") + + self.cli_mode = get_settings().CONFIG.CLI_MODE + self.max_issues_to_scan = get_settings().pr_similar_issue.max_issues_to_scan + self.issue_url = issue_url + self.git_provider = get_git_provider()() + repo_name, issue_number = self.git_provider._parse_issue_url(issue_url.split('=')[-1]) + self.git_provider.repo = repo_name + self.git_provider.repo_obj = self.git_provider.github_client.get_repo(repo_name) + self.token_handler = TokenHandler() + repo_obj = self.git_provider.repo_obj + repo_name_for_index = self.repo_name_for_index = repo_obj.full_name.lower().replace('/', '-').replace('_/', '-') + index_name = self.index_name = "codium-ai-pr-agent-issues" + + # assuming pinecone api key and environment are set in secrets file + try: + api_key = get_settings().pinecone.api_key + environment = get_settings().pinecone.environment + except Exception: + if not self.cli_mode: + repo_name, original_issue_number = self.git_provider._parse_issue_url(self.issue_url.split('=')[-1]) + issue_main = self.git_provider.repo_obj.get_issue(original_issue_number) + issue_main.create_comment("Please set 
pinecone api key and environment in secrets file") + raise Exception("Please set pinecone api key and environment in secrets file") + + # check if index exists, and if repo is already indexed + run_from_scratch = False + upsert = True + pinecone.init(api_key=api_key, environment=environment) + if index_name not in pinecone.list_indexes(): + run_from_scratch = True + upsert = False + else: + if get_settings().pr_similar_issue.force_update_dataset: + upsert = True + else: + pinecone_index = pinecone.Index(index_name=index_name) + res = pinecone_index.fetch([f"example_issue_{repo_name_for_index}"]).to_dict() + if res["vectors"]: + upsert = False + + if run_from_scratch or upsert: # index the entire repo + logging.info('Indexing the entire repo...') + + logging.info('Getting issues...') + issues = list(repo_obj.get_issues(state='all')) + logging.info('Done') + self._update_index_with_issues(issues, repo_name_for_index, upsert=upsert) + else: # update index if needed + pinecone_index = pinecone.Index(index_name=index_name) + issues_to_update = [] + issues_paginated_list = repo_obj.get_issues(state='all') + counter = 0 + for issue in issues_paginated_list: + if issue.pull_request: + continue + issue_str, comments, number = self._process_issue(issue) + issue_key = f"issue_{number}" + id = issue_key + "." + "issue" + res = pinecone_index.fetch([id]).to_dict() + is_new_issue = True + for vector in res["vectors"].values(): + if vector['metadata']['repo'] == repo_name_for_index: + is_new_issue = False + break + if is_new_issue: + counter += 1 + issues_to_update.append(issue) + else: + break + + if issues_to_update: + logging.info(f'Updating index with {counter} new issues...') + self._update_index_with_issues(issues_to_update, repo_name_for_index, upsert=True) + else: + logging.info('No new issues to update') + + async def run(self): + logging.info('Getting issue...') + repo_name, original_issue_number = self.git_provider._parse_issue_url(self.issue_url.split('=')[-1]) + issue_main = self.git_provider.repo_obj.get_issue(original_issue_number) + issue_str, comments, number = self._process_issue(issue_main) + openai.api_key = get_settings().openai.key + logging.info('Done') + + logging.info('Querying...') + res = openai.Embedding.create(input=[issue_str], engine=MODEL) + embeds = [record['embedding'] for record in res['data']] + pinecone_index = pinecone.Index(index_name=self.index_name) + res = pinecone_index.query(embeds[0], + top_k=5, + filter={"repo": self.repo_name_for_index}, + include_metadata=True).to_dict() + relevant_issues_number_list = [] + relevant_comment_number_list = [] + score_list = [] + for r in res['matches']: + issue_number = int(r["id"].split('.')[0].split('_')[-1]) + if original_issue_number == issue_number: + continue + if issue_number not in relevant_issues_number_list: + relevant_issues_number_list.append(issue_number) + if 'comment' in r["id"]: + relevant_comment_number_list.append(int(r["id"].split('.')[1].split('_')[-1])) + else: + relevant_comment_number_list.append(-1) + score_list.append("{:.2f}".format(r['score'])) + logging.info('Done') + + logging.info('Publishing response...') + similar_issues_str = "### Similar Issues\n___\n\n" + for i, issue_number_similar in enumerate(relevant_issues_number_list): + issue = self.git_provider.repo_obj.get_issue(issue_number_similar) + title = issue.title + url = issue.html_url + if relevant_comment_number_list[i] != -1: + url = list(issue.get_comments())[relevant_comment_number_list[i]].html_url + similar_issues_str += f"{i + 
1}. **[{title}]({url})** (score={score_list[i]})\n\n" + if get_settings().config.publish_output: + response = issue_main.create_comment(similar_issues_str) + logging.info(similar_issues_str) + logging.info('Done') + + def _process_issue(self, issue): + header = issue.title + body = issue.body + number = issue.number + if get_settings().pr_similar_issue.skip_comments: + comments = [] + else: + comments = list(issue.get_comments()) + issue_str = f"Issue Header: \"{header}\"\n\nIssue Body:\n{body}" + return issue_str, comments, number + + def _update_index_with_issues(self, issues_list, repo_name_for_index, upsert=False): + logging.info('Processing issues...') + corpus = Corpus() + example_issue_record = Record( + id=f"example_issue_{repo_name_for_index}", + text="example_issue", + metadata=Metadata(repo=repo_name_for_index) + ) + corpus.append(example_issue_record) + + counter = 0 + for issue in issues_list: + if issue.pull_request: + continue + + counter += 1 + if counter % 100 == 0: + logging.info(f"Scanned {counter} issues") + if counter >= self.max_issues_to_scan: + logging.info(f"Scanned {self.max_issues_to_scan} issues, stopping") + break + + issue_str, comments, number = self._process_issue(issue) + issue_key = f"issue_{number}" + username = issue.user.login + created_at = str(issue.created_at) + if len(issue_str) < 8000 or \ + self.token_handler.count_tokens(issue_str) < MAX_TOKENS[MODEL]: # cheap length check first, then exact token count + issue_record = Record( + id=issue_key + "." + "issue", + text=issue_str, + metadata=Metadata(repo=repo_name_for_index, + username=username, + created_at=created_at, + level=IssueLevel.ISSUE) + ) + corpus.append(issue_record) + if comments: + for j, comment in enumerate(comments): + comment_body = comment.body + if not isinstance(comment_body, str) or len(comment_body.split()) < 10: + continue + + if len(comment_body) < 8000 or \ + self.token_handler.count_tokens(comment_body) < MAX_TOKENS[MODEL]: + comment_record = Record( + id=issue_key + ".comment_" + str(j + 1), + text=comment_body, + metadata=Metadata(repo=repo_name_for_index, + username=username, # use issue username for all comments + created_at=created_at, + level=IssueLevel.COMMENT) + ) + corpus.append(comment_record) + df = pd.DataFrame(corpus.dict()["documents"]) + logging.info('Done') + + logging.info('Embedding...') + openai.api_key = get_settings().openai.key + list_to_encode = list(df["text"].values) + try: + res = openai.Embedding.create(input=list_to_encode, engine=MODEL) + embeds = [record['embedding'] for record in res['data']] + except Exception: + embeds = [] + logging.error('Failed to embed entire list, embedding one by one...') + for i, text in enumerate(list_to_encode): + try: + res = openai.Embedding.create(input=[text], engine=MODEL) + embeds.append(res['data'][0]['embedding']) + except Exception: + embeds.append([0] * 1536) + df["values"] = embeds + meta = DatasetMetadata.empty() + meta.dense_model.dimension = len(embeds[0]) + ds = Dataset.from_pandas(df, meta) + logging.info('Done') + + api_key = get_settings().pinecone.api_key + environment = get_settings().pinecone.environment + if not upsert: + logging.info('Creating index from scratch...') + ds.to_pinecone_index(self.index_name, api_key=api_key, environment=environment) + else: + logging.info('Upserting index...') + namespace = "" + batch_size: int = 100 + concurrency: int = 10 + pinecone.init(api_key=api_key, environment=environment) + ds._upsert_to_index(self.index_name, namespace, batch_size, concurrency) + 
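# _upsert_to_index is a non-public helper of the pinecone_datasets Dataset
# (requirements.txt below pins a fork of that package); it is expected to write
# the records in batches of `batch_size` with `concurrency` concurrent upsert
# requests against the existing index.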
logging.info('Done') + + +class IssueLevel(str, Enum): + ISSUE = "issue" + COMMENT = "comment" + + +class Metadata(BaseModel): + repo: str + username: str = Field(default="@codium") + created_at: str = Field(default="01-01-1970 00:00:00.00000") + level: IssueLevel = Field(default=IssueLevel.ISSUE) + + class Config: + use_enum_values = True + + +class Record(BaseModel): + id: str + text: str + metadata: Metadata + + +class Corpus(BaseModel): + documents: List[Record] = Field(default=[]) + + def append(self, r: Record): + self.documents.append(r) diff --git a/pr_agent/tools/pr_update_changelog.py b/pr_agent/tools/pr_update_changelog.py index 1ec62709..547ce84b 100644 --- a/pr_agent/tools/pr_update_changelog.py +++ b/pr_agent/tools/pr_update_changelog.py @@ -46,7 +46,7 @@ class PRUpdateChangelog: get_settings().pr_update_changelog_prompt.user) async def run(self): - assert type(self.git_provider) == GithubProvider, "Currently only Github is supported" + # assert type(self.git_provider) == GithubProvider, "Currently only Github is supported" logging.info('Updating the changelog...') if get_settings().config.publish_output: diff --git a/requirements.txt b/requirements.txt index 99efa846..8791a115 100644 --- a/requirements.txt +++ b/requirements.txt @@ -7,15 +7,17 @@ Jinja2==3.1.2 tiktoken==0.4.0 uvicorn==0.22.0 python-gitlab==3.15.0 -pytest~=7.4.0 -aiohttp~=3.8.4 +pytest==7.4.0 +aiohttp==3.8.4 atlassian-python-api==3.39.0 -GitPython~=3.1.32 +GitPython==3.1.32 PyYAML==6.0 starlette-context==0.3.6 -litellm~=0.1.504 -boto3~=1.28.25 +litellm~=0.1.574 +boto3==1.28.25 google-cloud-storage==2.10.0 ujson==5.8.0 azure-devops==7.1.0b3 -msrest==0.7.1 \ No newline at end of file +msrest==0.7.1 +pinecone-client +pinecone-datasets @ git+https://github.com/mrT23/pinecone-datasets.git@main \ No newline at end of file diff --git a/tests/unittest/test_codecommit_client.py b/tests/unittest/test_codecommit_client.py index 5d09bdd1..0aa1ffa6 100644 --- a/tests/unittest/test_codecommit_client.py +++ b/tests/unittest/test_codecommit_client.py @@ -125,7 +125,7 @@ class TestCodeCommitProvider: } } - pr = api.get_pr(321) + pr = api.get_pr("my_test_repo", 321) assert pr.title == "My PR" assert pr.description == "My PR description"
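As a closing reference, here is a minimal sketch of how the Pydantic models introduced in pr_agent/tools/pr_similar_issue.py serialize into the DataFrame handed to pinecone_datasets; the import works only once the patch is applied, and the issue id/text/repo values are illustrative assumptions, not part of the patch:

```
import pandas as pd

from pr_agent.tools.pr_similar_issue import Corpus, Metadata, Record

corpus = Corpus()
corpus.append(Record(
    id="issue_42.issue",  # the "<issue_key>.issue" id scheme used by _update_index_with_issues
    text='Issue Header: "Crash on startup"\n\nIssue Body:\n...',
    metadata=Metadata(repo="codium-ai-pr-agent"),  # username/created_at/level keep their defaults
))

# _update_index_with_issues builds essentially this frame, then adds a "values"
# column of embeddings before calling Dataset.from_pandas
df = pd.DataFrame(corpus.dict()["documents"])
print(df.columns.tolist())  # ['id', 'text', 'metadata']
```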