Compare commits

..

23 Commits

Author SHA1 Message Date
fa41ff5736 run_from_scratch 2023-10-19 17:28:16 +03:00
adf02bdf5b run_from_scratch 2023-10-19 17:19:11 +03:00
c6495642c3 update github provider for similar issue 2023-10-19 12:58:45 +05:30
7d9885f2b7 update github provider for similar issue 2023-10-19 11:09:14 +05:30
bcb1288ed5 Merge branch 'main' of https://github.com/Codium-ai/pr-agent into bitbucket_similar_issue_feature 2023-10-19 10:53:36 +05:30
ec77d5ff07 similar_issue feature is working for github and bitbucket 2023-10-19 10:50:19 +05:30
733ed907de similar_issue feature is working for github and bitbucket 2023-10-19 10:41:50 +05:30
954727ad67 Merge pull request #386 from Codium-ai/ok/fix_bitbucket_pipeline
Refactor Bitbucket Pipeline Integration and Update Documentation
2023-10-18 16:45:26 +03:00
ff04d459d7 Update Bitbucket Pipeline instructions in INSTALL.md, remove redundant functionality 2023-10-18 15:46:43 +03:00
88ca501c0c Merge pull request #377 from zmeir/zmeir-review_incremental_detect_header
Get previous incremental review
2023-10-18 00:30:42 +03:00
fe284a8f91 Merge pull request #382 from Codium-ai/tr/similar_issue_fix
Enhancements and Error Handling in Similar Issue Tool
2023-10-17 09:49:35 -07:00
d41fe0cf79 comment 2023-10-17 19:45:04 +03:00
f689b96513 Merge branch 'tr/similar_issue_fix' of https://github.com/Codium-ai/pr-agent into bitbucket_similar_issue_feature 2023-10-17 14:13:09 +05:30
a77bb2c482 Merge branch 'main' of https://github.com/Codium-ai/pr-agent into bitbucket_similar_issue_feature 2023-10-17 14:11:04 +05:30
d5c098de73 another protection 2023-10-17 10:21:05 +03:00
9f5c0daa8e protection 2023-10-17 09:43:48 +03:00
4cc9ab5bc6 bitbucket similar issue 2023-10-17 11:32:37 +05:30
bce2262d4e Merge pull request #381 from moccajoghurt/feature-allow-custom-urls
Support Custom Domain URLs for Azure DevOps Integration
2023-10-16 22:38:27 -07:00
e6f1e0520a remove azure.com url restriction 2023-10-16 20:38:14 +02:00
d8de89ae33 Get previous incremental review
When getting the last commit in `/review -i` consider also the last __incremental__ review, not just the last __full__ review

Full disclosure: I'm not really sure the `/review -i` feature works very well - I might be wrong, but it seemed like the actual review in fact addressed all the changes in the PR, and not just the ones since the last review (even though it adds a link to the commit of the last review).
I think the commit list gathered in `/review -i` doesn't propagate to the actual list the reviewer uses. Again, I might be wrong; I only took a brief glance at it.
2023-10-16 16:37:10 +03:00
428c38e3d9 Merge pull request #376 from Codium-ai/feature/better_logger
Refactor logging system to use custom logger across the codebase
2023-10-16 16:32:27 +03:00
91afd29aef Merge branch 'main' of https://github.com/Codium-ai/pr-agent into bitbucket_similar_issue_feature 2023-10-04 10:01:40 +05:30
7705963916 Merge branch 'main' of https://github.com/Codium-ai/pr-agent into bitbucket_similar_issue_feature 2023-09-22 11:01:41 +05:30
9 changed files with 226 additions and 133 deletions

View File

@@ -1,18 +0,0 @@
FROM python:3.10 as base
ENV OPENAI_API_KEY=${OPENAI_API_KEY} \
BITBUCKET_BEARER_TOKEN=${BITBUCKET_BEARER_TOKEN} \
BITBUCKET_PR_ID=${BITBUCKET_PR_ID} \
BITBUCKET_REPO_SLUG=${BITBUCKET_REPO_SLUG} \
BITBUCKET_WORKSPACE=${BITBUCKET_WORKSPACE}
WORKDIR /app
ADD pyproject.toml .
ADD requirements.txt .
RUN pip install . && rm pyproject.toml requirements.txt
ENV PYTHONPATH=/app
ADD pr_agent pr_agent
ADD bitbucket_pipeline/entrypoint.sh /
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]

View File

@@ -375,59 +375,28 @@ In the "Trigger" section, check the comments and merge request events
### Method 9: Run as a Bitbucket Pipeline
You can use our pre-built Bitbucket Pipeline Docker image to run PR-Agent as a Bitbucket Pipeline.
You can use the Bitbucket Pipeline system to run PR-Agent on every pull request opened or updated.
1. Add the following file to your repository as `bitbucket-pipelines.yml`:
```yaml
pipelines:
pipelines:
pull-requests:
'**':
- step:
name: PR Agent Pipeline
caches:
- pip
image: python:3.8
name: PR Agent Review
image: python:3.10
services:
- docker
script:
- git clone https://github.com/Codium-ai/pr-agent.git
- cd pr-agent
- docker build -t bitbucket_runner:latest -f Dockerfile.bitbucket_pipeline .
- docker run -e OPENAI_API_KEY=$OPENAI_API_KEY -e BITBUCKET_BEARER_TOKEN=$BITBUCKET_BEARER_TOKEN -e BITBUCKET_PR_ID=$BITBUCKET_PR_ID -e BITBUCKET_REPO_SLUG=$BITBUCKET_REPO_SLUG -e BITBUCKET_WORKSPACE=$BITBUCKET_WORKSPACE bitbucket_runner:latest
- docker run -e CONFIG.GIT_PROVIDER=bitbucket -e OPENAI.KEY=$OPENAI_API_KEY -e BITBUCKET.BEARER_TOKEN=$BITBUCKET_BEARER_TOKEN codiumai/pr-agent:latest --pr_url=https://bitbucket.org/$BITBUCKET_WORKSPACE/$BITBUCKET_REPO_SLUG/pull-requests/$BITBUCKET_PR_ID review
```
2. Add the following secret to your repository under Repository settings > Pipelines > Repository variables.
2. Add the following secure variables to your repository under Repository settings > Pipelines > Repository variables.
OPENAI_API_KEY: <your key>
BITBUCKET_BEARER_TOKEN: <your token>
3. To get the BITBUCKET_BEARER_TOKEN, follow these steps:
i) Insert your workspace name in place of {workspace_name} and go to the following link to create an OAuth consumer:
https://bitbucket.org/{workspace_name}/workspace/settings/api
Set the callback URL to http://localhost:8976 (a real server does not need to be listening there).
Select permissions: repository -> read.
ii) Use the consumer's Key as {client_id} and open the following URL in the browser:
https://bitbucket.org/site/oauth2/authorize?client_id={client_id}&response_type=code
iii) After you press "Grant access" in the browser, it will redirect you to
http://localhost:8976?code=<CODE>
iv) Use the code from the previous step as {code}, the consumer's Key as {client_id}, and the consumer's Secret as {client_secret}:
curl -X POST -u "{client_id}:{client_secret}" \
https://bitbucket.org/site/oauth2/access_token \
-d grant_type=authorization_code \
-d code={code}
After completing these steps, place the resulting access token in the repository variables.
Alternatively, you can create a Bitbucket access token for your repository under Repository Settings -> Security -> Access Tokens.
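Step iv) is a plain OAuth2 authorization-code exchange; as a rough sketch using only the standard library (the function names and placeholder values here are mine, not part of pr-agent), it could look like:

```python
# Sketch of the OAuth2 authorization-code exchange from step iv).
import base64
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://bitbucket.org/site/oauth2/access_token"

def build_token_request(client_id: str, client_secret: str, code: str) -> urllib.request.Request:
    """Build a POST request equivalent to the curl command above."""
    body = urllib.parse.urlencode(
        {"grant_type": "authorization_code", "code": code}).encode()
    # HTTP Basic auth with the consumer's Key and Secret
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={"Authorization": f"Basic {creds}"},
        method="POST",
    )

def exchange_code_for_token(client_id: str, client_secret: str, code: str) -> str:
    """Send the request; Bitbucket's JSON reply carries the bearer token."""
    with urllib.request.urlopen(build_token_request(client_id, client_secret, code)) as resp:
        return json.loads(resp.read())["access_token"]
```

The returned `access_token` is what goes into the BITBUCKET_BEARER_TOKEN repository variable.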

View File

@@ -113,13 +113,13 @@ See the [Release notes](./RELEASE_NOTES.md) for updates on the latest changes.
| | ⮑ Extended | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | | :white_check_mark: |
| | Reflect and Review | :white_check_mark: | :white_check_mark: | :white_check_mark: | | :white_check_mark: | :white_check_mark: |
| | Update CHANGELOG.md | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | | |
| | Find similar issue | :white_check_mark: | | | | | |
| | Find similar issue | :white_check_mark: | | :white_check_mark: | | | |
| | Add Documentation | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | | :white_check_mark: |
| | | | | | | |
| USAGE | CLI | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| | App / webhook | :white_check_mark: | :white_check_mark: | | | |
| | Tagging bot | :white_check_mark: | | | | |
| | Actions | :white_check_mark: | | | | |
| | Actions | :white_check_mark: | | :white_check_mark: | | |
| | Web server | | | | | | :white_check_mark: |
| | | | | | | |
| CORE | PR compression | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |

View File

@@ -1,2 +0,0 @@
#!/bin/bash
python /app/pr_agent/servers/bitbucket_pipeline_runner.py

View File

@@ -236,9 +236,6 @@ class AzureDevopsProvider:
def _parse_pr_url(pr_url: str) -> Tuple[str, int]:
parsed_url = urlparse(pr_url)
if 'azure.com' not in parsed_url.netloc:
raise ValueError("The provided URL is not a valid Azure DevOps URL")
path_parts = parsed_url.path.strip('/').split('/')
if len(path_parts) < 6 or path_parts[4] != 'pullrequest':

View File

@@ -10,12 +10,12 @@ from ..algo.pr_processing import find_line_number_of_relevant_line_in_file
from ..config_loader import get_settings
from ..log import get_logger
from .git_provider import FilePatchInfo, GitProvider
import ast
class BitbucketProvider(GitProvider):
def __init__(
self, pr_url: Optional[str] = None, incremental: Optional[bool] = False
):
self, pr_url: Optional[str] = None, incremental: Optional[bool] = False):
s = requests.Session()
try:
bearer = context.get("bitbucket_bearer_token", None)
@@ -32,12 +32,15 @@ class BitbucketProvider(GitProvider):
self.repo = None
self.pr_num = None
self.pr = None
self.feature = None
self.issue_num = None
self.issue_name = None
self.temp_comments = []
self.incremental = incremental
if pr_url:
if pr_url and 'pull' in pr_url:
self.set_pr(pr_url)
self.bitbucket_comment_api_url = self.pr._BitbucketBase__data["links"]["comments"]["href"]
self.bitbucket_pull_request_api_url = self.pr._BitbucketBase__data["links"]['self']['href']
self.bitbucket_comment_api_url = self.pr._BitbucketBase__data["links"]["comments"]["href"]
self.bitbucket_pull_request_api_url = self.pr._BitbucketBase__data["links"]['self']['href']
def get_repo_settings(self):
try:
@@ -228,6 +231,27 @@ class BitbucketProvider(GitProvider):
raise ValueError("Unable to convert PR number to integer") from e
return workspace_slug, repo_slug, pr_number
@staticmethod
def _parse_issue_url(issue_url: str) -> Tuple[str, int]:
parsed_url = urlparse(issue_url)
if "bitbucket.org" not in parsed_url.netloc:
raise ValueError("The provided URL is not a valid Bitbucket URL")
path_parts = parsed_url.path.strip('/').split('/')
if len(path_parts) < 5 or path_parts[2] != "issues":
raise ValueError("The provided URL does not appear to be a Bitbucket issue URL")
workspace_slug = path_parts[0]
repo_slug = path_parts[1]
try:
issue_number = int(path_parts[3])
except ValueError as e:
raise ValueError("Unable to convert issue number to integer") from e
return workspace_slug, repo_slug, issue_number
def _get_repo(self):
if self.repo is None:
@@ -263,3 +287,81 @@ class BitbucketProvider(GitProvider):
# bitbucket does not support labels
def get_labels(self):
pass
def get_issue(self, workspace_slug, repo_name, original_issue_number):
issue = self.bitbucket_client.repositories.get(workspace_slug, repo_name).issues.get(original_issue_number)
return issue
def get_issue_url(self, issue):
return issue._BitbucketBase__data['links']['html']['href']
def get_issue_body(self, issue):
return issue.content['raw']
def get_issue_number(self, issue):
return issue.id
def get_issue_comment_body(self, comment):
return comment['content']['raw']
def get_issue_comment_user(self, comment):
return comment['user']['display_name']
def get_issue_created_at(self, issue):
return str(issue.created_on)
def get_username(self, issue, workspace_slug):
return workspace_slug
def get_repo_issues(self, repo_obj):
return repo_obj._Repository__issues.each()
def get_issues_comments(self, workspace_slug, repo_name, original_issue_number):
import requests
url = f"https://api.bitbucket.org/2.0/repositories/{workspace_slug}/{repo_name}/issues/{original_issue_number}/comments"
payload = {}
headers = {}
response = requests.request("GET", url, headers=headers, data=payload)
return response.json()['values']
def create_issue_comment(self, similar_issues_str, workspace_slug, repo_name, original_issue_number):
url = f"https://api.bitbucket.org/2.0/repositories/{workspace_slug}/{repo_name}/issues/{original_issue_number}/comments"
payload = json.dumps({
"content": {
"raw": similar_issues_str
}
})
headers = {
'Authorization': f'Bearer {get_settings().get("BITBUCKET.BEARER_TOKEN", None)}',
'Content-Type': 'application/json'
}
response = requests.request("POST", url, headers=headers, data=payload)
def get_repo_obj(self, workspace_slug, repo_name):
return self.bitbucket_client.repositories.get(workspace_slug, repo_name)
def get_repo_name_for_indexing(self, repo_obj):
return repo_obj._BitbucketBase__data['full_name'].lower().replace('/', '-').replace('_/', '-')
def check_if_issue_pull_request(self, issue):
return False
def get_issue_numbers(self, issue):
list_of_issue_numbers = []
for issue in issue:
list_of_issue_numbers.append(issue.id)
return str(list_of_issue_numbers)
def get_issue_numbers_from_list(self, issues):
# convert str to list
int_list = ast.literal_eval(issues)
int_list = [int(x) for x in int_list]
for issue_number in int_list:
return issue_number
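For reference, the parsing logic added in `_parse_issue_url` above can be exercised standalone; this is a minimal sketch of the same checks (the free-function name is mine):

```python
# Standalone mirror of BitbucketProvider._parse_issue_url from the diff above.
from urllib.parse import urlparse

def parse_bitbucket_issue_url(issue_url: str):
    parsed = urlparse(issue_url)
    if "bitbucket.org" not in parsed.netloc:
        raise ValueError("The provided URL is not a valid Bitbucket URL")
    parts = parsed.path.strip("/").split("/")
    # expected path shape: <workspace>/<repo>/issues/<number>/<slug>
    if len(parts) < 5 or parts[2] != "issues":
        raise ValueError("The provided URL does not appear to be a Bitbucket issue URL")
    workspace_slug, repo_slug = parts[0], parts[1]
    try:
        issue_number = int(parts[3])
    except ValueError as e:
        raise ValueError("Unable to convert issue number to integer") from e
    return workspace_slug, repo_slug, issue_number
```

A URL such as `https://bitbucket.org/acme/widgets/issues/42/crash-on-startup` (a made-up example) would yield `("acme", "widgets", 42)`.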

View File

@@ -77,7 +77,7 @@ class GithubProvider(GitProvider):
self.previous_review = None
self.comments = list(self.pr.get_issue_comments())
for index in range(len(self.comments) - 1, -1, -1):
if self.comments[index].body.startswith("## PR Analysis"):
if self.comments[index].body.startswith("## PR Analysis") or self.comments[index].body.startswith("## Incremental PR Review"):
self.previous_review = self.comments[index]
break
@@ -241,7 +241,7 @@ class GithubProvider(GitProvider):
self.github_user_id = self.github_client.get_user().raw_data['login']
except Exception as e:
self.github_user_id = ""
# logging.exception(f"Failed to get user id, error: {e}")
# get_logger().exception(f"Failed to get user id, error: {e}")
return self.github_user_id
def get_notifications(self, since: datetime):
@@ -335,8 +335,9 @@ class GithubProvider(GitProvider):
issue_number = int(path_parts[3])
except ValueError as e:
raise ValueError("Unable to convert issue number to integer") from e
workspace_slug = None
return repo_name, issue_number
return workspace_slug, repo_name, issue_number
def _get_github_client(self):
deployment_type = get_settings().get("GITHUB.DEPLOYMENT_TYPE", "user")
@@ -453,3 +454,62 @@ class GithubProvider(GitProvider):
return pr_id
except:
return ""
def get_repo_issues(self, repo_obj):
return list(repo_obj.get_issues(state='all'))
def get_issues_comments(self, workspace_slug, repo_name, original_issue_number):
return self.repo_obj.get_issue(original_issue_number)
def get_issue_url(self, issue):
return issue.html_url
def create_issue_comment(self, similar_issues_str, workspace_slug, repo_name, original_issue_number):
try:
issue = self.repo_obj.get_issue(original_issue_number)
issue.create_comment(similar_issues_str)
except Exception as e:
get_logger().exception(f"Failed to create issue comment, error: {e}")
def get_issue_body(self, issue):
return issue.body
def get_issue_number(self, issue):
return issue.number
def get_issues_comments(self, workspace_slug, repo_name, original_issue_number):
issue = self.repo_obj.get_issue(original_issue_number)
return list(issue.get_comments())
def get_issue_body(self, issue):
return issue.body
def get_username(self, issue, workspace_slug):
return issue.user.login
def get_issue_created_at(self, issue):
return str(issue.created_at)
def get_issue_comment_body(self, comment):
return comment.body
def get_issue(self, workspace_slug, repo_name, original_issue_number):
return self.repo_obj.get_issue(int(original_issue_number))
def get_repo_obj(self, workspace_slug, repo_name):
return self.github_client.get_repo(repo_name)
def get_repo_name_for_indexing(self, repo_obj):
return repo_obj.full_name.lower().replace('/', '-').replace('_/', '-')
def check_if_issue_pull_request(self, issue):
if issue.pull_request:
return True
return False
def get_issue_numbers(self, issues_list):
return str([issue.number for issue in issues_list])
def get_issue_numbers_from_list(self, r):
return int(r.split('.')[0].split('_')[-1])
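Elsewhere in this diff the Pinecone record ids are built as `issue_key = f"issue_{number}"` plus a suffix (e.g. `"." + "issue"`), so `get_issue_numbers_from_list` recovers the issue number with two splits. A standalone sketch of that parsing (comment-record suffixes are my assumption):

```python
def issue_number_from_record_id(record_id: str) -> int:
    # ids look like "issue_123.issue" (and, presumably, "issue_123.comment_4"):
    # take the part before the first '.', then the digits after the last '_'.
    return int(record_id.split('.')[0].split('_')[-1])
```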

View File

@@ -1,34 +0,0 @@
import os
from pr_agent.agent.pr_agent import PRAgent
from pr_agent.config_loader import get_settings
from pr_agent.tools.pr_reviewer import PRReviewer
import asyncio
async def run_action():
try:
pull_request_id = os.environ.get("BITBUCKET_PR_ID", '')
slug = os.environ.get("BITBUCKET_REPO_SLUG", '')
workspace = os.environ.get("BITBUCKET_WORKSPACE", '')
bearer_token = os.environ.get('BITBUCKET_BEARER_TOKEN', None)
OPENAI_KEY = os.environ.get('OPENAI_API_KEY') or os.environ.get('OPENAI.KEY')
OPENAI_ORG = os.environ.get('OPENAI_ORG') or os.environ.get('OPENAI.ORG')
# Check if required environment variables are set
if not bearer_token:
print("BITBUCKET_BEARER_TOKEN not set")
return
if not OPENAI_KEY:
print("OPENAI_KEY not set")
return
# Set the environment variables in the settings
get_settings().set("BITBUCKET.BEARER_TOKEN", bearer_token)
get_settings().set("OPENAI.KEY", OPENAI_KEY)
if OPENAI_ORG:
get_settings().set("OPENAI.ORG", OPENAI_ORG)
if pull_request_id and slug and workspace:
pr_url = f"https://bitbucket.org/{workspace}/{slug}/pull-requests/{pull_request_id}"
await PRReviewer(pr_url).run()
except Exception as e:
print(f"An error occurred: {e}")
if __name__ == "__main__":
asyncio.run(run_action())

View File

@@ -1,3 +1,4 @@
import time
from enum import Enum
from typing import List
@@ -18,19 +19,17 @@ MODEL = "text-embedding-ada-002"
class PRSimilarIssue:
def __init__(self, issue_url: str, args: list = None):
if get_settings().config.git_provider != "github":
raise Exception("Only github is supported for similar issue tool")
self.cli_mode = get_settings().CONFIG.CLI_MODE
self.max_issues_to_scan = get_settings().pr_similar_issue.max_issues_to_scan
self.issue_url = issue_url
self.git_provider = get_git_provider()()
repo_name, issue_number = self.git_provider._parse_issue_url(issue_url.split('=')[-1])
self.git_provider.repo = repo_name
self.git_provider.repo_obj = self.git_provider.github_client.get_repo(repo_name)
self.workspace_slug, self.repo_name, self.issue_number = self.git_provider._parse_issue_url(issue_url.split('=')[-1])
self.git_provider.repo = self.repo_name
self.git_provider.repo_obj = self.git_provider.get_repo_obj(self.workspace_slug, self.repo_name)
self.token_handler = TokenHandler()
repo_obj = self.git_provider.repo_obj
repo_name_for_index = self.repo_name_for_index = repo_obj.full_name.lower().replace('/', '-').replace('_/', '-')
repo_name_for_index = self.repo_name_for_index = self.git_provider.get_repo_name_for_indexing(repo_obj)
index_name = self.index_name = "codium-ai-pr-agent-issues"
# assuming pinecone api key and environment are set in secrets file
@@ -39,13 +38,20 @@ class PRSimilarIssue:
environment = get_settings().pinecone.environment
except Exception:
if not self.cli_mode:
repo_name, original_issue_number = self.git_provider._parse_issue_url(self.issue_url.split('=')[-1])
issue_main = self.git_provider.repo_obj.get_issue(original_issue_number)
workspace_slug, repo_name, original_issue_number = self.git_provider._parse_issue_url(self.issue_url.split('=')[-1])
issue_main = self.git_provider.get_issue(workspace_slug, repo_name, original_issue_number)
issue_main.create_comment("Please set pinecone api key and environment in secrets file")
raise Exception("Please set pinecone api key and environment in secrets file")
# check if index exists, and if repo is already indexed
run_from_scratch = False
if run_from_scratch: # for debugging
pinecone.init(api_key=api_key, environment=environment)
if index_name in pinecone.list_indexes():
get_logger().info('Removing index...')
pinecone.delete_index(index_name)
get_logger().info('Done')
upsert = True
pinecone.init(api_key=api_key, environment=environment)
if not index_name in pinecone.list_indexes():
@@ -64,19 +70,20 @@ class PRSimilarIssue:
get_logger().info('Indexing the entire repo...')
get_logger().info('Getting issues...')
issues = list(repo_obj.get_issues(state='all'))
issues = self.git_provider.get_repo_issues(repo_obj)
get_logger().info('Done')
self._update_index_with_issues(issues, repo_name_for_index, upsert=upsert)
else: # update index if needed
pinecone_index = pinecone.Index(index_name=index_name)
issues_to_update = []
issues_paginated_list = repo_obj.get_issues(state='all')
issues_paginated_list = self.git_provider.get_repo_issues(repo_obj)
counter = 1
for issue in issues_paginated_list:
if issue.pull_request:
issue_pull_request = self.git_provider.check_if_issue_pull_request(issue)
if issue_pull_request:
continue
issue_str, comments, number = self._process_issue(issue)
issue_key = f"issue_{number}"
issue_key = f"issue_{number}"
id = issue_key + "." + "issue"
res = pinecone_index.fetch([id]).to_dict()
is_new_issue = True
@@ -98,8 +105,8 @@ class PRSimilarIssue:
async def run(self):
get_logger().info('Getting issue...')
repo_name, original_issue_number = self.git_provider._parse_issue_url(self.issue_url.split('=')[-1])
issue_main = self.git_provider.repo_obj.get_issue(original_issue_number)
workspace_slug, repo_name, original_issue_number = self.git_provider._parse_issue_url(self.issue_url.split('=')[-1])
issue_main = self.git_provider.get_issue(workspace_slug, repo_name, original_issue_number)
issue_str, comments, number = self._process_issue(issue_main)
openai.api_key = get_settings().openai.key
get_logger().info('Done')
@@ -116,7 +123,16 @@ class PRSimilarIssue:
relevant_comment_number_list = []
score_list = []
for r in res['matches']:
issue_number = int(r["id"].split('.')[0].split('_')[-1])
# skip example issue
if 'example_issue_' in r["id"]:
continue
try:
issue_id = r['id']
issue_number = self.git_provider.get_issue_numbers_from_list(issue_id)
except:
get_logger().debug(f"Failed to parse issue number from {r['id']}")
continue
if original_issue_number == issue_number:
continue
if issue_number not in relevant_issues_number_list:
@@ -131,33 +147,32 @@ class PRSimilarIssue:
get_logger().info('Publishing response...')
similar_issues_str = "### Similar Issues\n___\n\n"
for i, issue_number_similar in enumerate(relevant_issues_number_list):
issue = self.git_provider.repo_obj.get_issue(issue_number_similar)
issue = self.git_provider.get_issue(workspace_slug, repo_name, issue_number_similar)
title = issue.title
url = issue.html_url
if relevant_comment_number_list[i] != -1:
url = list(issue.get_comments())[relevant_comment_number_list[i]].html_url
url = self.git_provider.get_issue_url(issue)
similar_issues_str += f"{i + 1}. **[{title}]({url})** (score={score_list[i]})\n\n"
if get_settings().config.publish_output:
response = issue_main.create_comment(similar_issues_str)
response = self.git_provider.create_issue_comment(similar_issues_str, workspace_slug, repo_name, original_issue_number)
get_logger().info(similar_issues_str)
get_logger().info('Done')
def _process_issue(self, issue):
header = issue.title
body = issue.body
number = issue.number
body = self.git_provider.get_issue_body(issue)
number = self.git_provider.get_issue_number(issue)
if get_settings().pr_similar_issue.skip_comments:
comments = []
else:
comments = list(issue.get_comments())
comments = self.git_provider.get_issues_comments(self.workspace_slug, self.repo_name, self.issue_number)
issue_str = f"Issue Header: \"{header}\"\n\nIssue Body:\n{body}"
return issue_str, comments, number
def _update_index_with_issues(self, issues_list, repo_name_for_index, upsert=False):
get_logger().info('Processing issues...')
corpus = Corpus()
issues = self.git_provider.get_issue_numbers(issues_list)
example_issue_record = Record(
id=f"example_issue_{repo_name_for_index}",
id=str(issues),
text="example_issue",
metadata=Metadata(repo=repo_name_for_index)
)
@@ -165,7 +180,9 @@ class PRSimilarIssue:
counter = 0
for issue in issues_list:
if issue.pull_request:
issue_pull_request = self.git_provider.check_if_issue_pull_request(issue)
if issue_pull_request:
continue
counter += 1
@@ -177,8 +194,8 @@ class PRSimilarIssue:
issue_str, comments, number = self._process_issue(issue)
issue_key = f"issue_{number}"
username = issue.user.login
created_at = str(issue.created_at)
username = self.git_provider.get_username(issue, self.workspace_slug)
created_at = self.git_provider.get_issue_created_at(issue)
if len(issue_str) < 8000 or \
self.token_handler.count_tokens(issue_str) < MAX_TOKENS[MODEL]: # fast reject first
issue_record = Record(
@@ -192,7 +209,7 @@ class PRSimilarIssue:
corpus.append(issue_record)
if comments:
for j, comment in enumerate(comments):
comment_body = comment.body
comment_body = self.git_provider.get_issue_comment_body(comment)
num_words_comment = len(comment_body.split())
if num_words_comment < 10 or not isinstance(comment_body, str):
continue
@@ -237,6 +254,7 @@ class PRSimilarIssue:
if not upsert:
get_logger().info('Creating index from scratch...')
ds.to_pinecone_index(self.index_name, api_key=api_key, environment=environment)
time.sleep(15) # wait for pinecone to finalize indexing before querying
else:
get_logger().info('Upserting index...')
namespace = ""
@@ -244,6 +262,7 @@ class PRSimilarIssue:
concurrency: int = 10
pinecone.init(api_key=api_key, environment=environment)
ds._upsert_to_index(self.index_name, namespace, batch_size, concurrency)
time.sleep(5) # wait for pinecone to finalize upserting before querying
get_logger().info('Done')
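The match-filtering loop added in `run()` (skip the seeded example record, skip ids that fail to parse, skip the query issue itself) can be sketched in isolation. This is a simplified version: the score and comment-index bookkeeping is omitted, and `parse_id` stands in for the provider-specific `get_issue_numbers_from_list`:

```python
def filter_matches(matches, original_issue_number, parse_id):
    """Collect unique similar-issue numbers from Pinecone query matches."""
    relevant = []
    for m in matches:
        if "example_issue_" in m["id"]:
            continue  # seeded placeholder record, not a real issue
        try:
            number = parse_id(m["id"])
        except (ValueError, IndexError):
            continue  # unparsable record id
        if number == original_issue_number or number in relevant:
            continue
        relevant.append(number)
    return relevant
```

With ids of the form `issue_<n>.<suffix>`, duplicate hits on the same issue (e.g. its body and one of its comments) collapse to a single entry.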