Mirror of https://github.com/qodo-ai/pr-agent.git, synced 2025-07-08 06:40:39 +08:00
Update index.md
@ -4,7 +4,7 @@ On coding tasks, the gap between open-source models and top closed-source models
 <br>
 In practice, open-source models are unsuitable for most real-world code tasks, and require further fine-tuning to produce acceptable results.
 
-_PR-Agent fine-tuning benchmark_ aims to benchmark open-source models on their ability to be fine-tuned for a code task.
+_PR-Agent fine-tuning benchmark_ aims to benchmark open-source models on their ability to be fine-tuned for a coding task.
 Specifically, we chose to fine-tune open-source models on the task of analyzing a pull request, and providing useful feedback and code suggestions.
 
 Here are the results:
@ -57,7 +57,7 @@ Our training dataset comprises 25,000 pull requests, aggregated from permissive
 
 On the raw data collected, we employed various automatic and manual cleaning techniques to ensure the outputs were of the highest quality, and suitable for instruct-tuning.
 
-Here are the prompts, and example outputs, used as input-output paris to fine-tune the models:
+Here are the prompts, and example outputs, used as input-output pairs to fine-tune the models:
 
 | Tool | Prompt | Example output |
 |----------|------------------------------------------------------------------------------------------------------------|----------------|
@ -75,7 +75,7 @@ Here are the prompts, and example outputs, used as input-output paris to fine-tu
 
 We experimented with three model as judges: `gpt-4-turbo-2024-04-09`, `gpt-4o`, and `claude-3-opus-20240229`. All three produced similar results, with the same ranking order. This strengthens the validity of our testing protocol.
 
-Here is an example for a judge model feedback:
+Here is an example of a judge model feedback:
 
 ```
 command: improve