diff --git a/docs/docs/finetuning_benchmark/index.md b/docs/docs/finetuning_benchmark/index.md
index 21217b33..b33deae3 100644
--- a/docs/docs/finetuning_benchmark/index.md
+++ b/docs/docs/finetuning_benchmark/index.md
@@ -11,116 +11,22 @@ Here are the results:

-<table>
-  <!-- Model Performance Table -->
-  <tr>
-    <th>Model name</th>
-    <th>Model size [B]</th>
-    <th>Better than gpt-4 rate, after fine-tuning [%]</th>
-  </tr>
-  <tr><td>DeepSeek 34B-instruct</td><td>34</td><td>40.7</td></tr>
-  <tr><td>DeepSeek 34B-base</td><td>34</td><td>38.2</td></tr>
-  <tr><td>Phind-34b</td><td>34</td><td>38</td></tr>
-  <tr><td>Granite-34B</td><td>34</td><td>37.6</td></tr>
-  <tr><td>Codestral-22B-v0.1</td><td>22</td><td>32.7</td></tr>
-  <tr><td>QWEN-1.5-32B</td><td>32</td><td>29</td></tr>
-  <tr><td>CodeQwen1.5-7B</td><td>7</td><td>35.4</td></tr>
-  <tr><td>Granite-8b-code-instruct</td><td>8</td><td>34.2</td></tr>
-  <tr><td>CodeLlama-7b-hf</td><td>7</td><td>31.8</td></tr>
-  <tr><td>Gemma-7B</td><td>7</td><td>27.2</td></tr>
-  <tr><td>DeepSeek coder-7b-instruct</td><td>7</td><td>26.8</td></tr>
-  <tr><td>Llama-3-8B-Instruct</td><td>8</td><td>26.8</td></tr>
-  <tr><td>Mistral-7B-v0.1</td><td>7</td><td>16.1</td></tr>
-</table>
+| Model name                  | Model size [B] | Better than gpt-4 rate, after fine-tuning [%] |
+|-----------------------------|----------------|-----------------------------------------------|
+| **DeepSeek 34B-instruct**   | **34**         | **40.7**                                      |
+| DeepSeek 34B-base           | 34             | 38.2                                          |
+| Phind-34b                   | 34             | 38                                            |
+| Granite-34B                 | 34             | 37.6                                          |
+| Codestral-22B-v0.1          | 22             | 32.7                                          |
+| QWEN-1.5-32B                | 32             | 29                                            |
+|                             |                |                                               |
+| **CodeQwen1.5-7B**          | **7**          | **35.4**                                      |
+| Granite-8b-code-instruct    | 8              | 34.2                                          |
+| CodeLlama-7b-hf             | 7              | 31.8                                          |
+| Gemma-7B                    | 7              | 27.2                                          |
+| DeepSeek coder-7b-instruct  | 7              | 26.8                                          |
+| Llama-3-8B-Instruct         | 8              | 26.8                                          |
+| Mistral-7B-v0.1             | 7              | 16.1                                          |
@@ -191,4 +97,4 @@ why: |
     are practical improvements. In contrast, Response 2 focuses more on
     general advice and less actionable suggestions, such as changing variable
     names and adding comments, which are less critical for immediate code improvement."
-```
\ No newline at end of file
+```