Update README.md
To prove the point I'm planning to create a few more finetunes like this, starting […]
5 epochs, LR=1e-05, batch=2, gradient accumulation 32 (i.e. simulating an effective batch size of 64), max_len=1024. Rank and alpha both 128, targeting all modules. Trained in bfloat16. Constant schedule, no warm-up.
Flash-Attention 2 was turned off due to an issue with batching.
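For reference, a minimal sketch of how these hyperparameters could be expressed with Hugging Face `transformers` (the output dir is a placeholder, not the actual training script, and `max_len=1024` would be enforced at tokenization time rather than here):

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters listed above, assuming a standard HF Trainer
# setup; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="lora-finetune",        # placeholder
    num_train_epochs=5,
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=32,    # 2 x 32 = effective batch of 64
    bf16=True,                         # train in bfloat16
    lr_scheduler_type="constant",      # constant schedule
    warmup_steps=0,                    # no warm-up
)
```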
Expected result:
A new system prompt that encourages a docstring under each function, splits the solution into multiple functions even when it isn't strictly necessary, and comments on every line of the code; it should also greatly reduce the explanations before and after the code block.
As a result, the model's output should be more readable for junior Python developers, and it will do step-by-step reasoning by default, improving both the code and the HumanEval results.
Evals:
HumanEval score (a 2.4 p.p. improvement over the best Phind v2 score!) for the new prompt:
**{'pass@1': 0.7621951219512195}**
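For context, pass@1 numbers like this are typically produced with OpenAI's `human-eval` harness; a minimal sketch of that flow (where `generate_completion` is a hypothetical stand-in for the actual model call):

```python
from human_eval.data import read_problems, write_jsonl

def generate_completion(prompt: str) -> str:
    """Hypothetical stand-in for the fine-tuned model's code completion."""
    raise NotImplementedError

# Build one completion per HumanEval task and dump them for scoring.
problems = read_problems()
samples = [
    {"task_id": task_id, "completion": generate_completion(task["prompt"])}
    for task_id, task in problems.items()
]
write_jsonl("samples.jsonl", samples)
# Then score with: evaluate_functional_correctness samples.jsonl
# which prints a dict like {'pass@1': ...}
```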
After several HumanEval tests and prompts, the most Phind v2 was able to score was 73…

In the long term, I'm planning to experiment with LIMA + DPO fine-tuning, but so far I've noticed that LIMA datasets need to be both general and task-specific. The best result I got was with around 30% task-specific samples.
https://huggingface.co/datasets/KrisPi/PythonTutor-Evol-1k-DPO-GPT4_vs_35
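A quick way to peek at that dataset with the standard `datasets` API (the `train` split name is an assumption):

```python
from datasets import load_dataset

# Load the DPO preference pairs linked above; the "train" split is assumed.
dpo_ds = load_dataset("KrisPi/PythonTutor-Evol-1k-DPO-GPT4_vs_35", split="train")
print(dpo_ds[0])  # inspect one preference record (field names depend on the dataset)
```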
The new prompt template:
```
### System Prompt\nYou are an intelligent assistant.\n\n### User Message\nTake a deep breath and think step by step, make sure to verify your solution will pass example test cases. Write in the most simple manner using mutiple functions, simple loops and if statements, do not compress code, the code will be read by other developer.\n{PROMPT}\n\n### Assistant\n
```
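To illustrate how the template is meant to be used, here is a sketch that fills the `{PROMPT}` slot (the template string is copied verbatim from above; the task is just an example):

```python
# Template copied verbatim from above; {PROMPT} is the only slot to fill.
TEMPLATE = (
    "### System Prompt\nYou are an intelligent assistant.\n\n"
    "### User Message\nTake a deep breath and think step by step, make sure "
    "to verify your solution will pass example test cases. Write in the most "
    "simple manner using mutiple functions, simple loops and if statements, "
    "do not compress code, the code will be read by other developer.\n"
    "{PROMPT}\n\n### Assistant\n"
)

prompt = TEMPLATE.format(PROMPT="Write a function that counts vowels in a string.")
```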
LoRA parameters:

```python
r=128,
lora_alpha=128,
target_modules=['q_proj','k_proj','v_proj','o_proj','gate_proj','down_proj','up_proj'],
```
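Those arguments read like a fragment of a `peft` `LoraConfig`; a self-contained sketch of how they would plausibly be wired up (`base_model` is assumed to be an already-loaded causal LM):

```python
from peft import LoraConfig, get_peft_model

# Assumed reconstruction: the three arguments above as a peft LoraConfig.
lora_config = LoraConfig(
    r=128,
    lora_alpha=128,
    target_modules=['q_proj','k_proj','v_proj','o_proj','gate_proj','down_proj','up_proj'],
)

# Attach the LoRA adapters to the base model (loading it is omitted here).
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```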