---
license: llama2
---
This is a QLoRA fine-tune of Phind v2 using my PythonTutor LIMA dataset:
https://huggingface.co/datasets/KrisPi/PythonTutor-LIMA-Finetune
This is my shy attempt to democratize cheap, task-specific fine-tuning built around LIMA-like datasets: everybody can afford to generate them (less than $20) and everybody can fine-tune on them (7 hours in total using 2x3090 GPUs, ~$3+$5 on vast.ai).
At the moment of publishing this adapter, there are already production-ready solutions for serving several LoRA adapters. I honestly believe that a reproducible, vast collection of adapters on top of current SOTA models will enable the open-source community to access GPT-4 level LLMs in the next 12 months.
My main inspirations for this were the blazing fast implementation of multi-LoRA in the ExLlamaV2 backend, Jon's LMoE and Airoboros dataset, r/LocalLLaMA opinions around models based on LIMA fine-tunes, and of course the LIMA paper itself.
To prove the point, I'm planning to create a few more fine-tunes like this, starting with the Airoboros "contextual" category for RAG solutions, followed by adapters for React and DevOps YAML scripting.
Training setup: 5 epochs, LR=1e-05, batch=2, gradient accumulation 32 (i.e. trying to simulate batch 64), max_len=1024. Rank and alpha both 128, targeting all modules. Trained in bfloat16 with a constant schedule and no warm-up.
Flash-Attention 2 was turned off due to an issue with batching.
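The "simulate batch 64" arithmetic can be sketched as below. This is only an illustration: the function name and the `data_parallel_replicas` parameter are my own, and it assumes the two GPUs were used to hold one copy of the model rather than to run two data-parallel replicas (with two replicas the effective batch would double).

```python
def effective_batch_size(per_device_batch: int,
                         grad_accum_steps: int,
                         data_parallel_replicas: int = 1) -> int:
    """Number of samples that contribute to a single optimizer step."""
    return per_device_batch * grad_accum_steps * data_parallel_replicas


# Values from the training setup above: batch=2, gradient accumulation 32.
# Assumption: a single model replica spread over both GPUs.
print(effective_batch_size(2, 32))
```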
Expected result:
A new system prompt that makes the model prefer a docstring under each function, use multiple functions even when it doesn't make sense, and comment on every line of the code. It should also greatly reduce explanations before and after the code block.
As a result, the model's output should be more readable for junior Python developers, and the model should do step-by-step reasoning by default, which improves code quality and HumanEval results.
Evals:
HumanEval score for the new prompt (2.4 p.p. improvement over the best Phind v2 score!):
**{'pass@1': 0.7621951219512195}**
**Base + Extra**
**{'pass@1': 0.7073170731707317}**
Base prompt (0.51 p.p. improvement):
{'pass@1': 0.725609756097561}
Base + Extra
{'pass@1': 0.6585365853658537}
Phind v2 itself, with the Python Tutor custom prompt, only gets:
{'pass@1': 0.7073170731707317}
Base + Extra
{'pass@1': 0.6463414634146342}
Across several HumanEval runs and prompts, the best score Phind v2 was able to reach was 73.78%.
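To make the pass@1 numbers easier to compare, here is a small sketch that converts them into solved-problem counts and percentage-point gains. It relies on HumanEval containing 164 problems; the helper names are hypothetical and not part of any eval harness.

```python
HUMANEVAL_PROBLEMS = 164  # size of the HumanEval benchmark


def solved_count(pass_at_1: float) -> int:
    """Convert a pass@1 fraction into the number of solved problems."""
    return round(pass_at_1 * HUMANEVAL_PROBLEMS)


def gain_pp(new: float, old: float) -> float:
    """Difference between two pass@1 fractions, in percentage points."""
    return (new - old) * 100


adapter_new_prompt = 0.7621951219512195  # adapter with the new prompt
best_phind_v2 = 0.7378                   # best Phind v2 score quoted above

print(solved_count(adapter_new_prompt), "vs", solved_count(best_phind_v2), "problems solved")
print(f"{gain_pp(adapter_new_prompt, best_phind_v2):.2f} p.p. gain")
```

Read this way, the headline gain is a difference of a handful of problems, so run-to-run variation is worth keeping in mind.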
**All evals were run using Transformers in 8-bit.**
In the long term, I'm planning to experiment with LIMA + DPO fine-tuning, but so far I have noticed that LIMA datasets need to be both general and task-specific. I got the best result when around 30% of the samples were task-specific.
https://huggingface.co/datasets/KrisPi/PythonTutor-Evol-1k-DPO-GPT4_vs_35
```
### System Prompt\nYou are an intelligent assistant.\n\n### User Message\nTake a deep breath and think step by step, make sure to verify your solution will pass example test cases. Write in the most simple manner using mutiple functions, simple loops and if statements, do not compress code, the code will be read by other developer.\n{PROMPT}\n\n### Assistant\n
```
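A minimal sketch of filling the template above with a task, assuming the `\n` sequences in it stand for real newlines and `{PROMPT}` is replaced by the task text. The `build_prompt` helper and the constant names are hypothetical; the instruction text is copied verbatim, spelling included, so that it matches what was evaluated.

```python
SYSTEM_PROMPT = "You are an intelligent assistant."

# Copied verbatim from the template above (including its original spelling).
INSTRUCTION = (
    "Take a deep breath and think step by step, make sure to verify your "
    "solution will pass example test cases. Write in the most simple manner "
    "using mutiple functions, simple loops and if statements, do not compress "
    "code, the code will be read by other developer."
)


def build_prompt(task: str) -> str:
    """Insert the task text where the template has {PROMPT}."""
    return (
        f"### System Prompt\n{SYSTEM_PROMPT}\n\n"
        f"### User Message\n{INSTRUCTION}\n{task}\n\n"
        "### Assistant\n"
    )


print(build_prompt("Write a function that reverses a string."))
```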
r=128,
lora_alpha=128,
target_modules=['q_proj','k_proj','v_proj','o_proj','gate_proj','down_proj','up_proj'],
lora_dropout=0.03,
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16,
bnb_4bit_use_double_quant=True,
)
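For convenience, the fragments above can be assembled into complete config objects roughly as follows. This is a sketch, assuming the first four values are arguments to `peft.LoraConfig`; `task_type="CAUSAL_LM"` is my addition and is not stated in the original settings.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# LoRA settings from the list above.
lora_config = LoraConfig(
    r=128,
    lora_alpha=128,
    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj',
                    'gate_proj', 'down_proj', 'up_proj'],
    lora_dropout=0.03,
    task_type="CAUSAL_LM",  # assumption: not part of the original settings
)

# 4-bit NF4 quantization with double quantization and bfloat16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
```

The `bnb_config` object would typically be passed as `quantization_config` when loading the base model, and `lora_config` to the PEFT wrapper before training.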