nenad1002 committed on
Commit
6790fcd
1 Parent(s): a1223bc

Update README.md

Files changed (1)
  1. README.md +5 -0
README.md CHANGED
@@ -79,6 +79,11 @@ The dataset was generated by crawling the https://quantum-journal.org/ site, and
 
  Many training procedures were tried alongside multiple models.
 
+ Over time, multiple base models and fine-tuning approaches were tried. The best performance was achieved with Llama 3.1 70B Instruct and QLoRA, but that model took very long to train, and finding the best hyperparameters would have been too challenging.
+
+ The other base models that were tried were the Mistral 7B v0.1 base model, meta-llama/Llama-2-7b-chat-hf, and the base model of this model.
+
+ I've performed a grid search with several optimization techniques such as [LoRA](https://arxiv.org/abs/2106.09685), [DoRA](https://arxiv.org/abs/2402.09353), [LoRA+](https://arxiv.org/abs/2402.12354), [ReFT](https://arxiv.org/abs/2404.03592), and [QLoRA](https://arxiv.org/abs/2305.14314).
  After extensive grid search, supervised fine-tuning of Llama 3.1-8B with LoRA+ resulted in the best training and evaluation cross entropy.
 
  #### Preprocessing [optional]
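
To make the grid search described in this change concrete, here is a minimal sketch of how such runs could be wired up with `transformers` and `peft`: 4-bit NF4 quantization for QLoRA-style training, a `LoraConfig` whose `use_dora` flag switches between LoRA and DoRA, and a hand-rolled LoRA+ optimizer that trains the `lora_B` matrices at a higher learning rate than the `lora_A` matrices. The base model id, target modules, ranks, learning rates, and the LoRA+ ratio below are illustrative assumptions, not the exact values behind this checkpoint.

```python
# Hypothetical sketch of the grid search over PEFT methods; values are
# assumptions for illustration, not the hyperparameters actually used.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

BASE = "meta-llama/Llama-3.1-8B"  # assumed base model id

def build_model(rank: int, use_dora: bool):
    # QLoRA-style setup: freeze the base weights in 4-bit NF4 and train
    # only the low-rank adapter on top.
    bnb = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        BASE, quantization_config=bnb, device_map="auto"
    )
    model = prepare_model_for_kbit_training(model)
    cfg = LoraConfig(
        r=rank,
        lora_alpha=2 * rank,
        lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        use_dora=use_dora,  # flip on for a DoRA run (needs a recent peft)
        task_type="CAUSAL_LM",
    )
    return get_peft_model(model, cfg)

def loraplus_optimizer(model, lr: float, ratio: float = 16.0):
    # LoRA+: give the B matrices a larger learning rate than the A matrices
    # by splitting the trainable adapter weights into two parameter groups.
    a_params = [p for n, p in model.named_parameters()
                if "lora_A" in n and p.requires_grad]
    b_params = [p for n, p in model.named_parameters()
                if "lora_B" in n and p.requires_grad]
    return torch.optim.AdamW([
        {"params": a_params, "lr": lr},
        {"params": b_params, "lr": lr * ratio},
    ])

# Illustrative grid; each configuration would be fine-tuned with the same
# supervised objective and scored on held-out cross entropy.
for rank in (8, 16, 32):
    for lr in (1e-4, 2e-4):
        for use_dora in (False, True):
            model = build_model(rank, use_dora)
            optimizer = loraplus_optimizer(model, lr)
            # ... run supervised fine-tuning here (e.g. with transformers.Trainer)
            # and record the evaluation cross entropy for this configuration.
```

Comparing every configuration on the same held-out cross entropy is what would surface the result stated above, where LoRA+ on Llama 3.1-8B came out ahead of the other runs.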