nenad1002 committed
Commit a8808ab
1 Parent(s): e84ae8b

Update README.md

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -72,9 +72,9 @@ The dataset was generated by crawling the https://quantum-journal.org/ site, and
 
 ### Training Procedure
 
-Various training procedures were explored alongside multiple models, however, all of them were parameter efficient.
+Various training procedures were explored alongside multiple models; however, all of them were parameter-efficient. The general idea was to freeze most of the original model's parameters and only allow a small subset of parameters to be trainable.
 
-Over time, several models and fine-tuning approaches were tested as the base model. The best accuracy was achieved with [Llama 3.1 70B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct) and qLoRA, but the training duration was extensive, and optimizing hyperparameters proved to be highly challenging.
+Over time, several base models and fine-tuning approaches were tested. The best accuracy was achieved with [Llama 3.1 70B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct) and qLoRA, but the training duration was extensive, and optimizing hyperparameters proved to be highly challenging.
 
 Other base models were also tested: [Mistral 7B v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1), [Meta-Llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf), and the base model of this experiment.
 
@@ -137,6 +137,7 @@ The table below shows the best evaluation cross-entropy (across all params) for
 | **qLORA (for 8b model)** | 0.5471 | |
 | **(LO)ReFT** | 0.4824 | |
 
+The loss mask was applied during training, but it wasn't particularly useful since the model doesn't involve function calling or external data fetching.
 
 #### Metrics
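The parameter-efficient approach described in the updated paragraph boils down to freezing the base model's weights and training only a small add-on. Below is a minimal PyTorch sketch of that idea, assuming a LoRA-style adapter; the `LoRALinear` wrapper and its rank/alpha values are illustrative, not the code behind this commit.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Hypothetical example: freeze a base linear layer and add a small
    trainable low-rank update on top of it."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the original weights
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # the update starts as a no-op
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

layer = LoRALinear(nn.Linear(4096, 4096))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable:,} / {total:,}")  # only the small LoRA matrices train
```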
 
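The qLoRA setup mentioned for Llama 3.1 70B Instruct combines a 4-bit quantized, frozen base model with LoRA adapters. Here is a sketch of how that is commonly wired up with the Hugging Face `transformers` and `peft` libraries; the rank, alpha, dropout, and target modules are assumptions for illustration, not the author's hyperparameters.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the frozen base model in 4-bit NF4 precision (the "q" in qLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-70B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach small trainable LoRA adapters; the values here are illustrative.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```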
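The loss mask mentioned in the added line is the standard trick of excluding part of each training sequence (usually the prompt) from the cross-entropy loss. A minimal sketch, assuming the common PyTorch convention of marking ignored positions with `-100`; `build_labels` is a hypothetical helper, not the author's code:

```python
import torch

IGNORE_INDEX = -100  # torch.nn.CrossEntropyLoss skips targets with this value

def build_labels(input_ids: torch.Tensor, prompt_len: int) -> torch.Tensor:
    """Hypothetical helper: copy the inputs as labels, then mask the prompt
    tokens so the loss is computed only on the response portion."""
    labels = input_ids.clone()
    labels[:prompt_len] = IGNORE_INDEX
    return labels

# Example: a 10-token sequence whose first 6 tokens are the prompt.
ids = torch.arange(10)
print(build_labels(ids, prompt_len=6))
# tensor([-100, -100, -100, -100, -100, -100,    6,    7,    8,    9])
```

Masking like this matters most when long tool-call results or retrieved context should not drive the loss, which is consistent with the note above that it made little difference for this model.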