Update README.md
README.md
CHANGED
@@ -91,6 +91,7 @@ After 3 to 4 epochs, the model began to overfit regardless of the strategies employed.

Following an extensive grid search, supervised fine-tuning of [Llama 3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) with LoRA+ and the parameters mentioned below yielded the best training and evaluation cross-entropy.

I chose a size ratio of 8 between the matrices A and B. The weights of matrix A were initialized with the He method, while matrix B was initialized to zero. Different Gaussian weight initializations were also considered, but they led to suboptimal results. Since a custom optimizer was built for this, I will share the optimizer code on my private GitHub account soon. (A sketch of this setup is included after the diff.)
+
#### Preprocessing [optional]

[Coming soon]
@@ -122,7 +123,19 @@ Please see the graph below:

<img src="https://i.ibb.co/SB4gyQf/crossentropy.png" alt="Alt text" style="width:50%;"/>

-The final evaluation cross-entropy ended around 0.4.
+The final evaluation cross-entropy ended around 0.4 for this model.
+
+| Method                   | Loss   |
+|:-------------------------|:-------|
+| **LoRA**                 | 0.4603 |
+| **LoRA+**                | 0.4011 |
+| **DoRA**                 | 0.4182 |
+| **QLoRA (70B model)**    | 0.3694 |
+| **QLoRA (8B model)**     | 0.5471 |
+| **(Lo)ReFT**             | 0.4824 |

#### Metrics

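For readers who want to see roughly what the setup in the diff above could look like in code, here is a minimal, illustrative sketch. It assumes the ratio of 8 refers to the LoRA+ learning-rate ratio between the B and A matrices (training B with a larger learning rate than A is the core LoRA+ idea); the rank, alpha, and learning-rate values are placeholders, and this is not the author's unreleased custom optimizer. Note that PEFT's default LoRA initialization already uses He/Kaiming init for A and zeros for B, matching the description above.

```python
# Illustrative sketch only. Assumptions not confirmed by the README:
# the "ratio of 8" is treated as the LoRA+ learning-rate ratio between
# the B and A matrices, and rank/alpha/learning-rate values are placeholders.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    torch_dtype=torch.bfloat16,
)

# PEFT's default init matches the README: A is He/Kaiming-initialized,
# B starts at zero, so the initial adapter update B @ A is zero.
lora_config = LoraConfig(
    r=16,                      # placeholder rank
    lora_alpha=32,             # placeholder scaling
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# LoRA+-style optimizer: matrix B gets a learning rate `ratio` times larger
# than matrix A (ratio = 8, as stated in the README).
base_lr, ratio = 2e-4, 8  # base_lr is a placeholder
a_params = [p for n, p in model.named_parameters() if p.requires_grad and "lora_A" in n]
b_params = [p for n, p in model.named_parameters() if p.requires_grad and "lora_B" in n]
optimizer = torch.optim.AdamW(
    [
        {"params": a_params, "lr": base_lr},
        {"params": b_params, "lr": base_lr * ratio},
    ],
    weight_decay=0.0,
)
```

Because B starts at zero, the adapted model initially produces the same outputs as the base model, and training moves away from it gradually.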