Update README.md
README.md
CHANGED
@@ -114,13 +114,11 @@ This model is [Openchat-3.5-0106](https://huggingface.co/openchat/openchat-3.5-0

 The model was fine-tuned using [Rank-Stabilized LoRA](https://huggingface.co/blog/damjan-k/rslora) and the [LongAlpaca-12K](Yukang/LongAlpaca-12k) dataset. I hope to continue extending the context in future versions and then apply the same methods to my [upscaled versions of OpenChat-3.5](https://huggingface.co/collections/Pretergeek/openchat-35-0106-with-additional-layers-66a8d3262c7c3ebdd7783a29) that were created using Block Expansion instead of Depth Up-Scaling.

-After fine-tuning, the model was tested using passkey retrieval and achieved a score of 100%.
+After fine-tuning, the model was tested using passkey retrieval and achieved a score of 100%. The Open LLM Leaderboard evaluation results are listed below, and I am a bit disappointed with them. The model shows a significant reduction in performance compared to the original model in all but one test (MUSR). I expected it to do better than the original model on MUSR, since that test benefits from long-context understanding, but I didn't expect such a negative impact on the other tasks. I will address this in a future version, probably by using a pre-training dataset instead of a fine-tuning dataset so that upstream tasks are less affected. I used the LongAlpaca-12K dataset because it is small and I have limited computational resources, but I might have to try a larger dataset for the next attempt. If you would like to help me, there are links at the top of the model card for my Patreon and Ko-Fi.

 # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
 Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Pretergeek__OpenChat-3.5-0106_32K-PoSE)

-**These may or may not be correct; local benchmarks show performance close or identical to the original model. I will be redoing them locally following the same setup as the leaderboard. I would appreciate it if someone else interested would also test and give me feedback.**
-
 | Metric |Value|
 |-------------------|----:|
 |Avg. |12.70|