Commit 09433d2 by adamo1139 (1 parent: a8719e6)

Update README.md

Files changed (1):
  1. README.md +7 -2
README.md CHANGED
@@ -18,6 +18,13 @@ To get this model, first, I fine-tuned Yi-34B-200K (xlctx, as in second version
 
 Once I had a good base model, I fine-tuned it on the [HESOYAM 0.2](https://huggingface.co/datasets/adamo1139/HESOYAM_v0.2) dataset. It's a collection of single-turn conversations from around 10 subreddits and multi-turn conversations from the /x/ board. There's also pippa in there. All samples have system prompts that tell the model where the discussion is taking place; this will be useful when you decide where you want your sandbox discussion to happen. Here, I used classic SFT with GaLore and Unsloth; I wanted some results quickly, so it's trained for just 0.4 epochs. The adapter from that part of the fine-tuning can be found [here](https://huggingface.co/adamo1139/Yi-34B-200K-XLCTX-HESOYAM-RAW-0905-GaLore-PEFT).
 
+## Known Issues
+
+Make sure you are inserting the BOS token when generating! I am not sure what mistake I made in my training code, but without the BOS token the model is completely wild and stupid.
+
+It's really depressed.
+
 
 ## Prompt template
@@ -42,9 +49,7 @@ I haven't done them yet. I will maybe upload one EXL2 quant.
 Use is limited by the Yi license. \
 Some of the datasets that were used prohibit commercial use (no_robots is CC-BY-NC-4.0), so I think you should use this model non-commercially only, unless you know the law better and think it doesn't matter.
 
-## Known Issues
 
-It's really depressed.
 
 ## Credits
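
The BOS warning in the new Known Issues section is easy to check in code. Below is a minimal, hedged sketch (plain Python, no model download required) of the idea: with a `transformers` tokenizer you would compare the first id of the encoded prompt against `tokenizer.bos_token_id` and prepend it if missing. The helper name `ensure_bos` and the example token ids are illustrative, not from the repo.

```python
def ensure_bos(token_ids, bos_id):
    """Prepend the BOS token id if the sequence does not already start with it.

    Mirrors the README's advice: generation quality collapses when the
    BOS token is missing from the start of the prompt.
    """
    ids = list(token_ids)
    if not ids or ids[0] != bos_id:
        return [bos_id] + ids
    return ids


# Illustrative ids only; with a real tokenizer, bos_id would come from
# tokenizer.bos_token_id and token_ids from the encoded prompt.
print(ensure_bos([5, 6, 7], bos_id=1))  # BOS gets prepended
print(ensure_bos([1, 5, 6], bos_id=1))  # already present, left unchanged
```

Note that many `transformers` tokenizers insert BOS automatically when encoding with `add_special_tokens=True` (the default), so in practice it is usually enough to inspect the first id of `tokenizer(prompt).input_ids` and confirm it equals `tokenizer.bos_token_id` before generating.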