What dataset is used?

#1, opened by PavoM

Hello,
Can you share which dataset you used for fine-tuning?
Is it open source?

Hello,

Because this is continued pretraining, there is no fine-tuning dataset.

The model card gives a brief description of the dataset creation process. The data is a mix of synthetic data and web data in the HBS (Croatian, Bosnian, Serbian) languages, with some preprocessing that is also described in the model card.
One of the datasets is: draganjovanovich/airoboros-3.0-serbian

An instruct version of this model is coming soon, and we will share the dataset used for it.
