bigcode/self-oss-instruct-sc2-exec-filter-50k
Example of using distilabel to generate synthetic triplets data for fine-tuning a Sentence Transformer model
Note The input dataset for synthetic data generation. We use its `instruction` column as the starting point.
Note This dataset was generated by our pipeline. Each value in the `instruction` column of the input dataset becomes the anchor, and the pipeline generates a positive and a negative sentence for it. The result is a triplet dataset (anchor, positive, negative) that we can use to train a Sentence Transformers model. You can find the code used here: https://github.com/davanstrien/awesome-synthetic-datasets
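The shape of the resulting triplets can be sketched in plain Python. This is a minimal illustration and not the pipeline from the linked repository: `build_triplets` and `fake_generate_pair` are hypothetical names, and `fake_generate_pair` is a stand-in for the LLM call that would produce the positive and negative sentences in practice.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def build_triplets(
    rows: Iterable[Dict[str, str]],
    generate_pair: Callable[[str], Tuple[str, str]],
) -> List[Dict[str, str]]:
    """Turn rows with an `instruction` column into anchor/positive/negative triplets."""
    triplets = []
    for row in rows:
        anchor = row["instruction"].strip()
        if not anchor:
            # Skip empty instructions; they cannot serve as an anchor.
            continue
        positive, negative = generate_pair(anchor)
        triplets.append(
            {"anchor": anchor, "positive": positive, "negative": negative}
        )
    return triplets


def fake_generate_pair(anchor: str) -> Tuple[str, str]:
    # Hypothetical placeholder: a real pipeline would ask a language model
    # for a semantically similar sentence and a dissimilar one.
    return (f"Paraphrase: {anchor}", f"Unrelated: {anchor[::-1]}")


rows = [
    {"instruction": "Write a function that reverses a string."},
    {"instruction": "   "},  # dropped because it is empty after stripping
]
triplets = build_triplets(rows, fake_generate_pair)
```

Each dictionary in `triplets` has `anchor`, `positive`, and `negative` keys, one per column of the triplet dataset described in the note.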
Note A Sentence Transformers model fine-tuned on the dataset above. Even minimal fine-tuning gives a noticeable improvement in performance.