
Giorgio Nicoli

qJakc

AI & ML interests

None yet

Recent Activity

upvoted a collection 29 days ago
GLiClass
reacted to dvilasuero's post with 🤗 about 1 year ago
👋 Hi there! This is my very first post. I'll use it to share some old news: a math preference dataset for DPO!

I created this dataset some time ago while we were developing distilabel (https://github.com/argilla-io/distilabel). Some days ago we found out people are actually using it! So I'll use this post to explain how I built it, in case it's useful for the community.

1. I used distilabel's SelfInstruct-inspired task to generate instructions about different math topics. I curated the instructions with Argilla (on Spaces!).
2. Then I used a distilabel Pipeline to build a preference dataset, using GPT-3.5 as generator and GPT-4 as labeller. If I recall correctly, I used our JudgeLM implementation (see https://distilabel.argilla.io/latest/technical-reference/tasks/#judgelmtask) (see the screenshot with the dataset in the Argilla UI).
3. Then I just binarized into chosen, rejected pairs and voilà: https://huggingface.co/datasets/argilla/distilabel-math-preference-dpo

The funny thing is that I used this to do a second DPO run over Notus-7B. I hoped to see an improvement in math/reasoning skills, but it actually improved in STEM and Humanities and did worse on Math 🤣.

In conclusion, this dataset was only a quick experiment. I'm happy to see the community found it useful. Data for DPO and fine-tuning are still a mystery; let's unveil these mysteries in 2024 together!

Follow me for the most exciting datasets for LLMs (and maybe some great, small, efficient models). I plan to announce all Argilla open-source work here!
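The binarization in step 3 can be illustrated with a minimal sketch. This is not distilabel's implementation: the record layout (`instruction`, `responses`, `ratings`) and the `binarize` helper are hypothetical names chosen for illustration, and the rule of pairing the highest-rated response against the lowest-rated one is an assumption about what "binarized" means here.

```python
from typing import Optional


def binarize(record: dict) -> Optional[dict]:
    """Turn one rated record into a (chosen, rejected) pair.

    Hypothetical record layout: an instruction, a list of candidate
    responses, and one numeric rating per response.
    Returns None when the ratings cannot separate the responses.
    """
    responses = record["responses"]
    ratings = record["ratings"]
    if len(responses) != len(ratings) or len(responses) < 2:
        return None

    # Indices of the highest- and lowest-rated responses.
    best = max(range(len(ratings)), key=ratings.__getitem__)
    worst = min(range(len(ratings)), key=ratings.__getitem__)
    if ratings[best] == ratings[worst]:
        return None  # A tie carries no preference signal, so skip it.

    return {
        "prompt": record["instruction"],
        "chosen": responses[best],
        "rejected": responses[worst],
    }


# Toy data, invented for this example only.
records = [
    {"instruction": "What is 7 * 8?", "responses": ["54", "56"], "ratings": [2.0, 9.0]},
    {"instruction": "Define a prime number.", "responses": ["A", "B"], "ratings": [5.0, 5.0]},
]
pairs = [p for p in (binarize(r) for r in records) if p is not None]
print(pairs)
```

Dropping tied records is one possible design choice; keeping them with a random pick would add noise to the preference signal.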

Organizations

None yet