Joseph Pollack

Tonic

AI & ML interests

🤖Making robots to help people learn things quicker 👩🏻‍🚀🚀

Posts 33

Post
🙋🏻‍♂️ Hey there folks,

🦎 The Salamandra release by @mvillegas and team
@BSC_CNS https://huggingface.co/BSC-LT is absolutely impressive so far!

perhaps the largest single training dataset of high-quality text to date: 7.8 trillion tokens in 35 European languages and code.

the best part: the data was correctly licensed, so it's actually future-proof!

the completions model is really creative, and the instruct fine-tuned version is also very good.

now you can use such models for multilingual enterprise applications with further fine-tunes; long response generation and structured outputs (coding) also work.

check out 👇🏻
the collection: BSC-LT/salamandra-66fc171485944df79469043a
the repo: https://github.com/langtech-bsc/salamandra
7B-Instruct demo: Tonic/Salamandra-7B
Post
@mlabonne hey there 🙋🏻‍♂️ I kinda got obsessed with your great model and found an endpoint for it on Lambda Labs, but I basically got rate limited / banned while trying to build my DPO dataset project. I was wondering if you all had an OpenAI-compatible solution I could use to make a great "thinking" SFT + DPO dataset with all the splits 🙏🏻🙏🏻 kinda desperate, it's true, but I was looking forward to a nice write-up 🚀🚀🚀