Jorge De Corte PRO

JorgeDeC

AI & ML interests

None yet

JorgeDeC's activity

liked a model 5 days ago

mistralai/Mistral-Small-24B-Instruct-2501

Text Generation • Updated 2 days ago • 18.4k • 564

liked a model 2 months ago

utter-project/EuroLLM-9B-Instruct

Text Generation • Updated Dec 9, 2024 • 18.5k • 140

authored a paper 5 months ago

SoccerNet 2022 Challenges Results

Paper • 2210.02365 • Published Oct 5, 2022

liked a Space 7 months ago

12.4k

Open LLM Leaderboard

🏆

Track, rank and evaluate open LLMs and chatbots

updated a collection 8 months ago

Paper reading list

Collection

12 items • Updated Jun 19, 2024

upvoted a paper 8 months ago

From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries

Paper • 2406.12824 • Published Jun 18, 2024 • 21

liked 2 models 8 months ago

ReBatch/Reynaerde-7B-Instruct

Text Generation • Updated Jun 6, 2024 • 65 • 3

ReBatch/Reynaerde-7B-Chat

Updated Jun 7, 2024 • 6

updated a Space 9 months ago

Llama 3 8B dutch

🚀

New activity in huggingchat/chat-ui-template 9 months ago

Create INCLUDE_DB

#9 opened 9 months ago by

JorgeDeC

New activity in ReBatch/Llama-3-8B-dutch 9 months ago

Chat template during finetuning?

#2 opened 9 months ago by

wvangils

New activity in ReBatch/Llama-3-8B-dutch 10 months ago

License

#1 opened 10 months ago by

CorporateVero

updated a model 10 months ago

ReBatch/Llama-3-8B-dutch

Text Generation • Updated Apr 25, 2024 • 214 • 11

liked 3 models 10 months ago

replied to BramVanroy's post 10 months ago

A QLoRA and ORPO finetune on your ultrafeedback dataset.
It now defaults to Dutch more often, even when asked questions in English (sometimes :) )

https://huggingface.co/ReBatch/Llama-3-8B-dutch

I am surprised there is a (small) improvement on dutch_social and hellaswag with only 200k examples for one epoch. All other benchmarks saw a drop; I will have to investigate that.

replied to BramVanroy's post 10 months ago

Great, thank you very much!
We were in the process of translating the original ultrachat and ultrafeedback datasets to Dutch ourselves, using models that permit commercial use.

But now we don't have to. Looking forward to using this!

reacted to BramVanroy's post with 🔥 10 months ago

Post

2291

🥳 New license for datasets: Apache 2.0!

I have been struggling mentally for many months now with the OpenAI terms of use that indicate that their model outputs cannot be used to build "competing models". This leads to many questions:

- what is the definition of competing? Is it the same as "commercial"?
- since this is part of the terms of use between OpenAI and the API user, can a third party still use the generated dataset to build competing models?
- are such restrictions even legal in the first place?

Trying to "follow the rules" as much as possible despite wanting to be as open as possible, I kept releasing my datasets under non-commercial licenses (which are too restrictive anyhow - nothing should prevent you from using the data in non-LM commercial settings), just like models trained on these datasets. This has put me at a competitive disadvantage compared to creators who do not follow the same approach and release their data/models on apache 2.0 despite the OpenAI "restrictions". Moreover, I fear (https://twitter.com/BramVanroy/status/1780220420316164246) that my approach blocks adaptation of my data/models for (commercial) applications/integrations.

Thankfully @Rijgersberg noted that these OpenAI terms of use are NOT explicit in the Azure OpenAI API (https://twitter.com/E_Rijgersberg/status/1780308971762450725). Since my latest datasets were created via Azure, this comes as a relief. As far as I can tell after digging through Azure docs, this allows me to change all recent GPT4-generated datasets to apache 2.0! 🥳

- BramVanroy/ultrachat_200k_dutch
- BramVanroy/orca_dpo_pairs_dutch
- BramVanroy/ultra_feedback_dutch
- BramVanroy/ultra_feedback_dutch_cleaned
- BramVanroy/no_robots_dutch

I will have to mull over what I'll do for the older GPT3.5 datasets. What do you think that I should do?

9 replies

updated a model 10 months ago

JorgeDeC/mistral-nl-7b-sft-qlora

Updated Apr 1, 2024 • 4