
No Name

Ainonake

AI & ML interests

None yet

Recent Activity

liked a model 1 day ago
RekaAI/reka-flash-3
reacted to tomaarsen's post with ā¤ļø 5 days ago

Organizations

None yet

Ainonake's activity

reacted to tomaarsen's post with ā¤ļø 5 days ago
An assembly of 18 European companies, labs, and universities has banded together to launch šŸ‡ŖšŸ‡ŗ EuroBERT! It's a state-of-the-art multilingual encoder for 15 European languages, designed to be finetuned for retrieval, classification, etc.

šŸ‡ŖšŸ‡ŗ 15 Languages: English, French, German, Spanish, Chinese, Italian, Russian, Polish, Portuguese, Japanese, Vietnamese, Dutch, Arabic, Turkish, Hindi
3ļøāƒ£ 3 model sizes: 210M, 610M, and 2.1B parameters - very very useful sizes in my opinion
āž”ļø Sequence length of 8192 tokens! Nice to see these higher sequence lengths for encoders becoming more common.
āš™ļø Architecture based on Llama, but with bi-directional (non-causal) attention to turn it into an encoder. Flash Attention 2 is supported.
šŸ”„ A new Pareto frontier (stronger *and* smaller) for multilingual encoder models
šŸ“Š Evaluated against mDeBERTa, mGTE, XLM-RoBERTa for Retrieval, Classification, and Regression (after finetuning for each task separately): EuroBERT punches way above its weight.
šŸ“ Detailed paper with all details, incl. data: FineWeb for English and CulturaX for multilingual data, The Stack v2 and Proof-Pile-2 for code.

Check out the release blogpost here: https://huggingface.co/blog/EuroBERT/release
* EuroBERT/EuroBERT-210m
* EuroBERT/EuroBERT-610m
* EuroBERT/EuroBERT-2.1B

The next step is for researchers to build upon the 3 EuroBERT base models and publish strong retrieval, zero-shot classification, etc. models for all to use. I'm very much looking forward to it!
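
For anyone who wants to poke at it right away, here is a minimal sketch of running the 210M checkpoint with 🤗 Transformers. The mask-prediction setup below is an assumption for illustration, and `trust_remote_code=True` is used because the architecture ships as custom code on the Hub:

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "EuroBERT/EuroBERT-210m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id, trust_remote_code=True)

# Fill-in-the-blank check; the mask token is taken from the tokenizer
# rather than hard-coded, since the literal may differ between checkpoints.
text = f"The capital of France is {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = outputs.logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```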
  • 1 reply
Ā·
New activity in Undi95/MistralThinker-v1.1 9 days ago

This shit is fire

13
#2 opened 17 days ago by Ainonake
New activity in Undi95/MistralThinker-v1.1 10 days ago
replied to Undi95's post 11 days ago

Then what if we do the same, but put the whole conversation in the first user input?

So it would be:
System prompt
User: conversation history
Then ask R1 to generate the thinking.

The number of messages in the conversation history should also be varied. Then the bot's reply will always contain thinking.
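
A rough sketch of how one such sample could be assembled (illustrative only: the helper `ask_r1` and the message format are assumptions, not the actual pipeline):

```python
import random

def build_sample(system_prompt, history, ask_r1):
    """Pack the whole conversation history into the first user turn, then have
    R1 produce a thinking + reply for the next bot turn.
    `ask_r1` is a hypothetical callable wrapping whichever R1 endpoint is used."""
    # Vary how many past messages get packed in, as suggested above.
    n_turns = random.randint(1, len(history))
    packed = "\n".join(f"{speaker}: {text}" for speaker, text in history[:n_turns])

    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": packed},
    ]
    reply_with_thinking = ask_r1(messages)  # e.g. "<think>...</think> final reply"

    # Training sample: the bot reply always contains the thinking block.
    return messages + [{"role": "assistant", "content": reply_with_thinking}]
```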

replied to Undi95's post 11 days ago

What do you think about doing part of the dataset with replies generated from some existing context?

E.g. we have 50% of the data with thinking from the first user message, and some parts of the dataset with:

User,
Bot (no thinking),
User,
Bot (no thinking),
User, ... repeated N times,
then ask R1 to think at that point and train on it, so the model will understand long context better.
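
One way to sketch that mixing (illustrative names, not the real dataset code): pick where the thinking turn goes, keeping everything before it as plain turns.

```python
import random

def pick_think_position(conversation, think_ratio=0.5):
    """With probability `think_ratio`, thinking follows the first user message;
    otherwise it follows a later user message, so R1 thinks after a longer
    stretch of plain (no-thinking) context."""
    user_turns = [i for i, turn in enumerate(conversation) if turn["role"] == "user"]
    if random.random() < think_ratio or len(user_turns) == 1:
        return user_turns[0]               # think right after the first user message
    return random.choice(user_turns[1:])   # think somewhere deeper in the conversation
```

Turns before the chosen position would stay as ordinary User/Bot exchanges; only the reply after it gets an R1-generated thinking block.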

reacted to Undi95's post with šŸ‘ 11 days ago
Hi there!

If you want to create your own thinking model or make a better MistralThinker, I just uploaded my entire dataset made with DeepSeek R1 and the Axolotl config (well, I made them public).

Axolotl config: Undi95/MistralThinker-v1.1

The dataset: Undi95/R1-RP-ShareGPT3

You can also read everything I did in those two Discord screenshots from two days ago; I'm a little lazy to rewrite it all, kek.

Hope you will use them!
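
A quick sketch of taking a first look at the dataset (assuming it loads with the standard `datasets` API and uses ShareGPT-style conversation columns, which is an assumption based on its name):

```python
from datasets import load_dataset

ds = load_dataset("Undi95/R1-RP-ShareGPT3", split="train")
print(ds)     # row count and column names
print(ds[0])  # first sample; ShareGPT-style data usually stores a list of
              # {"from": ..., "value": ...} turns under a "conversations" key
```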
Ā·
New activity in yandex/YandexGPT-5-Lite-8B-pretrain 15 days ago

ollama?

2
#13 opened 17 days ago by deniiiiiij

Translation

#22 opened 21 days ago by Ainonake
New activity in ValueFX9507/Tifa-Deepsex-14b-CoT about 1 month ago

How to launch this?

1
#13 opened about 1 month ago by Andrei321123
New activity in anthracite-org/magnum-v4-72b about 1 month ago