Smoliakov

Yehor

AI & ML interests

Speech-to-Text, Text-to-Speech, Voice over Internet Protocol

Organizations

Speech-UK initiative, MedVoice, Call Recognition System, UA-LAWYER

Yehor's activity

reacted to eliebak's post with πŸ”₯ 2 days ago
Google just dropped an exciting technical report for the brand-new Gemma3 model! πŸš€ Here are my personal notes highlighting the most intriguing architectural innovations, design choices, and insights from this release:

1) Architecture choices:
> No more soft-capping, replaced by QK-Norm (sketched briefly after these notes)
> Both Pre AND Post Norm
> Wider MLP than Qwen2.5, ~ same depth
> SWA with a 5:1 ratio and a 1024-token window (very small, and a cool ablation in the paper!)
> No MLA to save KV cache, SWA does the job!

2) Long context
> Only increase the RoPE base in the global layers (to 1M)
> Confirmation that it's harder to do long context for smol models, no 128k for the 1B
> Pretrained with 32k context? Seems very high
> No YaRN or Llama 3-style RoPE extension

3) Distillation
> Only keep the first 256 logits for the teacher
> Ablation on the teacher gap (tl;dr you need some "patience" to see that using a small teacher is better)
> On-policy distillation, yeah! (by @agarwl_ et al.), not sure if the teacher gap behaves the same here; curious if someone has more info?

4) Others
> Checkpoint with QAT, that's very cool
> RL using an improved version of BOND, WARM/WARP (a good excuse to look at @ramealexandre's papers)
> Only uses ZeRO-3, no TP/PP if I understand correctly?
> Training budget relatively similar to Gemma 2
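For readers curious about the first bullet, here is a minimal, hypothetical PyTorch sketch of QK-Norm as a general technique (not Gemma 3's actual code; the parameter-free RMSNorm is an assumption made for brevity):

```python
# Minimal sketch of QK-Norm: normalize queries and keys before the dot
# product so attention logits stay bounded without logit soft-capping.
# Illustration only, not Gemma 3's implementation.
import torch
import torch.nn.functional as F

def rms_norm(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # RMSNorm without a learned scale, kept parameter-free for brevity
    return x * torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)

def qk_norm_attention(q, k, v):
    # q, k, v: (batch, heads, seq_len, head_dim)
    q, k = rms_norm(q), rms_norm(k)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v
```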
reacted to tomaarsen's post with πŸ‘ 4 days ago
An assembly of 18 European companies, labs, and universities has banded together to launch πŸ‡ͺπŸ‡Ί EuroBERT! It's a state-of-the-art multilingual encoder covering 15 European and widely spoken languages, designed to be finetuned for retrieval, classification, etc.

πŸ‡ͺπŸ‡Ί 15 Languages: English, French, German, Spanish, Chinese, Italian, Russian, Polish, Portuguese, Japanese, Vietnamese, Dutch, Arabic, Turkish, Hindi
3️⃣ 3 model sizes: 210M, 610M, and 2.1B parameters - very, very useful sizes in my opinion
➑️ Sequence length of 8192 tokens! Nice to see these higher sequence lengths for encoders becoming more common.
βš™οΈ Architecture based on Llama, but with bi-directional (non-causal) attention to turn it into an encoder. Flash Attention 2 is supported.
πŸ”₯ A new Pareto frontier (stronger *and* smaller) for multilingual encoder models
πŸ“Š Evaluated against mDeBERTa, mGTE, XLM-RoBERTa for Retrieval, Classification, and Regression (after finetuning for each task separately): EuroBERT punches way above its weight.
πŸ“ Detailed paper with all details, incl. data: FineWeb for English and CulturaX for multilingual data, The Stack v2 and Proof-Pile-2 for code.

Check out the release blogpost here: https://huggingface.co/blog/EuroBERT/release
* EuroBERT/EuroBERT-210m
* EuroBERT/EuroBERT-610m
* EuroBERT/EuroBERT-2.1B

The next step is for researchers to build upon the 3 EuroBERT base models and publish strong retrieval, zero-shot classification, etc. models for all to use. I'm very much looking forward to it!
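As a quick way to poke at the checkpoints, a minimal loading sketch with 🤗 transformers (the exact auto class and the need for trust_remote_code are assumptions; check the model cards):

```python
# Minimal sketch: load a EuroBERT checkpoint and embed a sentence.
# The auto class and trust_remote_code flag are assumptions; see the model card.
from transformers import AutoModel, AutoTokenizer

model_id = "EuroBERT/EuroBERT-210m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("EuroBERT is a multilingual encoder.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, hidden_size)
```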
replied to their post 4 days ago
posted an update 4 days ago
posted an update 10 days ago
Published a stable version of the Ukrainian Text-to-Speech library on GitHub and PyPI.

Features:

- Multi-speaker model: 2 female (Tetiana, Lada) + 1 male (Mykyta) voices;
- Fine-grained control over speech parameters, including duration, fundamental frequency (F0), and energy;
- High-fidelity speech generation using the RAD-TTS++ acoustic model;
- Fast vocoding using Vocos;
- Synthesizes long sentences effectively;
- Supports a sampling rate of 44.1 kHz;
- Tested on Linux environments and Windows/WSL;
- Python API (requires Python 3.9 or later);
- CUDA-enabled for GPU acceleration.

Repository: https://github.com/egorsmkv/tts_uk
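A hypothetical usage sketch follows; the module, function, and argument names are assumptions made for illustration, not the library's confirmed API, so check the repository for the real interface:

```python
# Hypothetical sketch only: function and argument names below are assumptions,
# not the confirmed tts_uk API; see https://github.com/egorsmkv/tts_uk.
import soundfile as sf
from tts_uk import synthesize  # assumed entry point

audio, sample_rate = synthesize(
    text="Привіт! Як справи?",
    voice="tetiana",   # assumed voice id: "tetiana", "lada", or "mykyta"
    device="cuda",     # assumed; CPU should also work, just slower
)
sf.write("output.wav", audio, sample_rate)  # 44.1 kHz per the release notes
```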
reacted to MohamedRashad's post with ❀️ 11 days ago
I think we have released the best Arabic model under 25B, at least based on the inceptionai/AraGen-Leaderboard.

Yehia = ALLaM-AI/ALLaM-7B-Instruct-preview + GRPO

and it's ranked as the number one model under the 25B parameter size mark.

Now, I said "I think", not "I am sure", because this model used the same evaluation metric the AraGen developers use (3C3H) as a reward model to improve its responses, and this sparks a question: is this good for users, or is it another type of overfitting that we don't want?

I don't know if this is a good thing or a bad thing, but what I do know is that you can try it here:
Navid-AI/Yehia-7B-preview

or download it for your personal experiments from here:
Navid-AI/Yehia-7B-preview
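
A minimal local-inference sketch with transformers (the chat-template usage is an assumption; check the model card for the recommended prompt format):

```python
# Minimal local-inference sketch; chat-template usage is an assumption,
# check the Navid-AI/Yehia-7B-preview model card for the recommended format.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Navid-AI/Yehia-7B-preview"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "مرحبا! عرّف عن نفسك باختصار."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```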

Ramadan Kareem πŸŒ™
replied to their post 11 days ago

Yes, you’re totally right. But I use this abbreviation for Ukrainian, not Ukraine.

replied to their post 12 days ago
replied to their post 12 days ago
posted an update 12 days ago
Added advanced options to the RAD-TTS++ space, so you can control Ukrainian voice synthesis precisely.

Space: Yehor/radtts-uk-vocos-demo
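
For scripted use, the Space can also be called programmatically with gradio_client; the endpoint names and parameters are not assumed here, so inspect the Space's API listing first:

```python
# Sketch: connect to the Space and list its callable endpoints.
# No endpoint names are assumed; view_api() prints the real ones.
from gradio_client import Client

client = Client("Yehor/radtts-uk-vocos-demo")
client.view_api()  # prints the available endpoints and their parameters
```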