Smoliakov

Yehor

AI & ML interests

Speech-to-Text, Text-to-Speech, Voice over Internet Protocol

Organizations

Speech-UK initiative, MedVoice, Call Recognition System, UA-LAWYER

Yehor's activity

reacted to eliebak's post with πŸ”₯ 2 days ago
Google just dropped an exciting technical report for the brand-new Gemma3 model! πŸš€ Here are my personal notes highlighting the most intriguing architectural innovations, design choices, and insights from this release:

1) Architecture choices:
> No more soft-capping, replaced by QK-Norm (sketched briefly after these notes)
> Both Pre AND Post Norm
> Wider MLP than Qwen2.5, ~ same depth
> SWA with a 5:1 ratio and a 1024-token window (very small, and a cool ablation in the paper!)
> No MLA to save KV cache, SWA does the job!

2) Long context
> Only increase the RoPE base in the global layers (to 1M)
> Confirmation that it's harder to do long context for smol models, no 128k for the 1B
> Pretrained with 32k context? Seems very high
> No YaRN or Llama 3-style RoPE extension

3) Distillation
> Only keep the first 256 logits for the teacher
> Ablation on the teacher gap (tl;dr you need some "patience" to see that using a small teacher is better)
> On-policy distillation, yeah! (by @agarwl_ et al.), not sure if the teacher gap behaves the same here; curious if someone has more info?

4) Others
> Checkpoint with QAT, that's very cool
> RL using an improved version of BOND, WARM/WARP (a good excuse to look at @ramealexandre's papers)
> Only uses ZeRO-3, no TP/PP if I understand correctly?
> Training budget relatively similar to Gemma 2
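For readers curious about the first bullet, here is a minimal, hypothetical PyTorch sketch of QK-Norm as a general technique (not Gemma 3's actual code; the parameter-free RMSNorm is an assumption made for brevity):

```python
# Minimal sketch of QK-Norm: normalize queries and keys before the dot
# product so attention logits stay bounded without logit soft-capping.
# Illustration only, not Gemma 3's implementation.
import torch
import torch.nn.functional as F

def rms_norm(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # RMSNorm without a learned scale, kept parameter-free for brevity
    return x * torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)

def qk_norm_attention(q, k, v):
    # q, k, v: (batch, heads, seq_len, head_dim)
    q, k = rms_norm(q), rms_norm(k)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v
```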
reacted to tomaarsen's post with πŸ‘ 4 days ago
An assembly of 18 European companies, labs, and universities has banded together to launch πŸ‡ͺπŸ‡Ί EuroBERT! It's a state-of-the-art multilingual encoder covering 15 European and widely spoken languages, designed to be finetuned for retrieval, classification, etc.

πŸ‡ͺπŸ‡Ί 15 Languages: English, French, German, Spanish, Chinese, Italian, Russian, Polish, Portuguese, Japanese, Vietnamese, Dutch, Arabic, Turkish, Hindi
3️⃣ 3 model sizes: 210M, 610M, and 2.1B parameters - very, very useful sizes in my opinion
➑️ Sequence length of 8192 tokens! Nice to see these higher sequence lengths for encoders becoming more common.
βš™οΈ Architecture based on Llama, but with bi-directional (non-causal) attention to turn it into an encoder. Flash Attention 2 is supported.
πŸ”₯ A new Pareto frontier (stronger *and* smaller) for multilingual encoder models
πŸ“Š Evaluated against mDeBERTa, mGTE, XLM-RoBERTa for Retrieval, Classification, and Regression (after finetuning for each task separately): EuroBERT punches way above its weight.
πŸ“ Detailed paper with all details, incl. data: FineWeb for English and CulturaX for multilingual data, The Stack v2 and Proof-Pile-2 for code.

Check out the release blogpost here: https://huggingface.co/blog/EuroBERT/release
* EuroBERT/EuroBERT-210m
* EuroBERT/EuroBERT-610m
* EuroBERT/EuroBERT-2.1B

The next step is for researchers to build upon the 3 EuroBERT base models and publish strong retrieval, zero-shot classification, etc. models for all to use. I'm very much looking forward to it!
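As a quick way to poke at the checkpoints, a minimal loading sketch with 🤗 transformers (the exact auto class and the need for trust_remote_code are assumptions; check the model cards):

```python
# Minimal sketch: load a EuroBERT checkpoint and embed a sentence.
# The auto class and trust_remote_code flag are assumptions; see the model card.
from transformers import AutoModel, AutoTokenizer

model_id = "EuroBERT/EuroBERT-210m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("EuroBERT is a multilingual encoder.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, hidden_size)
```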
replied to their post 4 days ago
posted an update 4 days ago
posted an update 10 days ago
Published a stable version of the Ukrainian Text-to-Speech library on GitHub and PyPI.

Features:

- Multi-speaker model: 2 female (Tetiana, Lada) + 1 male (Mykyta) voices;
- Fine-grained control over speech parameters, including duration, fundamental frequency (F0), and energy;
- High-fidelity speech generation using the RAD-TTS++ acoustic model;
- Fast vocoding using Vocos;
- Synthesizes long sentences effectively;
- Supports a sampling rate of 44.1 kHz;
- Tested on Linux environments and Windows/WSL;
- Python API (requires Python 3.9 or later);
- CUDA-enabled for GPU acceleration.

Repository: https://github.com/egorsmkv/tts_uk
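A hypothetical usage sketch follows; the module, function, and argument names are assumptions made for illustration, not the library's confirmed API, so check the repository for the real interface:

```python
# Hypothetical sketch only: function and argument names below are assumptions,
# not the confirmed tts_uk API; see https://github.com/egorsmkv/tts_uk.
import soundfile as sf
from tts_uk import synthesize  # assumed entry point

audio, sample_rate = synthesize(
    text="Привіт! Як справи?",
    voice="tetiana",   # assumed voice id: "tetiana", "lada", or "mykyta"
    device="cuda",     # assumed; CPU should also work, just slower
)
sf.write("output.wav", audio, sample_rate)  # 44.1 kHz per the release notes
```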
reacted to MohamedRashad's post with ❀️ 11 days ago
I think we have released the best Arabic model under 25B, at least based on the inceptionai/AraGen-Leaderboard.

Yehia = ALLaM-AI/ALLaM-7B-Instruct-preview + GRPO

and it's ranked as the number one model under the 25B parameter size mark.

Now, I said "I think", not "I am sure", because this model used the same evaluation metric the AraGen developers use (3C3H) as a reward model to improve its responses, and this sparks a question: is this good for users, or is it another type of overfitting that we don't want?

I don't know if this is a good thing or a bad thing, but what I do know is that you can try it here:
Navid-AI/Yehia-7B-preview

or download it for your personal experiments from here:
Navid-AI/Yehia-7B-preview
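
A minimal local-inference sketch with transformers (the chat-template usage is an assumption; check the model card for the recommended prompt format):

```python
# Minimal local-inference sketch; chat-template usage is an assumption,
# check the Navid-AI/Yehia-7B-preview model card for the recommended format.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Navid-AI/Yehia-7B-preview"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "مرحبا! عرّف عن نفسك باختصار."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```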

Ramadan Kareem πŸŒ™
replied to their post 11 days ago

Yes, you’re totally right. But I use this abbreviation for Ukrainian, not Ukraine.

replied to their post 12 days ago
replied to their post 12 days ago
posted an update 12 days ago
Added advanced options to the RAD-TTS++ space, so you can control Ukrainian voice synthesis precisely.

Space: Yehor/radtts-uk-vocos-demo
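
For scripted use, the Space can also be called programmatically with gradio_client; the endpoint names and parameters are not assumed here, so inspect the Space's API listing first:

```python
# Sketch: connect to the Space and list its callable endpoints.
# No endpoint names are assumed; view_api() prints the real ones.
from gradio_client import Client

client = Client("Yehor/radtts-uk-vocos-demo")
client.view_api()  # prints the available endpoints and their parameters
```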