
Mariusz Kurman

mkurman

AI & ML interests

AI Tech Lead | MD

Recent Activity

liked a model 11 days ago
MaziyarPanahi/DeepSeek-V3-0324-GGUF
reacted to Kseniase's post with 👀 13 days ago
8 types of RoPE

Since we use Transformers all the time, it's helpful to understand RoPE (Rotary Position Embedding). Token order matters, so RoPE encodes it by rotating token embeddings based on their position, letting the model know which token comes first, second, and so on. Here are 8 types of RoPE that can be implemented in different cases:

1. Original RoPE -> https://huggingface.co/papers/2104.09864
Encodes token positions by rotating token embeddings in the complex plane via a position-based rotation matrix, thereby providing the self-attention mechanism with relative positional info.

2. LongRoPE -> https://huggingface.co/papers/2402.13753
Extends the context window of pre-trained LLMs to 2048k tokens, leveraging non-uniformities in positional interpolation with an efficient search.

3. LongRoPE2 -> https://huggingface.co/papers/2502.20082
Extends the effective context window of pre-trained LLMs to the target length, rescaling RoPE guided by "needle-driven" perplexity.

4. Multimodal RoPE (MRoPE) -> https://huggingface.co/papers/2502.13923
Decomposes positional embedding into 3 components: temporal, height, and width, so that positional features are aligned across modalities: text, images, and videos.

5. Directional RoPE (DRoPE) -> https://huggingface.co/papers/2503.15029
Adds an identity scalar, improving how angles are handled without extra complexity. It helps balance accuracy, speed, and memory usage.

6. VideoRoPE -> https://huggingface.co/papers/2502.05173
Adapts RoPE for video, featuring 3D structure, low-frequency temporal allocation, diagonal layout, and adjustable spacing.

7. VRoPE -> https://huggingface.co/papers/2502.11664
Another RoPE for video, which restructures positional indices and balances encoding for uniform spatial focus.

8. XPos (Extrapolatable Position Embedding) -> https://huggingface.co/papers/2212.10
Introduces an exponential decay factor into the rotation matrix, improving stability on long sequences.
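The core rotation behind the original RoPE can be sketched in a few lines of NumPy. This is a minimal illustrative implementation (function name, shapes, and the `base=10000` frequency schedule follow the original paper's convention; everything else here is illustrative), showing the key property that the dot product between two rotated vectors depends only on their relative position:

```python
import numpy as np

def rope_rotate(x, positions, base=10000.0):
    """Apply a RoPE-style rotation to token vectors.

    x: (seq_len, dim) array with even dim; positions: (seq_len,) integer positions.
    Each consecutive feature pair is rotated by an angle that scales with
    the token's position and the pair's frequency.
    """
    seq_len, dim = x.shape
    assert dim % 2 == 0, "RoPE rotates feature pairs, so dim must be even"
    # Per-pair frequencies: theta_i = base^(-2i/dim), as in the original paper
    freqs = base ** (-np.arange(0, dim, 2) / dim)        # (dim/2,)
    angles = positions[:, None] * freqs[None, :]         # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                      # split into pairs
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin                   # 2D rotation per pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Because each pair undergoes a plane rotation, the attention score between a query at position m and a key at position n depends only on n - m, which is exactly the relative positional signal the self-attention mechanism receives.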

Organizations

MedIT Solutions · BigScience Biomedical Datasets · SOWA Project · On Device Medical Notes

Posts 17

Just released NVAMP Loss!

✔️ A modification of the cross-entropy loss function designed specifically for training LLMs.
✔️ A twist on standard cross-entropy that emphasizes outlier prediction errors and dynamically normalizes token-level variance.
✔️ More stable and efficient training, leading to models that generalize better.
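The exact formulation lives in the linked repository; as a rough illustration of the idea described above (standardizing per-token loss and upweighting outliers), here is a hypothetical NumPy sketch. The function name, the sigmoid weighting, and the epsilon are my assumptions, not the repo's actual implementation:

```python
import numpy as np

def nvamp_like_loss(logits, targets, eps=1e-6):
    """Hypothetical sketch of a variance-normalized cross-entropy.

    logits: (N, V) array of unnormalized scores; targets: (N,) int class ids.
    Per-token CE is standardized across the batch, and tokens whose loss sits
    above the mean (outlier prediction errors) receive a higher weight.
    """
    # Numerically stable log-softmax
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(targets)), targets]     # per-token CE, (N,)
    # Dynamic token-level variance normalization
    z = (ce - ce.mean()) / (ce.std() + eps)
    # Sigmoid weighting: emphasize tokens with above-average loss (assumption)
    weights = 1.0 / (1.0 + np.exp(-z))
    return (weights * ce).mean()
```

Compared to a plain mean of per-token cross-entropy, this weighting shifts gradient signal toward the hardest tokens in each batch, which is the stated motivation for the loss.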

Check it out, give it a spin, and let me know what you think!

Licensed under Apache 2.0 and ready to use. Happy training! 🔥🤖

https://github.com/mkurman/nvamp-loss