
Alyona Vert (alyona0l)

AI & ML interests: None yet


Organizations: Turing Post

alyona0l's activity

upvoted an article 5 days ago
Topic 33: Slim Attention, KArAt, XAttention and Multi-Token Attention Explained – What’s Really Changing in Transformers?
By Kseniase and 1 other • 13 upvotes

published an article 6 days ago
Topic 33: Slim Attention, KArAt, XAttention and Multi-Token Attention Explained – What’s Really Changing in Transformers?
By Kseniase and 1 other • 13 upvotes

upvoted an article 21 days ago
What is Qwen-Agent framework? Inside the Qwen family
By Kseniase and 1 other • 8 upvotes

published an article 21 days ago
What is Qwen-Agent framework? Inside the Qwen family
By Kseniase and 1 other • 8 upvotes

upvoted an article 23 days ago
🌁#92: Fight for Developers and the Year of Orchestration
By Kseniase • 5 upvotes

upvoted an article 24 days ago
🦸🏻#14: What Is MCP, and Why Is Everyone – Suddenly! – Talking About It?
By Kseniase • 145 upvotes

reacted to Kseniase's post with 🔥 25 days ago
15 types of attention mechanisms

Attention mechanisms allow models to dynamically focus on specific parts of their input when performing tasks. In our recent article, we discussed Multi-Head Latent Attention (MLA) in detail, and now it's time to summarize the other existing types of attention.

Here is a list of 15 types of attention mechanisms used in AI models:

1. Soft attention (Deterministic attention) -> Neural Machine Translation by Jointly Learning to Align and Translate (1409.0473)
Assigns a continuous weight distribution over all parts of the input. It produces a weighted sum of the input using attention weights that sum to 1.

2. Hard attention (Stochastic attention) -> Effective Approaches to Attention-based Neural Machine Translation (1508.04025)
Makes a discrete selection of some part of the input to focus on at each step, rather than attending to everything.

3. Self-attention -> Attention Is All You Need (1706.03762)
Each element in the sequence "looks" at other elements and "decides" how much to borrow from each of them for its new representation.

4. Cross-Attention (Encoder-Decoder attention) -> Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation (2104.08771)
The queries come from one sequence and the keys/values come from another sequence. It allows a model to combine information from two different sources.

5. Multi-Head Attention (MHA) -> Attention Is All You Need (1706.03762)
Multiple attention “heads” are run in parallel. The model computes several attention distributions (heads), each with its own set of learned projections of queries, keys, and values (see the code sketch after this list).

6. Multi-Head Latent Attention (MLA) -> DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model (2405.04434)
Extends MHA by incorporating a latent space where attention heads can dynamically learn different latent factors or representations.

7. Memory-Based attention -> End-To-End Memory Networks (1503.08895)
Involves an external memory and uses attention to read from and write to this memory.

See other types in the comments 👇
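
To make items 1, 3, and 5 concrete, here is a minimal NumPy sketch of scaled dot-product self-attention with multiple heads. This is an editorial illustration rather than code from any of the papers above; the function names, weight shapes, and toy input are all illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # Shift by the row max for numerical stability, then normalize.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d_head). The softmax rows sum to 1, i.e. soft
    # attention (item 1); here Q, K, V all come from the same sequence,
    # which makes this self-attention (item 3).
    d_head = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_head)   # (seq_len, seq_len) similarity scores
    weights = softmax(scores, axis=-1)   # each row is a distribution over positions
    return weights @ V                   # weighted sum of value vectors

def multi_head_self_attention(X, W_q, W_k, W_v, W_o, n_heads):
    # X: (seq_len, d_model); each W_*: (d_model, d_model). Project X once,
    # split the projections into n_heads slices, attend per head, then
    # concatenate and mix with the output projection W_o (item 5).
    d_model = X.shape[-1]
    d_head = d_model // n_heads
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    heads = []
    for h in range(n_heads):
        s = slice(h * d_head, (h + 1) * d_head)
        heads.append(scaled_dot_product_attention(Q[:, s], K[:, s], V[:, s]))
    return np.concatenate(heads, axis=-1) @ W_o

# Toy usage: 4 tokens, d_model = 8, 2 heads, random weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v, W_o = (rng.normal(size=(8, 8)) for _ in range(4))
out = multi_head_self_attention(X, W_q, W_k, W_v, W_o, n_heads=2)
print(out.shape)  # (4, 8)
```

Cross-attention (item 4) is the same computation with Q projected from one sequence and K, V from another; MLA (item 6) additionally compresses keys and values through a low-rank latent projection.
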
upvoted an article 28 days ago
How to Reduce Memory Use in Reasoning Models
By Kseniase and 1 other • 14 upvotes

published an article 28 days ago
How to Reduce Memory Use in Reasoning Models
By Kseniase and 1 other • 14 upvotes

upvoted an article 29 days ago
🦸🏻#13: Action! How AI Agents Execute Tasks with UI and API Tools
By Kseniase • 8 upvotes

upvoted an article 30 days ago
🌁#91: We are failing in AI literacy
By Kseniase and 1 other • 3 upvotes

published an article about 1 month ago
🌁#91: We are failing in AI literacy
By Kseniase and 1 other • 3 upvotes

upvoted 2 articles about 1 month ago
🦸🏻#12: How Do Agents Learn from Their Own Mistakes? The Role of Reflection in AI
By Kseniase • 6 upvotes
Everything You Need to Know about Knowledge Distillation
By Kseniase and 1 other • 22 upvotes

published an article about 1 month ago
Everything You Need to Know about Knowledge Distillation
By Kseniase and 1 other • 22 upvotes

upvoted 2 articles about 1 month ago
🌁#90: Why AI’s Reasoning Tests Keep Failing Us
By Kseniase • 9 upvotes

published an article about 1 month ago

upvoted an article about 1 month ago
🌁#89: AI in Action: How AI Engineers, Self-Optimizing Models, and Humanoid Robots Are Reshaping 2025
By Kseniase • 4 upvotes