Hi @sirluk, thanks for the great post. Do you know whether the masking technique above works with some attention implementations but is incompatible with others?
For example, would the above masking work with SDPA, flash_attention_2, and eager (each of these implementations is handled a bit differently in https://github.com/huggingface/transformers/blob/main/src/transformers/models/mistral/modeling_mistral.py#L666, for example)?
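In case it helps clarify what I mean, here is roughly how I would try comparing the backends myself, just a minimal sketch where the checkpoint and inputs are placeholders (not taken from your post), and flash_attention_2 additionally needs a CUDA GPU, fp16/bf16 weights, and the flash-attn package installed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # placeholder checkpoint for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("Sequence one</s>Sequence two", return_tensors="pt")

for impl in ("eager", "sdpa", "flash_attention_2"):
    # Load the same checkpoint under each attention backend
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        attn_implementation=impl,     # "eager", "sdpa", or "flash_attention_2"
        torch_dtype=torch.bfloat16,   # flash_attention_2 requires fp16/bf16
        device_map="auto",
    )
    with torch.no_grad():
        logits = model(**inputs.to(model.device)).logits
    # Spot-check the last-token logits to see whether the backends agree
    print(impl, logits[0, -1, :5])
```

Would you expect the custom mask to behave the same way across all three, or is there something backend-specific one has to watch out for?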