Samuel Arcadinho

SSamDav

AI & ML interests

None yet

SSamDav's activity

upvoted a paper 5 days ago

Transformers without Normalization

Paper • 2503.10622 • Published 6 days ago • 126

upvoted a paper 7 days ago

Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

Paper • 2503.07572 • Published 9 days ago • 36

upvoted 2 papers 9 days ago

Forgetting Transformer: Softmax Attention with a Forget Gate

Paper • 2503.02130 • Published 15 days ago • 27

EuroBERT: Scaling Multilingual Encoders for European Languages

Paper • 2503.05500 • Published 12 days ago • 72

upvoted 2 papers 23 days ago

SurveyX: Academic Survey Automation via Large Language Models

Paper • 2502.14776 • Published 27 days ago • 95

MoBA: Mixture of Block Attention for Long-Context LLMs

Paper • 2502.13189 • Published 29 days ago • 14

liked a Space 27 days ago

2.3k

The Ultra-Scale Playbook

🌌

The ultimate guide to training LLM on large GPU Clusters

upvoted 2 collections 29 days ago

Dria-Agent-a

Collection

powerful agentic models built for pythonic function calling • 4 items • Updated Feb 14 • 4

Tiny-Agent-a

Collection

fast and powerful agentic models designed to run on edge devices. • 6 items • Updated Feb 12 • 7

commented twice on a paper about 1 month ago

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Paper • 2502.05171 • Published Feb 7 • 124

upvoted 2 papers about 1 month ago

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Paper • 2502.05171 • Published Feb 7 • 124

Scalable-Softmax Is Superior for Attention

Paper • 2501.19399 • Published Jan 31 • 21

commented on a paper about 1 month ago

Scalable-Softmax Is Superior for Attention

Paper • 2501.19399 • Published Jan 31 • 21

upvoted 2 papers about 1 month ago

s1: Simple test-time scaling

Paper • 2501.19393 • Published Jan 31 • 112

DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning

Paper • 2411.04983 • Published Nov 7, 2024 • 12

upvoted a paper about 2 months ago

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 109

upvoted 2 papers 3 months ago

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published Dec 18, 2024 • 135

Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 354

liked a dataset 3 months ago

HuggingFaceFW/fineweb-2

Viewer • Updated Jan 8 • 12.5B • 72.2k • 448