Fangyuan Yu PRO

Ksgk-fy

fangyuan-ksgk

AI & ML interests

AGI

Recent Activity

updated a collection 3 days ago

Representation & Optimization

updated a collection 4 days ago

Representation & Optimization

Ksgk-fy's activity

upvoted a paper 3 days ago

Value Residual Learning For Alleviating Attention Concentration In Transformers

Paper • 2410.17897 • Published Oct 23, 2024 • 9

upvoted 3 papers 4 days ago

Flex Attention: A Programming Model for Generating Optimized Attention Kernels

Paper • 2412.05496 • Published Dec 7, 2024 • 1

Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents

Paper • 2504.00906 • Published 5 days ago • 19

ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning

Paper • 2504.00254 • Published 6 days ago • 1

upvoted a collection 4 days ago

Representation & Optimization

Collection

Understanding representation sheds light on optimization • 7 items • Updated 3 days ago • 1

upvoted 2 papers 4 days ago

Approximate Nullspace Augmented Finetuning for Robust Vision Transformers

Paper • 2403.10476 • Published Mar 15, 2024 • 1

Token embeddings violate the manifold hypothesis

Paper • 2504.01002 • Published 5 days ago • 1

upvoted a paper 8 days ago

Nuclear Norm Regularization for Deep Learning

Paper • 2405.14544 • Published May 23, 2024 • 1

upvoted a paper 10 days ago

Layer by Layer: Uncovering Hidden Representations in Language Models

Paper • 2502.02013 • Published Feb 4 • 1

upvoted 3 papers 11 days ago

CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners

Paper • 2503.16356 • Published 17 days ago • 15

I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders

Paper • 2503.18878 • Published 13 days ago • 112

Token-Efficient Long Video Understanding for Multimodal LLMs

Paper • 2503.04130 • Published Mar 6 • 92

upvoted a paper 13 days ago

Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation

Paper • 2503.16430 • Published 17 days ago • 34

upvoted a paper 22 days ago

Denoising Hamiltonian Network for Physical Reasoning

Paper • 2503.07596 • Published 27 days ago • 1

upvoted a collection about 1 month ago

Image / Video Gen

Collection

Image Generation Using Diffusion-Based Methods: Tips and Techniques for Stable Diffusion • 36 items • Updated Mar 1 • 9

upvoted 5 papers about 1 month ago

Scaling LLM Pre-training with Vocabulary Curriculum

Paper • 2502.17910 • Published Feb 25 • 1

SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Paper • 2502.18449 • Published Feb 25 • 73

Slamming: Training a Speech Language Model on One GPU in a Day

Paper • 2502.15814 • Published Feb 19 • 69

You Do Not Fully Utilize Transformer's Representation Capacity

Paper • 2502.09245 • Published Feb 13 • 35

Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity

Paper • 2502.13063 • Published Feb 18 • 69