MoM: Linear Sequence Modeling with Mixture-of-Memories Paper • 2502.13685 • Published Feb 19, 2025 • 31
CO2: Efficient Distributed Training with Full Communication-Computation Overlap Paper • 2401.16265 • Published Jan 29, 2024 • 1
Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention Paper • 2405.17381 • Published May 27, 2024
LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training Paper • 2411.15708 • Published Nov 24, 2024
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published Jan 14, 2025 • 273
LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid Paper • 2502.07563 • Published Feb 11, 2025 • 23
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback Paper • 2501.12895 • Published Jan 22, 2025 • 56
Qwen2 Collection Qwen2 language models, including pretrained and instruction-tuned models in 5 sizes: 0.5B, 1.5B, 7B, 57B-A14B, and 72B. • 39 items • Updated Nov 28, 2024 • 357
SSMs Collection A collection of Mamba-2-based research models with 8B parameters trained on 3.5T tokens for comparison with Transformers. • 5 items • Updated Jan 17 • 27