slu

sslu

AI & ML interests

None yet

Recent Activity

upvoted a paper about 17 hours ago

Access Sets Matter: Budgeting Expert Reads for Scalable Weight-Space Model Merging

Paper • 2605.29489 • Published 8 days ago • 4

upvoted a paper 4 days ago

Not All Disagreement Is Learnable: Token Teachability in On-Policy Distillation

Paper • 2605.26844 • Published 10 days ago • 25

upvoted 2 papers 24 days ago

Model Merging Scaling Laws in Large Language Models

Paper • 2509.24244 • Published 25 days ago • 44

Geometry Conflict: Explaining and Controlling Forgetting in LLM Continual Post-Training

Paper • 2605.09608 • Published 26 days ago • 52

liked a Space about 1 month ago

Megatron Memory Estimator

👁

Estimate GPU memory usage for Megatron models

liked 2 models about 1 month ago

deepseek-ai/DeepSeek-V4-Flash

Text Generation • 158B • Updated 30 days ago • 3.5M • 1.4k

deepseek-ai/DeepSeek-V4-Pro

Text Generation • 862B • Updated 30 days ago • 5.69M • 4.63k

upvoted a collection about 1 month ago

DeepSeek-V4

Collection

4 items • Updated Apr 24 • 672

upvoted an article 6 months ago

Article

Large-scale Near-deduplication Behind BigCode

chenghao • May 16, 2023 • 37

liked a dataset 6 months ago

nick007x/arxiv-papers

Viewer • Updated Apr 1 • 2.55M • 891k • 193

authored a paper 6 months ago

InfiAlign: A Scalable and Sample-Efficient Framework for Aligning LLMs to Enhance Reasoning Capabilities

Paper • 2508.05496 • Published Aug 7, 2025 • 9

liked a Space 7 months ago

The Ultra-Scale Playbook

🌌

3.87k

The ultimate guide to training LLM on large GPU Clusters

liked 8 models 10 months ago

slu