Haoze Wu

WaitHZ

https://waithz.github.io/

AI & ML interests

Modular DL, Complex Reasoning

Organizations

upvoted a paper 3 months ago

Why Steering Works: Toward a Unified View of Language Model Parameter Dynamics

Paper • 2602.02343 • Published Feb 2, 2026 • 13

upvoted 3 papers 5 months ago

upvoted 3 papers 6 months ago

From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones

Paper • 2509.25123 • Published Sep 29, 2025 • 22

The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution

Paper • 2510.25726 • Published Oct 29, 2025 • 46

LightMem: Lightweight and Efficient Memory-Augmented Generation

Paper • 2510.18866 • Published Oct 21, 2025 • 116

upvoted a collection 7 months ago

DeepSeek-V3.2

Collection • 4 items • Updated Dec 1, 2025 • 542

upvoted 2 papers 8 months ago

WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents

Paper • 2509.06501 • Published Sep 8, 2025 • 82

Model-Task Alignment Drives Distinct RL Outcomes

Paper • 2508.21188 • Published Aug 28, 2025 • 8

upvoted a paper about 1 year ago

HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization

Paper • 2503.04598 • Published Mar 6, 2025 • 22

upvoted an article about 1 year ago

Open-R1: Update #1

Article • Feb 2, 2025 • 305

upvoted a paper about 1 year ago

Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More

Paper • 2502.07490 • Published Feb 11, 2025 • 10

upvoted 2 articles about 1 year ago

How to generate text: using different decoding methods for language generation with Transformers

Article • Mar 1, 2020 • 294

You could have designed state of the art positional encoding

Article • Nov 25, 2024 • 475

upvoted 4 papers over 1 year ago

Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models

Paper • 2501.13629 • Published Jan 23, 2025 • 48

Autonomy-of-Experts Models

Paper • 2501.13074 • Published Jan 22, 2025 • 44

Benchmarking Chinese Knowledge Rectification in Large Language Models

Paper • 2409.05806 • Published Sep 9, 2024 • 15

OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs

Paper • 2409.05152 • Published Sep 8, 2024 • 32

upvoted a paper almost 2 years ago

GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory

Paper • 2406.12375 • Published Jun 18, 2024 • 1
