Hongyu Wang

hongyuw

https://ustcwhy.github.io/

AI & ML interests

Language Model Pre-training

Recent Activity

liked a dataset 24 days ago

TIGER-Lab/VisualWebInstruct

upvoted a paper about 2 months ago

Distillation Scaling Laws

liked a model 3 months ago

MiniMaxAI/MiniMax-Text-01

hongyuw's activity

upvoted a paper about 2 months ago

Distillation Scaling Laws

Paper • 2502.08606 • Published Feb 12, 2025 • 47

upvoted 3 papers 4 months ago

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 147

Multimodal Latent Language Modeling with Next-Token Diffusion

Paper • 2412.08635 • Published Dec 11, 2024 • 45

MH-MoE: Multi-Head Mixture-of-Experts

Paper • 2411.16205 • Published Nov 25, 2024 • 28

upvoted 3 papers 5 months ago

EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation

Paper • 2411.08380 • Published Nov 13, 2024 • 27

BitNet a4.8: 4-bit Activations for 1-bit LLMs

Paper • 2411.04965 • Published Nov 7, 2024 • 68

Self-Boosting Large Language Models with Synthetic Preference Data

Paper • 2410.06961 • Published Oct 9, 2024 • 16

upvoted 3 papers 6 months ago

VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI

Paper • 2410.11623 • Published Oct 15, 2024 • 49

HumanEval-V: Benchmarking High-Level Visual Reasoning with Complex Diagrams in Coding Tasks

Paper • 2410.12381 • Published Oct 16, 2024 • 45

Differential Transformer

Paper • 2410.05258 • Published Oct 7, 2024 • 178

upvoted a paper 9 months ago

Q-Sparse: All Large Language Models can be Fully Sparsely-Activated

Paper • 2407.10969 • Published Jul 15, 2024 • 23

upvoted a paper about 1 year ago

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27, 2024 • 613

upvoted 5 papers over 1 year ago

Mixtral of Experts

Paper • 2401.04088 • Published Jan 8, 2024 • 160

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

Paper • 2401.02954 • Published Jan 5, 2024 • 47

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Paper • 2311.16502 • Published Nov 27, 2023 • 35

VideoPoet: A Large Language Model for Zero-Shot Video Generation

Paper • 2312.14125 • Published Dec 21, 2023 • 47

BitNet: Scaling 1-bit Transformers for Large Language Models

Paper • 2310.11453 • Published Oct 17, 2023 • 98

upvoted a paper almost 2 years ago

Kosmos-2: Grounding Multimodal Large Language Models to the World

Paper • 2306.14824 • Published Jun 26, 2023 • 34