AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration Paper • 2306.00978 • Published Jun 1, 2023 • 9
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving Paper • 2405.04532 • Published May 7, 2024
FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer Paper • 2301.08739 • Published Jan 20, 2023
LongVILA: Scaling Long-Context Visual Language Models for Long Videos Paper • 2408.10188 • Published Aug 19, 2024 • 52
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer Paper • 2410.10812 • Published Oct 14, 2024 • 17
Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models Paper • 2410.10733 • Published Oct 14, 2024 • 3
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads Paper • 2410.10819 • Published Oct 14, 2024 • 7
NVILA: Efficient Frontier Visual Language Models Paper • 2412.04468 • Published Dec 5, 2024 • 59
LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention Paper • 2502.14866 • Published Feb 20, 2025 • 13