Massive Activations in Large Language Models
Paper • 2402.17762 • Published • 1
What Matters in Transformers? Not All Attention is Needed
Paper • 2406.15786 • Published • 32
The Super Weight in Large Language Models
Paper • 2411.07191 • Published • 5
Top-nσ: Not All Logits Are You Need
Paper • 2411.07641 • Published • 22
Yi Cui (yicui)
AI & ML interests: None yet
Recent Activity
New activity 15 days ago on onekq-ai/OneSQL-v0.1-Qwen-32B-AWQ: Update README.md
Updated the collection RL 3 months ago
Updated the collection Mechanistic 5 months ago
Organizations: None yet
Collections: 10
glaiveai/glaive-coder-7b
Text Generation • Updated • 1.04k • 54
glaiveai/glaive-code-assistant-v3
Viewer • Updated • 950k • 225 • 48
ibm-granite/granite-3b-code-base-128k
Text Generation • Updated • 177 • 6
Granite Code Models: A Family of Open Foundation Models for Code Intelligence
Paper • 2405.04324 • Published • 22