Fangyuan Yu PRO

Ksgk-fy

fangyuan-ksgk

AI & ML interests

AGI

Recent Activity

updated a collection 4 days ago

Representation & Optimization

upvoted a paper 4 days ago

Value Residual Learning For Alleviating Attention Concentration In Transformers

Ksgk-fy's activity

commented a paper 14 days ago

Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation

Paper • 2503.16430 • Published 17 days ago • 34

commented a paper about 1 month ago

Scaling LLM Pre-training with Vocabulary Curriculum

Paper • 2502.17910 • Published Feb 25 • 1

commented 4 papers 5 months ago

commented 9 papers 6 months ago

Autoregressive Large Language Models are Computationally Universal

Paper • 2410.03170 • Published Oct 4, 2024 • 1

Agent-as-a-Judge: Evaluate Agents with Agents

Paper • 2410.10934 • Published Oct 14, 2024 • 22

Emergent properties with repeated examples

Paper • 2410.07041 • Published Oct 9, 2024 • 8

FAN: Fourier Analysis Networks

Paper • 2410.02675 • Published Oct 3, 2024 • 27

Selective Attention Improves Transformer

Paper • 2410.02703 • Published Oct 3, 2024 • 24

Intelligence at the Edge of Chaos

Paper • 2410.02536 • Published Oct 3, 2024 • 6

Can Models Learn Skill Composition from Examples?

Paper • 2409.19808 • Published Sep 29, 2024 • 10

DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control

Paper • 2409.12192 • Published Sep 18, 2024 • 5

DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control

Paper • 2409.12192 • Published Sep 18, 2024 • 5

commented 2 papers 7 months ago

Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization

Paper • 2409.12903 • Published Sep 19, 2024 • 22

Iterative Graph Alignment

Paper • 2408.16667 • Published Aug 29, 2024 • 2

New activity in meta-llama/Llama-3.1-8B-Instruct 9 months ago

Tokenizer 'apply_chat_template' issue

#42 opened 9 months ago by Ksgk-fy

what is the right tokenizer should I use for llama 3.1 8B?

#19 opened 9 months ago by calebl

New activity in qresearch/llama-3.1-8B-vision-378 9 months ago

Great model!

#1 opened 9 months ago by Ksgk-fy