siyeng feng

siyengfeng

AI & ML interests

None yet

Organizations

None yet

siyengfeng's activity

liked 5 models about 23 hours ago

meta-llama/Llama-4-Maverick-17B-128E-Instruct

Image-Text-to-Text • Updated 6 days ago • 31.4k • 290

all-hands/openhands-lm-32b-v0.1

Text Generation • Updated 12 days ago • 99.2k • 345

meta-llama/Llama-4-Scout-17B-16E-Instruct

Image-Text-to-Text • Updated 6 days ago • 657k • 777

nvidia/Llama-3_1-Nemotron-Ultra-253B-v1

Text Generation • Updated 5 days ago • 11.3k • 236

THUDM/GLM-Z1-32B-0414

Text Generation • Updated 1 day ago • 268 • 79

upvoted 5 papers about 23 hours ago

Do PhD-level LLMs Truly Grasp Elementary Addition? Probing Rule Learning vs. Memorization in Large Language Models

Paper • 2504.05262 • Published 8 days ago • 7

Pangu Ultra: Pushing the Limits of Dense Large Language Models on Ascend NPUs

Paper • 2504.07866 • Published 5 days ago • 7

ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance

Paper • 2504.08716 • Published 4 days ago • 7

SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning

Paper • 2504.08600 • Published 4 days ago • 20

Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model

Paper • 2504.08685 • Published 4 days ago • 99

upvoted 10 papers 1 day ago

Towards Visual Text Grounding of Multimodal Large Language Model

Paper • 2504.04974 • Published 8 days ago • 11

SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement

Paper • 2504.07934 • Published 5 days ago • 14

Scaling Laws for Native Multimodal Models

Paper • 2504.07951 • Published 5 days ago • 20

C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing

Paper • 2504.07964 • Published 5 days ago • 58

DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning

Paper • 2504.07128 • Published 14 days ago • 72

Kimi-VL Technical Report

Paper • 2504.07491 • Published 5 days ago • 108

VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning

Paper • 2504.06958 • Published 6 days ago • 9

Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling

Paper • 2504.05410 • Published 8 days ago • 2

Self-Steering Language Models

Paper • 2504.07081 • Published 6 days ago • 15

A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility

Paper • 2504.07086 • Published 6 days ago • 17