Yilun PRO

yilunzhao

AI & ML interests

None yet

Recent Activity

updated a dataset 10 days ago

yale-nlp/LitSearch-NLP-Class

published a dataset 10 days ago

yale-nlp/LitSearch-NLP-Class

liked a model 11 days ago

efficientscaling/Z1-7B

yilunzhao's activity

upvoted a paper 12 days ago

Z1: Efficient Test-time Scaling with Code

Paper • 2504.00810 • Published 12 days ago • 25

upvoted a paper 14 days ago

PHYSICS: Benchmarking Foundation Models on University-Level Physics Problem Solving

Paper • 2503.21821 • Published 19 days ago • 17

upvoted a paper 18 days ago

MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search

Paper • 2503.20757 • Published 18 days ago • 9

upvoted a paper 23 days ago

Survey on Evaluation of LLM-based Agents

Paper • 2503.16416 • Published 24 days ago • 84

upvoted 2 papers about 1 month ago

MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning

Paper • 2503.07459 • Published Mar 10 • 15

IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval

Paper • 2503.04644 • Published Mar 6 • 20

upvoted 2 papers about 2 months ago

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 180

The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding

Paper • 2502.08946 • Published Feb 13 • 193

upvoted 5 papers 3 months ago

MMVU: Measuring Expert-Level Multi-Discipline Video Understanding

Paper • 2501.12380 • Published Jan 21 • 86

Kimi k1.5: Scaling Reinforcement Learning with LLMs

Paper • 2501.12599 • Published Jan 22 • 113

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 380

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

Paper • 2501.13106 • Published Jan 22 • 91

HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation

Paper • 2412.21199 • Published Dec 30, 2024 • 14

upvoted a paper 5 months ago

TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models

Paper • 2410.23266 • Published Oct 30, 2024 • 20