NeuLab @ LTI/CMU

university

https://www.cs.cmu.edu/~neulab/

neulab

neulab's activity

yueqis

authored 4 papers about 23 hours ago

Beyond Browsing: API-Based Web Agents

Paper • 2410.16464 • Published Oct 21, 2024

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia

Paper • 2503.07920 • Published Mar 10 • 97

SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills

Paper • 2504.07079 • Published 9 days ago • 11

VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge

Paper • 2504.10342 • Published 4 days ago • 9

yueqis

updated a dataset 1 day ago

neulab/VisualPuzzles

Viewer • Updated 1 day ago • 1.17k • 63 • 1

yueqis

published a dataset 4 days ago

neulab/VisualPuzzles

Viewer • Updated 1 day ago • 1.17k • 63 • 1

seungone

authored 2 papers 9 days ago

M-Prometheus: A Suite of Open Multilingual LLM Judges

Paper • 2504.04953 • Published 11 days ago

Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators

Paper • 2503.19877 • Published 24 days ago

yuexiang96

authored a paper 15 days ago

ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations

Paper • 2504.00824 • Published 17 days ago • 38

ProKil

authored 5 papers about 2 months ago

Mind the Gap! Static and Interactive Evaluations of Large Audio Models

Paper • 2502.15919 • Published Feb 21 • 4

EgoNormia: Benchmarking Physical Social Norm Understanding

Paper • 2502.20490 • Published Feb 27 • 5

Grounded Persuasive Language Generation for Automated Marketing

Paper • 2502.16810 • Published Feb 24 • 12

HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions

Paper • 2409.16427 • Published Sep 24, 2024 • 1

What Are Tools Anyway? A Survey from the Language Model Perspective

Paper • 2403.15452 • Published Mar 18, 2024

yuexiang96

authored a paper 2 months ago

Demystifying Long Chain-of-Thought Reasoning in LLMs

Paper • 2502.03373 • Published Feb 5 • 59

gneubig

authored a paper 2 months ago

Demystifying Long Chain-of-Thought Reasoning in LLMs

Paper • 2502.03373 • Published Feb 5 • 59

yuexiang96

authored 2 papers 3 months ago

Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate

Paper • 2501.17703 • Published Jan 29 • 58

Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos

Paper • 2501.13826 • Published Jan 23 • 26

zorazrw

authored 2 papers 3 months ago

CodeRAG-Bench: Can Retrieval Augment Code Generation?

Paper • 2406.14497 • Published Jun 20, 2024 • 2

Agent Workflow Memory

Paper • 2409.07429 • Published Sep 11, 2024 • 32