Interesting Papers - a marcelweiss Collection

Interesting Papers

updated 1 day ago

These papers are interesting (to me)

Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models

Paper • 2410.02740 • Published Oct 3, 2024 • 52
From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging

Paper • 2410.01215 • Published Oct 2, 2024 • 30
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published Sep 25, 2024 • 106
EuroLLM: Multilingual Language Models for Europe

Paper • 2409.16235 • Published Sep 24, 2024 • 26
Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs

Paper • 2409.14988 • Published Sep 23, 2024 • 22
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines

Paper • 2409.12959 • Published Sep 19, 2024 • 37
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers

Paper • 2409.04109 • Published Sep 6, 2024 • 44
Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise

Paper • 2410.03017 • Published Oct 3, 2024 • 27
Addition is All You Need for Energy-efficient Language Models

Paper • 2410.00907 • Published Oct 1, 2024 • 145
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations

Paper • 2410.02707 • Published Oct 3, 2024 • 48
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References

Paper • 2410.05193 • Published Oct 7, 2024 • 13
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models

Paper • 2411.04905 • Published Nov 7, 2024 • 114
ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning

Paper • 2411.05003 • Published Nov 7, 2024 • 70
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion

Paper • 2411.04928 • Published Nov 7, 2024 • 49
DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation

Paper • 2411.04999 • Published Nov 7, 2024 • 17
From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond

Paper • 2411.03590 • Published Nov 6, 2024 • 10
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level

Paper • 2411.03562 • Published Nov 5, 2024 • 65
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems

Paper • 2411.02959 • Published Nov 5, 2024 • 66
Zebra-Llama: A Context-Aware Large Language Model for Democratizing Rare Disease Knowledge

Paper • 2411.02657 • Published Nov 4, 2024 • 5
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents

Paper • 2410.24024 • Published Oct 31, 2024 • 48
How Far is Video Generation from World Model: A Physical Law Perspective

Paper • 2411.02385 • Published Nov 4, 2024 • 33
Survey of Cultural Awareness in Language Models: Text and Beyond

Paper • 2411.00860 • Published Oct 30, 2024 • 23
Training-free Regional Prompting for Diffusion Transformers

Paper • 2411.02395 • Published Nov 4, 2024 • 25
DynaSaur: Large Language Agents Beyond Predefined Actions

Paper • 2411.01747 • Published Nov 4, 2024 • 25
Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models

Paper • 2411.00492 • Published Nov 1, 2024 • 6
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Paper • 2410.23218 • Published Oct 30, 2024 • 46
DELTA: Dense Efficient Long-range 3D Tracking for any video

Paper • 2410.24211 • Published Oct 31, 2024 • 8
Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning

Paper • 2410.21845 • Published Oct 29, 2024 • 13
Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Dataset

Paper • 2410.22325 • Published Oct 29, 2024 • 10
A Survey of Small Language Models

Paper • 2410.20011 • Published Oct 25, 2024 • 40
Animate-X: Universal Character Image Animation with Enhanced Motion Representation

Paper • 2410.10306 • Published Oct 14, 2024 • 54
AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant

Paper • 2410.18603 • Published Oct 24, 2024 • 32
LongReward: Improving Long-context Large Language Models with AI Feedback

Paper • 2410.21252 • Published Oct 28, 2024 • 17
Teach Multimodal LLMs to Comprehend Electrocardiographic Images

Paper • 2410.19008 • Published Oct 21, 2024 • 23
Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch

Paper • 2410.18693 • Published Oct 24, 2024 • 40
WorldSimBench: Towards Video Generation Models as World Simulators

Paper • 2410.18072 • Published Oct 23, 2024 • 18
DynamicCity: Large-Scale LiDAR Generation from Dynamic Scenes

Paper • 2410.18084 • Published Oct 23, 2024 • 13
SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes

Paper • 2410.17249 • Published Oct 22, 2024 • 41
AutoTrain: No-code training for state-of-the-art models

Paper • 2410.15735 • Published Oct 21, 2024 • 59
FrugalNeRF: Fast Convergence for Few-shot Novel View Synthesis without Learned Priors

Paper • 2410.16271 • Published Oct 21, 2024 • 81
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation

Paper • 2410.13232 • Published Oct 17, 2024 • 41
MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures

Paper • 2410.13754 • Published Oct 17, 2024 • 75
MobA: A Two-Level Agent System for Efficient Mobile Task Automation

Paper • 2410.13757 • Published Oct 17, 2024 • 32
Exploring Model Kinship for Merging Large Language Models

Paper • 2410.12613 • Published Oct 16, 2024 • 20
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free

Paper • 2410.10814 • Published Oct 14, 2024 • 49
Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts

Paper • 2410.10626 • Published Oct 14, 2024 • 38
RedPajama: an Open Dataset for Training Large Language Models

Paper • 2411.12372 • Published Nov 19, 2024 • 49
Soft Robotic Dynamic In-Hand Pen Spinning

Paper • 2411.12734 • Published Nov 19, 2024 • 9
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use

Paper • 2411.10323 • Published Nov 15, 2024 • 32
Sharingan: Extract User Action Sequence from Desktop Recordings

Paper • 2411.08768 • Published Nov 13, 2024 • 10
Hermes: A Large Language Model Framework on the Journey to Autonomous Networks

Paper • 2411.06490 • Published Nov 10, 2024 • 6
Large Language Models Can Self-Improve in Long-context Reasoning

Paper • 2411.08147 • Published Nov 12, 2024 • 63
CamemBERT 2.0: A Smarter French Language Model Aged to Perfection

Paper • 2411.08868 • Published Nov 13, 2024 • 12
GRAPE: Generalizing Robot Policy via Preference Alignment

Paper • 2411.19309 • Published Nov 28, 2024 • 44
On Domain-Specific Post-Training for Multimodal Large Language Models

Paper • 2411.19930 • Published Nov 29, 2024 • 25
Reverse Thinking Makes LLMs Stronger Reasoners

Paper • 2411.19865 • Published Nov 29, 2024 • 20
Large Language Model-Brained GUI Agents: A Survey

Paper • 2411.18279 • Published Nov 27, 2024 • 29
DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving

Paper • 2411.15139 • Published Nov 22, 2024 • 15
ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Paper • 2411.17465 • Published Nov 26, 2024 • 79
Star Attention: Efficient LLM Inference over Long Sequences

Paper • 2411.17116 • Published Nov 26, 2024 • 49
MH-MoE: Multi-Head Mixture-of-Experts

Paper • 2411.16205 • Published Nov 25, 2024 • 24
Patience Is The Key to Large Language Model Reasoning

Paper • 2411.13082 • Published Nov 20, 2024 • 7
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints

Paper • 2412.07760 • Published Dec 10, 2024 • 50
SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training

Paper • 2412.09619 • Published Dec 12, 2024 • 23
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution

Paper • 2501.02976 • Published 23 days ago • 52
LTX-Video: Realtime Video Latent Diffusion

Paper • 2501.00103 • Published 29 days ago • 41
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Paper • 2501.00599 • Published 28 days ago • 41
On the Compositional Generalization of Multimodal LLMs for Medical Imaging

Paper • 2412.20070 • Published Dec 28, 2024 • 45
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

Paper • 2412.21187 • Published 29 days ago • 37
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Paper • 2412.18925 • Published Dec 25, 2024 • 97
VideoRAG: Retrieval-Augmented Generation over Video Corpus

Paper • 2501.05874 • Published 19 days ago • 66
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs

Paper • 2501.06186 • Published 18 days ago • 59
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought

Paper • 2501.04682 • Published 20 days ago • 89
Agent Laboratory: Using LLM Agents as Research Assistants

Paper • 2501.04227 • Published 21 days ago • 82
Multimodal LLMs Can Reason about Aesthetics in Zero-Shot

Paper • 2501.09012 • Published 13 days ago • 10
VideoAuteur: Towards Long Narrative Video Generation

Paper • 2501.06173 • Published 18 days ago • 31
O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning

Paper • 2501.06458 • Published 18 days ago • 29
