rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper β’ 2501.04519 β’ Published 4 days ago β’ 185
LLM4SR: A Survey on Large Language Models for Scientific Research Paper β’ 2501.04306 β’ Published 4 days ago β’ 29
Agent Laboratory: Using LLM Agents as Research Assistants Paper β’ 2501.04227 β’ Published 4 days ago β’ 65
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token Paper β’ 2501.03895 β’ Published 5 days ago β’ 42
Personalized Graph-Based Retrieval for Large Language Models Paper β’ 2501.02157 β’ Published 8 days ago β’ 26
OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System Paper β’ 2412.20005 β’ Published 15 days ago β’ 17
πͺ SmolLM Collection A series of smol LLMs: 135M, 360M and 1.7B. We release base and Instruct models as well as the training corpus and some WebGPU demos β’ 12 items β’ Updated 21 days ago β’ 208
view article Article β΄οΈ ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use By Ziyang β’ 9 days ago β’ 11
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings Paper β’ 2501.01257 β’ Published 10 days ago β’ 45
view article Article Introducing Observers: AI Observability with Hugging Face datasets through a lightweight SDK By davidberenstein1957 β’ Nov 21, 2024 β’ 35
view article Article πΊπ¦ββ¬ LLM Comparison/Test: DeepSeek-V3, QVQ-72B-Preview, Falcon3 10B, Llama 3.3 70B, Nemotron 70B in my updated MMLU-Pro CS benchmark By wolfram β’ 10 days ago β’ 37
Executable Code Actions Elicit Better LLM Agents Paper β’ 2402.01030 β’ Published Feb 1, 2024 β’ 40
Open LLM Leaderboard best models β€οΈβπ₯ Collection A daily uploaded list of models with best evaluations on the LLM leaderboard: β’ 60 items β’ Updated 31 minutes ago β’ 504
GTE models Collection General Text Embedding Models Released by Tongyi Lab of Alibaba Group β’ 19 items β’ Updated 22 days ago β’ 19
OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain Paper β’ 2412.13018 β’ Published 26 days ago β’ 41
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena Paper β’ 2306.05685 β’ Published Jun 9, 2023 β’ 33
π± Sailor2 Language Models Collection Sailing in South-East Asia with Inclusive Multilingual LLMs β’ 9 items β’ Updated Dec 3, 2024 β’ 22