ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers Paper • 2504.00502 • Published 14 days ago • 21
DeepSolution: Boosting Complex Engineering Solution Design via Tree-based Exploration and Bi-point Thinking Paper • 2502.20730 • Published Feb 28 • 39
The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters • Running • 2.46k
DeepRAG: Thinking to Retrieval Step by Step for Large Language Models Paper • 2502.01142 • Published Feb 3 • 24
ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models Paper • 2303.16421 • Published Mar 29, 2023
A Drop of Ink Makes a Million Think: The Spread of False Information in Large Language Models Paper • 2305.04812 • Published May 8, 2023 • 1
Unified Structure Generation for Universal Information Extraction Paper • 2203.12277 • Published Mar 23, 2022
SoFA: Shielded On-the-fly Alignment via Priority Rule Following Paper • 2402.17358 • Published Feb 27, 2024 • 1
Self-Retrieval: Building an Information Retrieval System with One Large Language Model Paper • 2403.00801 • Published Feb 23, 2024 • 2
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect Paper • 2403.03853 • Published Mar 6, 2024 • 65