Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate Paper • 2501.17703 • Published 5 days ago • 43
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback Paper • 2501.12895 • Published 12 days ago • 54
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published 12 days ago • 283
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models Paper • 2501.11873 • Published 14 days ago • 63
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published 20 days ago • 271
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models Paper • 2501.03262 • Published about 1 month ago • 90
xCoT: Cross-lingual Instruction Tuning for Cross-lingual Chain-of-Thought Reasoning Paper • 2401.07037 • Published Jan 13, 2024 • 2
Emulated Disalignment: Safety Alignment for Large Language Models May Backfire! Paper • 2402.12343 • Published Feb 19, 2024
m3P: Towards Multimodal Multilingual Translation with Multimodal Prompt Paper • 2403.17556 • Published Mar 26, 2024 • 1
The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis Paper • 2404.01204 • Published Apr 1, 2024
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model Paper • 2404.04167 • Published Apr 5, 2024 • 13
MuPT: A Generative Symbolic Music Pretrained Transformer Paper • 2404.06393 • Published Apr 9, 2024 • 16
R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models Paper • 2406.01359 • Published Jun 3, 2024
D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models Paper • 2406.01375 • Published Jun 3, 2024
II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models Paper • 2406.05862 • Published Jun 9, 2024 • 4
UniCoder: Scaling Code Large Language Model via Universal Code Paper • 2406.16441 • Published Jun 24, 2024 • 2
GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models Paper • 2406.14550 • Published Jun 20, 2024 • 4
MMRA: A Benchmark for Multi-granularity Multi-image Relational Association Paper • 2407.17379 • Published Jul 24, 2024 • 2
I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm Paper • 2408.08072 • Published Aug 15, 2024 • 35