Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2502.05795

about 9 hours ago

FLAME: Factuality-Aware Alignment for Large Language Models

Paper • 2405.01525 • Published May 2, 2024 • 27
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

Paper • 2405.14333 • Published May 23, 2024 • 38
Transformers Can Do Arithmetic with the Right Embeddings

Paper • 2405.17399 • Published May 27, 2024 • 52
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture

Paper • 2405.18991 • Published May 29, 2024 • 12

Evolving Deeper LLM Thinking

Paper • 2501.09891 • Published Jan 17 • 106
PaSa: An LLM Agent for Comprehensive Academic Paper Search

Paper • 2501.10120 • Published Jan 17 • 43
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong

Paper • 2501.09775 • Published Jan 16 • 29
ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario

Paper • 2501.10132 • Published Jan 17 • 19

The Curse of Depth in Large Language Models

Paper • 2502.05795 • Published 13 days ago • 30

LLM Pretraining with Continuous Concepts

Paper • 2502.08524 • Published 10 days ago • 25
Steel-LLM:From Scratch to Open Source -- A Personal Journey in Building a Chinese-Centric LLM

Paper • 2502.06635 • Published 12 days ago • 4
The Curse of Depth in Large Language Models

Paper • 2502.05795 • Published 13 days ago • 30
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published 17 days ago • 188

Training optimization

The Curse of Depth in Large Language Models

Paper • 2502.05795 • Published 13 days ago • 30

The Curse of Depth in Large Language Models

Paper • 2502.05795 • Published 13 days ago • 30

PILAF: Optimal Human Preference Sampling for Reward Modeling

Paper • 2502.04270 • Published 16 days ago • 11
The Curse of Depth in Large Language Models

Paper • 2502.05795 • Published 13 days ago • 30

about 19 hours ago

LLM Pruning and Distillation in Practice: The Minitron Approach

Paper • 2408.11796 • Published Aug 21, 2024 • 58
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

Paper • 2408.09174 • Published Aug 17, 2024 • 52
To Code, or Not To Code? Exploring Impact of Code in Pre-training

Paper • 2408.10914 • Published Aug 20, 2024 • 42
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

Paper • 2408.11878 • Published Aug 20, 2024 • 56

CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data

Paper • 2404.15653 • Published Apr 24, 2024 • 27
MoDE: CLIP Data Experts via Clustering

Paper • 2404.16030 • Published Apr 24, 2024 • 13
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning

Paper • 2405.12130 • Published May 20, 2024 • 48
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

Paper • 2405.12981 • Published May 21, 2024 • 30

LM Architectures

about 17 hours ago

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

Paper • 2404.08801 • Published Apr 12, 2024 • 66
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

Paper • 2404.07839 • Published Apr 11, 2024 • 45
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

Paper • 2404.05892 • Published Apr 8, 2024 • 33
Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Paper • 2312.00752 • Published Dec 1, 2023 • 140

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs