-
Lost in the Middle: How Language Models Use Long Contexts
Paper • 2307.03172 • Published • 35
Efficient Estimation of Word Representations in Vector Space
Paper • 1301.3781 • Published • 6
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 14
Attention Is All You Need
Paper • 1706.03762 • Published • 41
Collections including paper arxiv:2201.11903
-
TinyLlama: An Open-Source Small Language Model
Paper • 2401.02385 • Published • 89
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 44
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Paper • 2401.15024 • Published • 67
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
Paper • 2401.16380 • Published • 46
-
Attention Is All You Need
Paper • 1706.03762 • Published • 41
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 14
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Paper • 1907.11692 • Published • 7
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Paper • 1910.01108 • Published • 14
-
Attention Is All You Need
Paper • 1706.03762 • Published • 41
Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 11
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Paper • 2201.11903 • Published • 9
Orca 2: Teaching Small Language Models How to Reason
Paper • 2311.11045 • Published • 70
-
RA-DIT: Retrieval-Augmented Dual Instruction Tuning
Paper • 2310.01352 • Published • 7
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Paper • 2203.11171 • Published • 1
MemGPT: Towards LLMs as Operating Systems
Paper • 2310.08560 • Published • 6
Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models
Paper • 2310.06117 • Published • 3
-
Contrastive Chain-of-Thought Prompting
Paper • 2311.09277 • Published • 34
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Paper • 2201.11903 • Published • 9
Orca 2: Teaching Small Language Models How to Reason
Paper • 2311.11045 • Published • 70
System 2 Attention (is something you might need too)
Paper • 2311.11829 • Published • 39
-
Understanding LLMs: A Comprehensive Overview from Training to Inference
Paper • 2401.02038 • Published • 61
Learning To Teach Large Language Models Logical Reasoning
Paper • 2310.09158 • Published • 1
ChipNeMo: Domain-Adapted LLMs for Chip Design
Paper • 2311.00176 • Published • 8
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
Paper • 2308.09583 • Published • 7
-
Retentive Network: A Successor to Transformer for Large Language Models
Paper • 2307.08621 • Published • 170
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Paper • 2303.12712 • Published • 2
GPT-4 Technical Report
Paper • 2303.08774 • Published • 5
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Paper • 2201.11903 • Published • 9
-
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
Paper • 2309.03883 • Published • 33
LoRA: Low-Rank Adaptation of Large Language Models
Paper • 2106.09685 • Published • 29
Agents: An Open-source Framework for Autonomous Language Agents
Paper • 2309.07870 • Published • 39
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
Paper • 2309.00267 • Published • 47