Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22 • 110
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis Paper • 2403.03206 • Published Mar 5 • 56
view article Article Orchestration of Experts: The First-Principle Multi-Model System By alirezamsh • May 30 • 15
Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware Paper • 2304.13705 • Published Apr 23, 2023 • 2
view article Article LLM Comparison/Test: Llama 3 Instruct 70B + 8B HF/GGUF/EXL2 (20 versions tested and compared!) By wolfram • Apr 24 • 56
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22 • 251
Meta Llama 3 Collection This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated 9 days ago • 676
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry Paper • 2402.04347 • Published Feb 6 • 13
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length Paper • 2404.08801 • Published Apr 12 • 62
From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples Paper • 2404.07544 • Published Apr 11 • 18
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models Paper • 2404.07839 • Published Apr 11 • 41
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention Paper • 2404.07143 • Published Apr 10 • 103
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs Paper • 2404.05719 • Published Apr 8 • 62
view article Article Pollen-Vision: Unified interface for Zero-Shot vision models in robotics Mar 25 • 7
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published Apr 2 • 104
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6 • 182
DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models Paper • 2403.00818 • Published Feb 26 • 14
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 592
Qwen1.5 Collection Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud. • 55 items • Updated 17 days ago • 206
CroissantLLM: A Truly Bilingual French-English Language Model Paper • 2402.00786 • Published Feb 1 • 25
Sketch-Guided Constrained Decoding for Boosting Blackbox Large Language Models without Logit Access Paper • 2401.09967 • Published Jan 18 • 1
Zeroshot Classifiers Collection These are my current best zeroshot classifiers. Some of my older models are downloaded more often, but the models in this collection are newer/better. • 11 items • Updated Apr 3 • 106
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation Paper • 2401.02117 • Published Jan 4 • 29
Possible Meissner effect near room temperature in copper-substituted lead apatite Paper • 2401.00999 • Published Jan 2 • 5
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention Paper • 2312.07987 • Published Dec 13, 2023 • 40
Gated Linear Attention Transformers with Hardware-Efficient Training Paper • 2312.06635 • Published Dec 11, 2023 • 5
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts Paper • 2211.15841 • Published Nov 29, 2022 • 7
Tulu V2 Suite Collection The set of models associated with the paper "Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2" • 19 items • Updated 10 days ago • 43
System 2 Attention (is something you might need too) Paper • 2311.11829 • Published Nov 20, 2023 • 39