LLM

ashishtanwer 's Collections

DataCrawling

Agents

RAG

Transformer

DataLabelling

DataCleaning

LLM

Dataset

Evals

Training

ClassicalML

Diffusion

InfraML

updated Dec 15, 2024

Upvote

Running

2.61k

2.61k

Anycoder

🏢

Generate modern HTML designs from existing code
Running

274

274

Qwen2.5 Coder Artifacts

🐢

Generate code snippets based on user input
Running

923

923

QwQ-32B-Preview

🔍

QwQ-32B-Preview
Running on CPU Upgrade

13.5k

13.5k

Open LLM Leaderboard

🏆

Track, rank and evaluate open LLMs and chatbots
meta-llama/Llama-3.3-70B-Instruct

Text Generation • 71B • Updated Dec 21, 2024 • 513k • • 2.48k
The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink

Paper • 2204.05149 • Published Apr 11, 2022 • 10
meta-llama/Llama-3.2-11B-Vision-Instruct

Image-Text-to-Text • 11B • Updated Dec 4, 2024 • 842k • • 1.51k
meta-llama/Llama-3.1-8B

Text Generation • 8B • Updated Oct 16, 2024 • 1.1M • • 1.75k
google/paligemma-3b-mix-224

Image-Text-to-Text • 3B • Updated Jul 19, 2024 • 194k • 84
google/paligemma-3b-pt-448

Image-Text-to-Text • 3B • Updated Jul 19, 2024 • 3.33k • 29
google/paligemma-3b-pt-224

Image-Text-to-Text • 3B • Updated Sep 21, 2024 • 30.6k • 348
MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models

Paper • 2410.13085 • Published Oct 16, 2024 • 24
Nemotron-4 15B Technical Report

Paper • 2402.16819 • Published Feb 26, 2024 • 47
Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 244
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22, 2024 • 257
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Paper • 2404.07143 • Published Apr 10, 2024 • 112
Jamba: A Hybrid Transformer-Mamba Language Model

Paper • 2403.19887 • Published Mar 28, 2024 • 112
Mixtral of Experts

Paper • 2401.04088 • Published Jan 8, 2024 • 160
Textbooks Are All You Need

Paper • 2306.11644 • Published Jun 20, 2023 • 146
Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Paper • 2312.00752 • Published Dec 1, 2023 • 145
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 21
RoBERTa: A Robustly Optimized BERT Pretraining Approach

Paper • 1907.11692 • Published Jul 26, 2019 • 9
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Paper • 1910.01108 • Published Oct 2, 2019 • 17
PaLM: Scaling Language Modeling with Pathways

Paper • 2204.02311 • Published Apr 5, 2022 • 3
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Paper • 2211.05100 • Published Nov 9, 2022 • 33
LLaMA: Open and Efficient Foundation Language Models

Paper • 2302.13971 • Published Feb 27, 2023 • 17
GPT-4 Technical Report

Paper • 2303.08774 • Published Mar 15, 2023 • 7
PaLM 2 Technical Report

Paper • 2305.10403 • Published May 17, 2023 • 7
Textbooks Are All You Need II: phi-1.5 technical report

Paper • 2309.05463 • Published Sep 11, 2023 • 88
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

Paper • 2402.14905 • Published Feb 22, 2024 • 134
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework

Paper • 2404.14619 • Published Apr 22, 2024 • 128
The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31, 2024 • 117
meta-llama/Llama-Guard-3-8B

Text Generation • 8B • Updated Oct 11, 2024 • 362k • • 222
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Paper • 2405.04434 • Published May 7, 2024 • 22
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

Paper • 2406.11931 • Published Jun 17, 2024 • 66
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Paper • 2403.05530 • Published Mar 8, 2024 • 66
StarCoder 2 and The Stack v2: The Next Generation

Paper • 2402.19173 • Published Feb 29, 2024 • 148
allenai/Molmo-7B-D-0924

Image-Text-to-Text • 8B • Updated Apr 4 • 21.1k • 543
PaliGemma 2: A Family of Versatile VLMs for Transfer

Paper • 2412.03555 • Published Dec 4, 2024 • 134
OpenGVLab/InternVL2_5-78B

Image-Text-to-Text • 78B • Updated Mar 25 • 538 • 192
Running

497

497

InternVL

⚡

Interact with a multimodal chatbot that analyzes images and text
Qwen/Qwen2.5-0.5B-Instruct

Text Generation • 0.5B • Updated Sep 25, 2024 • 1.65M • 360
Running

686

686

Qwen2.5

🚀

Chat with Qwen, an AI assistant
Qwen/Qwen2.5-0.5B

Text Generation • 0.5B • Updated Sep 25, 2024 • 1.08M • 294
google/gemma-1.1-7b-it

Text Generation • 9B • Updated Jun 27, 2024 • 15k • 274
meta-llama/Meta-Llama-3-8B-Instruct

Text Generation • 8B • Updated Jun 18 • 987k • • 4.15k
meta-llama/Meta-Llama-3-8B

Text Generation • 8B • Updated Sep 27, 2024 • 1.37M • • 6.29k
meta-llama/Meta-Llama-3-70B-Instruct

Text Generation • 71B • Updated Jun 18 • 261k • • 1.49k
meta-llama/Meta-Llama-3-70B

Text Generation • 71B • Updated Sep 27, 2024 • 15.7k • • 864
meta-llama/Meta-Llama-Guard-2-8B

Text Generation • 8B • Updated May 13, 2024 • 15.2k • 298
meta-llama/Llama-3.2-11B-Vision

Image-Text-to-Text • 11B • Updated Sep 27, 2024 • 30.6k • 544
meta-llama/Llama-3.2-1B-Instruct

Text Generation • 1B • Updated Oct 24, 2024 • 5.14M • • 1.05k
Running

1.66k

1.66k

Qwen2.5 Coder Artifacts

🐢

Generate code for applications
Training Language Models to Self-Correct via Reinforcement Learning

Paper • 2409.12917 • Published Sep 19, 2024 • 141
Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15, 2024 • 110
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

Paper • 2403.09629 • Published Mar 14, 2024 • 79
meta-llama/Llama-3.1-70B

Text Generation • 71B • Updated Sep 25, 2024 • 190k • 379
meta-llama/Llama-3.1-8B-Instruct

Text Generation • 8B • Updated Sep 25, 2024 • 12.9M • • 4.54k
microsoft/Phi-3.5-vision-instruct

Image-Text-to-Text • 4B • Updated Sep 26, 2024 • 485k • 703
microsoft/Phi-3.5-MoE-instruct

Text Generation • 42B • Updated Mar 7 • 221k • 561
meta-llama/Llama-3.2-3B

Text Generation • 3B • Updated Oct 24, 2024 • 513k • 624
meta-llama/Llama-3.2-1B

Text Generation • 1B • Updated Oct 24, 2024 • 3.27M • 2.05k
microsoft/Phi-3.5-mini-instruct

Text Generation • 4B • Updated Mar 2 • 213k • 904
Qwen/Qwen2-VL-7B-Instruct

Image-Text-to-Text • 8B • Updated Feb 6 • 465k • • 1.22k
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Paper • 2409.12191 • Published Sep 18, 2024 • 78
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities

Paper • 2308.12966 • Published Aug 24, 2023 • 9
liuhaotian/llava-v1.5-7b

Image-Text-to-Text • Updated May 8, 2024 • 1.71M • 490
Qwen/Qwen2-VL-2B-Instruct

Image-Text-to-Text • 2B • Updated Jan 12 • 2.83M • 443
openbmb/MiniCPM-Llama3-V-2_5

Image-Text-to-Text • 9B • Updated Jan 15 • 74.3k • 1.4k
Running

557

557

Vision Arena (Testing VLMs side-by-side)

🖼

Display image analysis results
microsoft/Florence-2-large

Image-Text-to-Text • 0.8B • Updated 25 days ago • 799k • 1.65k
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

Paper • 2311.06242 • Published Nov 10, 2023 • 94
nvidia/NVLM-D-72B

Image-Text-to-Text • 79B • Updated Jan 14 • 65.8k • 773
NVLM: Open Frontier-Class Multimodal LLMs

Paper • 2409.11402 • Published Sep 17, 2024 • 75
rhymes-ai/Aria

Image-Text-to-Text • 25B • Updated Apr 23 • 53.2k • 632
mistralai/Pixtral-12B-2409

Updated Jul 28 • 2.83k • 659
HuggingFaceM4/idefics2-8b

Image-Text-to-Text • 8B • Updated Oct 14, 2024 • 10.7k • 613
liuhaotian/llava-v1.5-13b

Image-Text-to-Text • Updated May 9, 2024 • 46.3k • 511
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published Sep 25, 2024 • 122
What matters when building vision-language models?

Paper • 2405.02246 • Published May 3, 2024 • 104
Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16, 2024 • 131
Your Transformer is Secretly Linear

Paper • 2405.12250 • Published May 19, 2024 • 159
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 90
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6, 2024 • 76
ShareGPT4Video/ShareGPT4Video

Viewer • Updated Mar 7 • 40.2k • 2.65k • 200
An Image is Worth 32 Tokens for Reconstruction and Generation

Paper • 2406.07550 • Published Jun 11, 2024 • 60
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

Paper • 2406.09415 • Published Jun 13, 2024 • 52
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Paper • 2406.16860 • Published Jun 24, 2024 • 61
ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning

Paper • 2406.19741 • Published Jun 28, 2024 • 63
Vision language models are blind

Paper • 2407.06581 • Published Jul 9, 2024 • 83
PaliGemma: A versatile 3B VLM for transfer

Paper • 2407.07726 • Published Jul 10, 2024 • 73
LLaVA-OneVision: Easy Visual Task Transfer

Paper • 2408.03326 • Published Aug 6, 2024 • 61
Sapiens: Foundation for Human Vision Models

Paper • 2408.12569 • Published Aug 22, 2024 • 95
Qwen2.5-Coder Technical Report

Paper • 2409.12186 • Published Sep 18, 2024 • 151
Baichuan-Omni Technical Report

Paper • 2410.08565 • Published Oct 11, 2024 • 88
Movie Gen: A Cast of Media Foundation Models

Paper • 2410.13720 • Published Oct 17, 2024 • 99
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models

Paper • 2411.04905 • Published Nov 7, 2024 • 127
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Paper • 2402.17177 • Published Feb 27, 2024 • 89
Self-Discover: Large Language Models Self-Compose Reasoning Structures

Paper • 2402.03620 • Published Feb 6, 2024 • 118
Octopus v4: Graph of language models

Paper • 2404.19296 • Published Apr 30, 2024 • 119
OpenGVLab/InternVL2_5-1B

Image-Text-to-Text • 0.9B • Updated Mar 25 • 15.8k • 60
Qwen/Qwen2-Audio-7B-Instruct

Audio-Text-to-Text • 8B • Updated Jan 12 • 310k • 476
Qwen2-Audio Technical Report

Paper • 2407.10759 • Published Jul 15, 2024 • 61
Running on Zero

256

256

Qwen2-VL-7B

🔥

Generate text from an image and question
llava-hf/llava-v1.6-mistral-7b-hf

Image-Text-to-Text • 8B • Updated May 1 • 781k • • 278
Improved Baselines with Visual Instruction Tuning

Paper • 2310.03744 • Published Oct 5, 2023 • 38
openbmb/MiniCPM-V-2

Visual Question Answering • 3B • Updated Jan 15 • 3.96k • 477
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

Paper • 2403.11703 • Published Mar 18, 2024 • 17
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages

Paper • 2308.12038 • Published Aug 23, 2023 • 2
MiniCPM-V: A GPT-4V Level MLLM on Your Phone

Paper • 2408.01800 • Published Aug 3, 2024 • 86
mistralai/Mixtral-8x7B-Instruct-v0.1

47B • Updated Jul 24 • 323k • 4.54k
mistralai/Mistral-7B-v0.1

Text Generation • 7B • Updated Jul 24 • 338k • 3.94k
microsoft/phi-2

Text Generation • 3B • Updated Apr 29, 2024 • 767k • 3.39k
mistralai/Mistral-7B-Instruct-v0.2

Text Generation • 7B • Updated Jul 24 • 500k • • 2.93k
Mistral 7B

Paper • 2310.06825 • Published Oct 10, 2023 • 52
nvidia/Mistral-NeMo-Minitron-8B-Base

Text Generation • 8B • Updated Aug 22, 2024 • 3.98k • 176

Upvote

Collection guide
Browse collections

LLM

Anycoder

Qwen2.5 Coder Artifacts

QwQ-32B-Preview

Open LLM Leaderboard

InternVL

Qwen2.5

Qwen2.5 Coder Artifacts

Vision Arena (Testing VLMs side-by-side)

Qwen2-VL-7B