kiran

kira

ki6an

AI & ML interests

agi

Recent Activity

liked a model 3 days ago

THUDM/GLM-4-32B-0414

liked a dataset 7 days ago

LLM360/MegaMath

liked a dataset 8 days ago

Magpie-Align/Magpie-Qwen2.5-Pro-300K-Filtered

kira's activity

upvoted a collection 12 days ago

Llama 4

Llama 4 release • 10 items • Updated 13 days ago • 436

upvoted a paper 3 months ago

Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models

Paper • 2501.13629 • Published Jan 23 • 48

upvoted 2 collections 5 months ago

xLAM models

xLAM: A Family of Large Action Models to Empower AI Agent Systems: https://github.com/SalesforceAIResearch/xLAM • 21 items • Updated about 4 hours ago • 49

Qwen2.5-Coder

Code-specific model series based on Qwen2.5 • 40 items • Updated Nov 28, 2024 • 306

upvoted a collection 6 months ago

SmolLM2

State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 16 items • Updated Feb 20 • 252

upvoted 2 collections 9 months ago

Mini Pretrain Datasets

9 items • Updated Jul 9, 2024 • 9

Useful Pretrain-Datasets

pretrain-datasets with (maybe) good quality • 21 items • Updated Mar 12 • 1

upvoted a collection 11 months ago

Yi-1.5 (2024/05)

10 items • Updated May 20, 2024 • 92

upvoted a collection about 1 year ago

GPT-4 generated datasets

Collection of some GPT-4 generated datasets. It may be useful for those looking for the best-quality datasets to train competitive LLMs. • 18 items • Updated Apr 16, 2024 • 10

upvoted a paper about 1 year ago

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

Paper • 2404.08801 • Published Apr 12, 2024 • 68

upvoted 4 papers over 1 year ago

Tuning Language Models by Proxy

Paper • 2401.08565 • Published Jan 16, 2024 • 24

Extending LLMs' Context Window with 100 Samples

Paper • 2401.07004 • Published Jan 13, 2024 • 16

Scalable Pre-training of Large Autoregressive Image Models

Paper • 2401.08541 • Published Jan 16, 2024 • 39

E^2-LLM: Efficient and Extreme Length Extension of Large Language Models

Paper • 2401.06951 • Published Jan 13, 2024 • 27

upvoted a collection over 1 year ago

Papers about model merging

referenced in the mergekit repo: https://github.com/cg123/mergekit • 4 items • Updated Feb 13, 2024 • 14

upvoted 3 papers over 1 year ago

CogVLM: Visual Expert for Pretrained Language Models

Paper • 2311.03079 • Published Nov 6, 2023 • 27

DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models

Paper • 2309.14509 • Published Sep 25, 2023 • 18

One Wide Feedforward is All You Need

Paper • 2309.01826 • Published Sep 4, 2023 • 33

upvoted 2 papers almost 2 years ago

SkipDecode: Autoregressive Skip Decoding with Batching and Caching for Efficient LLM Inference

Paper • 2307.02628 • Published Jul 5, 2023 • 10

LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding

Paper • 2306.17107 • Published Jun 29, 2023 • 11