flozi00 (Florian Zimmermeister)

upvoted a collection 14 days ago

Multilingual LLM Evaluation

Collection

Multilingual Evaluation Benchmarks • 8 items • Updated 15 days ago • 25

upvoted an article 14 days ago

Article

Open-source DeepResearch – Freeing our search agents

Feb 4 • 1.17k

upvoted 3 papers 20 days ago

Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam

Paper • 2502.17055 • Published 22 days ago • 16

SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Paper • 2502.18449 • Published 21 days ago • 70

SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference

Paper • 2502.18137 • Published 21 days ago • 53

upvoted a paper 28 days ago

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Paper • 2502.11089 • Published about 1 month ago • 144

upvoted 2 papers 29 days ago

SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models

Paper • 2502.09604 • Published Feb 13 • 33

InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU

Paper • 2502.08910 • Published Feb 13 • 143

upvoted a paper about 1 month ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 206

upvoted a paper 2 months ago

Scaling Laws for Floating Point Quantization Training

Paper • 2501.02423 • Published Jan 5 • 26

upvoted a paper 3 months ago

Transformers Can Navigate Mazes With Multi-Step Prediction

Paper • 2412.05117 • Published Dec 6, 2024 • 5

upvoted a paper 4 months ago

"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization

Paper • 2411.02355 • Published Nov 4, 2024 • 49

upvoted an article 5 months ago

Article

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

Sep 18, 2024 • 226

upvoted a paper 5 months ago

Differential Transformer

Paper • 2410.05258 • Published Oct 7, 2024 • 171

upvoted an article 5 months ago

Article

Welcome, Gradio 5

Oct 9, 2024 • 128
