Optimizing Large Language Model Training Using FP4 Quantization Paper • 2501.17116 • Published Jan 28, 2025 • 37
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published Jan 8, 2025 • 275
Cautious Optimizers: Improving Training with One Line of Code (see the masking sketch after this list) Paper • 2411.16085 • Published Nov 25, 2024 • 21
nGPT: Normalized Transformer with Representation Learning on the Hypersphere Paper • 2410.01131 • Published Oct 1, 2024 • 10
Addition is All You Need for Energy-efficient Language Models Paper • 2410.00907 • Published Oct 1, 2024 • 150 • 17
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated (see the top-K sketch after this list) Paper • 2407.10969 • Published Jul 15, 2024 • 23
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark Paper • 2406.01574 • Published Jun 3, 2024 • 47
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Paper • 2405.21060 • Published May 31, 2024 • 68 • 3
Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting Paper • 2404.18911 • Published Apr 29, 2024 • 31 • 2
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27, 2024 • 613 • 142
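For the "Cautious Optimizers" entry above (2411.16085): the paper's "one line of code" masks out optimizer-update components whose sign disagrees with the current gradient, then rescales the survivors. A minimal sketch in PyTorch; the function name and the epsilon are illustrative assumptions, not the authors' reference code:

```python
import torch

def cautious_update(update: torch.Tensor, grad: torch.Tensor,
                    eps: float = 1e-8) -> torch.Tensor:
    """Zero update components that point against the gradient, then
    rescale by the fraction kept so the average step size is preserved."""
    mask = (update * grad > 0).to(update.dtype)  # 1 where update and grad agree in sign
    return update * mask / (mask.mean() + eps)   # hypothetical eps guards an all-zero mask

# Usage inside an Adam-style step (sketch): given the raw update u,
#   p.data.add_(cautious_update(u, p.grad), alpha=-lr)
```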
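For the Q-Sparse entry (2407.10969): the core mechanism is top-K sparsification of activations, trained with a straight-through estimator. A sketch under that reading; `topk_sparsify` and the fixed per-row k are assumptions for illustration:

```python
import torch

def topk_sparsify(x: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k largest-magnitude entries along the last dim, zero the rest."""
    idx = x.abs().topk(k, dim=-1).indices
    mask = torch.zeros_like(x).scatter_(-1, idx, 1.0)
    y = x * mask
    # Straight-through estimator: the forward pass uses the sparse y,
    # the backward pass treats the masking as identity.
    return x + (y - x).detach()
```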
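And for "The Era of 1-bit LLMs" (2402.17764): BitNet b1.58 constrains weights to the ternary set {-1, 0, +1} (log2(3) ≈ 1.58 bits) via absmean quantization. A sketch of that formula as stated in the paper; the epsilon and function name are assumptions, not the released code:

```python
import torch

def absmean_ternary(w: torch.Tensor, eps: float = 1e-5):
    """BitNet b1.58-style quantizer: ternary weights plus one per-tensor scale."""
    gamma = w.abs().mean()                           # absmean scale
    w_q = (w / (gamma + eps)).round().clamp_(-1, 1)  # values in {-1, 0, +1}
    return w_q, gamma                                # dequantize as w_q * gamma
```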