CCMat (Mathieu Jouffroy)

upvoted an article 2 months ago

Article

Training Design for Text-to-Image Models: Lessons from Ablations

Photoroom • Feb 3 • 73

upvoted 3 articles 6 months ago

Article

We’re open-sourcing our text-to-image model and the process behind it

Photoroom • Nov 12, 2025 • 99

Article

Text-to-image Architectural Experiments

Photoroom • Nov 13, 2025 • 57

Article

You could have designed state of the art positional encoding

FL33TW00D-HF • Nov 25, 2024 • 478

upvoted 2 papers 7 months ago

Diffusion Transformers with Representation Autoencoders

Paper • 2510.11690 • Published Oct 13, 2025 • 170

Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 514

upvoted a collection 8 months ago

DINOv3

Collection

DINOv3: foundation models producing excellent dense features, outperforming SotA w/o fine-tuning - https://arxiv.org/abs/2508.10104 • 15 items • Updated Mar 10 • 634

upvoted 4 papers about 1 year ago

Vision Transformers Need Registers

Paper • 2309.16588 • Published Sep 28, 2023 • 86

DINOv2: Learning Robust Visual Features without Supervision

Paper • 2304.07193 • Published Apr 14, 2023 • 9

Intuitive physics understanding emerges from self-supervised pretraining on natural videos

Paper • 2502.11831 • Published Feb 17, 2025 • 20

Cluster and Predict Latents Patches for Improved Masked Image Modeling

Paper • 2502.08769 • Published Feb 12, 2025 • 5

upvoted a paper over 1 year ago

VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models

Paper • 2502.02492 • Published Feb 4, 2025 • 66

upvoted a collection over 1 year ago

PaliGemma 2 Release

Collection

Vision-Language Models available in multiple 3B, 10B and 28B variants. • 32 items • Updated Mar 12 • 152

upvoted 2 articles over 1 year ago

Article

Open-R1: a fully open reproduction of DeepSeek-R1

+1

eliebak, lvwerra, lewtun • Jan 28, 2025 • 889

Article

We now support VLMs in smolagents!

+1

m-ric, merve, albertvillanova • Jan 24, 2025 • 113

upvoted a paper over 1 year ago

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22, 2025 • 449

upvoted 2 articles over 1 year ago

Article

Introducing smolagents: simple agents that write actions in code.

+1

m-ric, merve, thomwolf • Dec 31, 2024 • 1.19k

Article

Visualize and understand GPU memory in PyTorch

qgallouedec • Dec 24, 2024 • 270

upvoted 2 papers over 1 year ago

Pyramidal Flow Matching for Efficient Video Generative Modeling

Paper • 2410.05954 • Published Oct 8, 2024 • 40

Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

Paper • 2410.06940 • Published Oct 9, 2024 • 12

Mathieu Jouffroy

AI & ML interests

Organizations
