Vchitect-XL

non-profit

Activity Feed

AI & ML interests

None defined yet.

Recent Activity

weepiess2383 authored a paper 14 days ago

Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models

weepiess2383 authored a paper 14 days ago

CFG-Zero*: Improved Classifier-Free Guidance for Flow Matching Models

ChenyangSi authored a paper 21 days ago

V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning

View all activity

Vchitect-XL's activity

weepiess2383

authored 2 papers 14 days ago

Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models

Paper • 2501.08453 • Published Jan 14 • 1

CFG-Zero*: Improved Classifier-Free Guidance for Flow Matching Models

Paper • 2503.18886 • Published 15 days ago • 20

ChenyangSi

authored a paper 21 days ago

V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning

Paper • 2503.11495 • Published 25 days ago • 11

sayakpaul

authored a paper 25 days ago

SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation

Paper • 2503.09641 • Published 27 days ago • 34

sayakpaul

posted an update about 2 months ago

Post

3414

Inference-time scaling meets Flux.1-Dev (and others) 🔥

Presenting a simple re-implementation of "Inference-time scaling diffusion models beyond denoising steps" by Ma et al.

I did the simplest random search strategy, but results can potentially be improved with better-guided search methods.

Supports Gemini 2 Flash & Qwen2.5 as verifiers for "LLMGrading" 🤗

The steps are simple:

For each round:

1> Starting by sampling 2 starting noises with different seeds.
2> Score the generations w.r.t a metric.
3> Obtain the best generation from the current round.

If you have more compute budget, go to the next search round. Scale the noise pool (2 ** search_round) and repeat 1 - 3.

This constitutes the random search method as done in the paper by Google DeepMind.

Code, more results, and a bunch of other stuff are in the repository. Check it out here: https://github.com/sayakpaul/tt-scale-flux/ 🤗

sayakpaul

posted an update 2 months ago

Post

2054

We have been cooking a couple of fine-tuning runs on CogVideoX with finetrainers, smol datasets, and LoRA to generate cool video effects like crushing, dissolving, etc.

We are also releasing a LoRA extraction utility from a fully fine-tuned checkpoint. I know that kind of stuff has existed since eternity, but the quality on video models was nothing short of spectacular. Below are some links:

* Models and datasets:

finetrainers
* finetrainers: https://github.com/a-r-r-o-w/finetrainers
* LoRA extraction: https://github.com/huggingface/diffusers/blob/main/scripts/extract_lora_from_model.py

1 reply

sayakpaul

posted an update 2 months ago

Post

2004

We have authored a post to go over the state of video generation in the Diffusers ecosystem 🧨

We cover the models supported, the knobs of optims our users can fire, fine-tuning, and more 🔥

5-6GBs for HunyuanVideo, sky is the limit 🌌 🤗
https://huggingface.co/blog/video_gen

weepiess2383

authored a paper 3 months ago

RepVideo: Rethinking Cross-Layer Representation for Video Generation

Paper • 2501.08994 • Published Jan 15 • 15

ChenyangSi

authored 2 papers 3 months ago

RepVideo: Rethinking Cross-Layer Representation for Video Generation

Paper • 2501.08994 • Published Jan 15 • 15

Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control

Paper • 2501.03847 • Published Jan 7 • 23

sayakpaul

posted an update 4 months ago

Post

4408

Commits speak louder than words 🤪

* 4 new video models
* Multiple image models, including SANA & Flux Control
* New quantizers -> GGUF & TorchAO
* New training scripts

Enjoy this holiday-special Diffusers release 🤗
Notes: https://github.com/huggingface/diffusers/releases/tag/v0.32.0

akhaliq

posted an update 4 months ago

Post

15978

Google drops Gemini 2.0 Flash Thinking

a new experimental model that unlocks stronger reasoning capabilities and shows its thoughts. The model plans (with thoughts visible), can solve complex problems with Flash speeds, and more

now available in anychat, try it out: https://huggingface.co/spaces/akhaliq/anychat

3 replies

sayakpaul

posted an update 4 months ago

Post

2232

In the past seven days, the Diffusers team has shipped:

1. Two new video models
2. One new image model
3. Two new quantization backends
4. Three new fine-tuning scripts
5. Multiple fixes and library QoL improvements

Coffee on me if someone can guess 1 - 4 correctly.

1 reply

sayakpaul

posted an update 4 months ago

Post

2163

Introducing a high-quality open-preference dataset to further this line of research for image generation.

Despite being such an inseparable component for modern image generation, open preference datasets are a rarity!

So, we decided to work on one with the community!

Check it out here:
https://huggingface.co/blog/image-preferences

7 replies

sayakpaul

posted an update 4 months ago

Post

2192

The Control family of Flux from @black-forest-labs should be discussed more!

It enables structural controls like ControlNets while being significantly less expensive to run!

So, we're working on a Control LoRA training script 🤗

It's still WIP, so go easy:
https://github.com/huggingface/diffusers/pull/10130

sayakpaul

authored a paper 4 months ago

A Noise is Worth Diffusion Guidance

Paper • 2412.03895 • Published Dec 5, 2024 • 31

sayakpaul

posted an update 4 months ago

Post

1552

Let 2024 be the year of video model fine-tunes!

Check it out here:
https://github.com/a-r-r-o-w/cogvideox-factory/tree/main/training/mochi-1

akhaliq

posted an update 4 months ago

Post

16137

QwQ-32B-Preview is now available in anychat

A reasoning model that is competitive with OpenAI o1-mini and o1-preview

try it out: https://huggingface.co/spaces/akhaliq/anychat

1 reply

akhaliq

posted an update 4 months ago

Post

4362

New model drop in anychat

allenai/Llama-3.1-Tulu-3-8B is now available

try it here: https://huggingface.co/spaces/akhaliq/anychat

akhaliq

posted an update 4 months ago

Post

3257

anychat

supports chatgpt, gemini, perplexity, claude, meta llama, grok all in one app

try it out there: https://huggingface.co/spaces/akhaliq/anychat

AI & ML interests

Recent Activity

Team members 7

Vchitect-XL's activity