NEW:

New video for the Wan 2.1 I2V v1.3B Fun InP models: https://youtu.be/QZfqqMpai9Y

https://www.youtube.com/watch?v=bXUYkfybOCE

I’ve added new models for Wan 2.1 I2V v1.3B Fun InP. These models don’t use exactly the same method as my DG T2V models, and the results are not identical. In this version the effect is weaker, but it attempts to fix certain parts of the video.

It’s harder to implement the method I use with the other models, because the I2V system is constantly trying to match the start and end images. It’s similar to using a LoRA to change faces in a video: you can see the model struggling against the LoRA to preserve the original start and end frames, and the LoRA often fails to work correctly on the first or last frame.

Keep in mind that all of this is still experimental. I’ve noticed that TeaCache sometimes conflicts with my boosted models, because the boost effects are applied before TeaCache. I’m not sure how to fix it yet; it may need a different TeaCache value to work properly.

If you notice strange noise in the video, try disabling TeaCache or experimenting with different TeaCache values until you find one that works.

To install my custom node, download the ZIP file and extract it. Make sure there is only one folder named ComfyUI-DG-Wan2_1-OX3D; unzipping sometimes creates two nested folders with the same name, so double-check. Place this folder inside ComfyUI/custom_nodes. Next, open a Windows command prompt (preferably as administrator) and run the following command:

X:\YourComfyUiInstallDir\python_embeded\python -m pip install -r X:\YourComfyUiInstallDir\ComfyUI\custom_nodes\ComfyUI-DG-Wan2_1-OX3D\requirements.txt

After that, restart ComfyUI and download my workflow. Everything should work fine. My node allows you to use one or two images with the Fun InP model, and it also adds three transition modes between the start and end images. Additionally, it supports using one or two prompts to mix the two images.

NEW MODELS: That’s a lot of models, but it gives you plenty of choice. I tried to fix a few mistakes I had made in the earlier versions of my models. One of the main issues was that video motion was significantly reduced in the previous versions; overall the results were quite good, but with less motion.

Trying to maintain more consistency is what ended up reducing motion. In the new models I use a different technique that shouldn't reduce motion. In certain versions, like the "stock" one, I also tried to preserve as much of the original model’s rendering style as possible.

Of course, the results will vary due to the boost, but the stock version should be the closest to the original. That said, for character quality or certain details (especially interiors), the other models often produce much better results.

The models are configured for 5 or 6 steps by default, but you can definitely go higher. If the video output becomes noisy or strange and you're using TeaCache, try disabling it. With certain step counts, it can conflict with TeaCache’s default values.

All of my boosted models are compatible with the Diffusion Pipe training tool, so you can use them to train your own LoRAs. To install Diffusion Pipe you need WSL, because some of its optimizations are not fully compatible with Windows and the training script uses them by default; it is also very fast. For a simple face-training example with around 20-30 images, you only need 6 to 12 epochs, which takes about 20-30 minutes on a 4070 Ti Super.

https://github.com/tdrussell/diffusion-pipe

It’s a lot of models, and this should be my final version; I don’t think I’ll be making more versions of this model.

At some point, I might remove a few and keep only the best ones. Unless, of course, a better video model comes out someday—but for now, I really love this one. There’s a lot of fun to be had with it.

NEW TRAINING LORAs: I added a special LoRA I created, dg_wan2_1_v1_3b_lora_extra_noise_detail_motion.safetensors. It was trained on over 10,000 images, but not to reproduce them graphically; it is trained to replicate their initial noise patterns instead. This LoRA is useful for adding a bit more realism and detail, and it also introduces motion. It should be used at relatively low strength, between 0.01 and 0.35, and it works very well with the T2V model. I haven’t had time to test it with other models yet.

I’ve converted Wan2.1-Fun-1.3B-InP-HPS2.1_lora.safetensors and Wan2.1-Fun-1.3B-InP-MPS_lora_new.safetensors to make them compatible with the Fun model and ComfyUI. I'm working on a Fun model version using my boost method, but with this model the effect isn’t exactly the same as with the T2V model: the boost does help, but the impact is noticeably smaller. I still need to run more tests on my Fun Boost model, but I should be able to upload it soon.

OLDER: I have added some experimental versions of the model Wan 2.1 v1.3b. These are different levels of distilled + hires + refined editions. Normally, the medium models should be the best, but this is experimental, and I haven't had time to test every situation. However, the first results look promising.

You can generate videos with 4, 5, 6, 8, 10, or more steps. This is my first version, and if I notice any issues, I will try to fix them later. From my tests, it works well, but as I mentioned before, I haven't tested every possible situation.

You can find a workflow for ComfyUI to test the models if you're interested.

IMPORTANT: If you use a higher step count, try another sampler such as Euler, and experiment with different schedulers too. You may also need to increase the CFG with a higher step count. Generally, the animation is better with more steps, but it also takes more time; with a lower step count, the animation can be a bit more random. If the video colors are too intense, try reducing the CFG or using a lower version of the model. For example, with 20 steps or more, it is better to use a lower model version and adjust the CFG to correct the colors. Remember, these models are modified and do not behave like the originals.

If anyone wants to add sounds or voices, check back a bit later; I will provide workflows for that.

Video example: https://youtu.be/kfokkXEGByU

Notice: I see that this works pretty well up close, but there are some blur issues with the background. I'll try to fix that by the end of the week, so I'll definitely need to update some models, or maybe reduce everything to just three or four. This is an experimental test version, and it provides a big boost, but after some testing I noticed that the background is blurrier than the original, and I think I know why. To fix it, though, I'll have to redo the models. Sorry for the inconvenience! I hope people can still have fun with this test version in the meantime.

This weekend, I’m going to create some workflows to use the original model and achieve good quality. I’ll also add workflows for sound and voices. I will likely fix and update the models as well. So if there’s a model you like among those I’ve uploaded, make sure to save it, as I’ll be deleting them over the weekend to update them with new versions along with the other workflows. I’ll have a bit more time to fix and test everything further to refine it as much as possible. Until then, have fun with these experimental versions!


LoRAs for Wan-AI/Wan2.1-T2V-1.3B

These LoRAs come from the DiffSynth-Studio team.

https://github.com/modelscope/DiffSynth-Studio

I have only converted the LoRAs to make them compatible with ComfyUI. The original LoRA models come from DiffSynth-Studio on ModelScope.

https://modelscope.cn/organization/DiffSynth-Studio


Wan2.1-1.3b-lora-aesthetics-v1:

Model Overview

This LoRA model is trained on top of the Wan2.1-1.3B model using the DiffSynth-Studio framework. It has been fine-tuned on an aesthetics-focused dataset, enhancing the visual appeal of generated videos. Additionally, classifier-free guidance can be disabled to speed up generation.

Recommended Settings:

cfg_scale = 1

sigma_shift = 10

Note: Using this model may reduce the diversity of generated videos. It is recommended to adjust the lora_alpha value to fine-tune the LoRA’s impact on the final output.
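
For reference, here is a minimal sketch of how these settings might be applied when scripting the model in Python. It assumes DiffSynth-Studio's published Wan example API (ModelManager, load_models, load_lora, WanVideoPipeline, save_video, and cfg_scale/sigma_shift as call-time parameters); all file paths are placeholders, so check the DiffSynth-Studio repository for the exact, current API:

```python
# Minimal sketch, assuming DiffSynth-Studio's example API; paths are placeholders.
import torch
from diffsynth import ModelManager, WanVideoPipeline, save_video

# Load the Wan2.1-1.3B base model (file names follow DiffSynth's examples).
model_manager = ModelManager(device="cpu")
model_manager.load_models([
    "models/Wan-AI/Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors",
    "models/Wan-AI/Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth",
    "models/Wan-AI/Wan2.1-T2V-1.3B/Wan2.1_VAE.pth",
], torch_dtype=torch.bfloat16)

# Attach the aesthetics LoRA; lora_alpha controls its impact on the output.
model_manager.load_lora("models/lora/Wan2.1-1.3b-lora-aesthetics-v1.safetensors",
                        lora_alpha=1.0)

pipe = WanVideoPipeline.from_model_manager(model_manager,
                                           torch_dtype=torch.bfloat16, device="cuda")
video = pipe(
    prompt="a lively little white dog runs swiftly across a lush green lawn",
    num_inference_steps=30,
    cfg_scale=1,     # recommended: effectively disables classifier-free guidance
    sigma_shift=10,  # recommended setting from this model card
    seed=0,
)
save_video(video, "video.mp4", fps=15, quality=5)
```

The same loading pattern applies to the other LoRAs below; only the LoRA file and the parameters change.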



Wan2.1-1.3b-lora-speedcontrol-v1:

Model Overview

This LoRA model is based on the Wan2.1-1.3B model and has been trained using the DiffSynth-Studio framework. It allows control over the speed of generated videos by adjusting the LoRA alpha parameter.

LoRA alpha > 0: Use the low speed trigger → Slower speed, improved image quality.

LoRA alpha < 0: Use the high speed trigger → Faster speed, reduced image quality.

Currently, the effects of this model are not yet fully stable, and optimizations are still in progress.

Model Results

Prompt Used: "A documentary-style photograph: a lively little white dog runs swiftly across a lush green lawn. Its fur is bright white, its two ears stand upright, and it has a focused yet joyful expression. Sunlight illuminates its coat, making it look particularly soft and shiny. In the background, a vast meadow scattered with a few wildflowers stretches toward the horizon, where a blue sky with a few white clouds can be seen. The perspective is dynamic, capturing the dog's movement and the surrounding energy of the grass. Side view, medium shot, moving camera."

Negative Prompt: Overly bright colors, overexposure, static, blurry details, subtitles, artistic style, painting, still image, grayish tint, very poor quality, low quality, JPEG compression artifacts, ugly, deformed, extra fingers, poorly drawn hands, distorted faces, disfigured, malformed limbs, fused fingers, static scene, cluttered background, three legs, crowd in the background, characters walking upside down.

Effect of the LoRA Alpha Parameter:

LoRA alpha = 0.7 → Slower speed, better visual quality.

LoRA alpha = 0 → Normal speed, neutral effect.

LoRA alpha = -0.5 → Faster speed, reduced visual quality.
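
If you script generation in Python instead of using ComfyUI, the speed control is just the lora_alpha value passed when loading the LoRA. A minimal sketch under the same assumptions as the aesthetics example above (DiffSynth-Studio's example API, placeholder paths):

```python
# Minimal sketch, assuming DiffSynth-Studio's example API; paths are placeholders.
import torch
from diffsynth import ModelManager, WanVideoPipeline, save_video

model_manager = ModelManager(device="cpu")
model_manager.load_models([
    "models/Wan-AI/Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors",
    "models/Wan-AI/Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth",
    "models/Wan-AI/Wan2.1-T2V-1.3B/Wan2.1_VAE.pth",
], torch_dtype=torch.bfloat16)

# lora_alpha > 0 slows the video down and improves quality;
# lora_alpha < 0 speeds it up at the cost of quality. Pair the sign with
# the matching speed trigger word in the prompt, as described above.
model_manager.load_lora("models/lora/Wan2.1-1.3b-lora-speedcontrol-v1.safetensors",
                        lora_alpha=0.7)

pipe = WanVideoPipeline.from_model_manager(model_manager,
                                           torch_dtype=torch.bfloat16, device="cuda")
video = pipe(
    prompt="a lively little white dog runs swiftly across a lush green lawn",
    num_inference_steps=30,
    seed=0,
)
save_video(video, "speedcontrol.mp4", fps=15, quality=5)
```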


Wan2.1-1.3b-lora-highresfix-v1:

Model Overview

This LoRA model is trained on top of the Wan2.1-1.3B model using the DiffSynth-Studio framework. Since the base model was originally trained at 480p resolution, it has certain limitations in sharpness. To address this, additional training was conducted to enhance the quality of high-resolution videos, preventing image collapse or a dull appearance.

Recommended Usage:

Directly generate short high-resolution videos:

Set the resolution to 1024 × 1024 while slightly reducing the number of frames to avoid excessively long generation times.

Refine details in a high-resolution video:

First, generate a low-resolution video.

Apply upscaling to increase resolution.

Finally, use this model to enhance visual details.
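
As an illustration of the first path (direct high-resolution generation), here is a minimal sketch under the same assumptions as the earlier examples (DiffSynth-Studio's example API, placeholder paths); the reduced frame count is my assumption, not a value from the model card:

```python
# Minimal sketch, assuming DiffSynth-Studio's example API; paths are placeholders.
import torch
from diffsynth import ModelManager, WanVideoPipeline, save_video

model_manager = ModelManager(device="cpu")
model_manager.load_models([
    "models/Wan-AI/Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors",
    "models/Wan-AI/Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth",
    "models/Wan-AI/Wan2.1-T2V-1.3B/Wan2.1_VAE.pth",
], torch_dtype=torch.bfloat16)

model_manager.load_lora("models/lora/Wan2.1-1.3b-lora-highresfix-v1.safetensors",
                        lora_alpha=1.0)

pipe = WanVideoPipeline.from_model_manager(model_manager,
                                           torch_dtype=torch.bfloat16, device="cuda")
video = pipe(
    prompt="anime style, an adorable 2D-style girl with short black hair, "
           "gently turning her head",
    height=1024, width=1024,  # generate directly at high resolution
    num_frames=33,            # fewer frames to keep generation time short (assumed value)
    num_inference_steps=30,
    seed=0,
)
save_video(video, "hires.mp4", fps=15, quality=5)
```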

Model Effects

Anime / 2D Style Prompt: Anime style, an adorable 2D-style girl with short black hair flowing in the wind, gently turning her head.

Negative Prompt: Overly bright colors, overexposure, static, blurry details, subtitles, artistic style, painting, still image, dull overall tone, poor quality, visible JPEG compression, ugly, incomplete, extra fingers, poorly drawn hands, malformed face, deformed, disfigured, distorted limbs, fused fingers, static image, cluttered background, three legs, crowd in the background, walking upside down.

[Comparison images: before activating the LoRA → after activating the LoRA]

Sword and Magic Prompt: An ancient mythology scene depicting a battle between a hero and a dragon, with steep cliffs in the background. The hero wears armor and wields a shining sword, while the dragon spreads its massive wings, ready to unleash flames.

Negative Prompt: Overly bright colors, overexposure, static, blurry details, subtitles, artistic style, painting, still image, dull overall tone, poor quality, visible JPEG compression, ugly, incomplete, extra fingers, poorly drawn hands, malformed face, deformed, disfigured, distorted limbs, fused fingers, static image, cluttered background, three legs, crowd in the background, walking upside down.


Wan2.1-1.3b-lora-exvideo-v1:

Model Overview

This LoRA model is trained on top of the Wan2.1-1.3B model using the DiffSynth-Studio framework. It enables video duration extension: once activated, this LoRA allows the generation of videos twice as long as usual.

Recommended Settings:

num_frames = 161

lora_alpha = 1.0
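
A minimal sketch with those settings, under the same assumptions as the earlier examples (DiffSynth-Studio's example API, placeholder paths):

```python
# Minimal sketch, assuming DiffSynth-Studio's example API; paths are placeholders.
import torch
from diffsynth import ModelManager, WanVideoPipeline, save_video

model_manager = ModelManager(device="cpu")
model_manager.load_models([
    "models/Wan-AI/Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors",
    "models/Wan-AI/Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth",
    "models/Wan-AI/Wan2.1-T2V-1.3B/Wan2.1_VAE.pth",
], torch_dtype=torch.bfloat16)

model_manager.load_lora("models/lora/Wan2.1-1.3b-lora-exvideo-v1.safetensors",
                        lora_alpha=1.0)  # recommended strength from this card

pipe = WanVideoPipeline.from_model_manager(model_manager,
                                           torch_dtype=torch.bfloat16, device="cuda")
video = pipe(
    prompt="a playful little dog wearing black sunglasses runs swiftly "
           "across a green lawn",
    num_frames=161,  # about twice the usual length, per the recommended settings
    num_inference_steps=30,
    seed=0,
)
save_video(video, "extended.mp4", fps=15, quality=5)
```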

Model Effects

📷 Documentary Photography Style Prompt: A playful little dog wearing black sunglasses runs swiftly across a green lawn. Its fur is light brown, its ears perked up, and its expression is focused yet joyful. The sunlight highlights its fur, making it appear particularly soft and shiny. In the background, a vast meadow dotted with a few wildflowers stretches under a blue sky with scattered white clouds. The perspective is dynamic, capturing the motion of the dog's run and the vibrancy of the surrounding landscape. Side-moving camera, medium shot.

🎨 High-Definition 3D Texture Prompt: A small white cat sprints forward on a 10-meter-high platform, then performs a backflip dive into the water. Its fur is silky, its gaze sharp, and its movements fluid and natural. In the background, a pristine blue swimming pool with a smooth and calm surface. At the moment of the jump, a spotlight from above illuminates the cat, creating a striking contrast between light and shadow. The water splashes are sharp and precise, producing a visually spectacular effect. C4D rendering, dynamic close-up.

🎭 Japanese Anime Style Prompt: On a city street corner, a black cat crouches under a lamppost, gazing into the distance at the neon lights. Suddenly, a blue light beam descends from the sky, swiftly enveloping its body. The cat begins to levitate, its black fur slowly dissolving into the air as its body elongates. Its fur transforms into a sleek black suit, revealing a slender silhouette. Its cat ears disappear, and its facial features become human, taking on the appearance of a handsome young man with a cold gaze. He lands lightly, his suit billowing slightly in the night breeze, as the blue light fades away—an elegant and mysterious young man from the future.

🌆 Wide-Angle City Scene Prompt: The camera provides an overview of a bustling city street. On a wide sidewalk, pedestrians move about, creating a lively and dynamic urban tableau.

Usage Instructions: This LoRA model is designed to extend video duration while maintaining visual quality. For optimal results, set num_frames to 161 or adjust it according to your needs.
