
Gallery

AniMemory Alpha is a bilingual model focused primarily on anime-style image generation. It uses an SDXL-type UNet and a self-developed bilingual T5-XXL text encoder, which gives good alignment between Chinese and English prompts. We first trained a general model on billion-scale data, then tuned the anime model with a series of post-training strategies and curated data. By open-sourcing the Alpha version we hope to contribute to the anime community, and we greatly value any feedback.

Key Features

  • Good bilingual prompt following, effectively transforming certain Chinese concepts into anime style.
  • The model is mainly nijigen (にじげん, 二次元; 2D anime) style, supporting common artistic styles and Chinese elements.
  • Competitive image quality, especially in generating detailed characters and landscapes.
  • The prediction mode is x-prediction, so the model tends to produce subjects with cleaner backgrounds; more detailed prompts can further refine your images.
  • Impressive creative ability: the more detailed the description, the more surprises it can produce.
  • Embracing open-source co-construction; we welcome anime fans to join our ecosystem and share your creative ideas through our workflow.
  • Better support for Chinese-style elements.
  • Compatible with both tag lists and natural language description-style prompts.
  • Centered on a resolution of 1024, e.g. 896 × 1152 for portrait (vertical) output.
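
The resolution note above can be illustrated with a small helper that picks a width and height for a given aspect ratio while keeping the pixel count close to 1024 × 1024. This is only a sketch: `pick_resolution` is a hypothetical name, and snapping to multiples of 64 is an assumption (it happens to reproduce the 896 × 1152 example), not a documented requirement of the model.

```python
import math


def pick_resolution(aspect_ratio, target_side=1024, multiple=64):
    """Return (width, height) whose area is close to target_side**2.

    aspect_ratio is width / height. Snapping to `multiple` is an assumption
    made for this sketch, not a documented constraint of AniMemory.
    """
    if aspect_ratio <= 0:
        raise ValueError("aspect_ratio must be positive")
    target_area = target_side * target_side
    # Solve width * height = target_area with width = aspect_ratio * height.
    height = math.sqrt(target_area / aspect_ratio)
    width = aspect_ratio * height

    def snap(value):
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)


if __name__ == "__main__":
    print(pick_resolution(1.0))    # square
    print(pick_resolution(7 / 9))  # portrait
    print(pick_resolution(9 / 7))  # landscape
```

The returned pair can be passed as `width` and `height` to the pipeline call in the Quick Start section.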

Model Info

Developed by: animEEEmpire
Model name: AniMemory-alpha
Model type: Diffusion-based text-to-image generative model
Download link: Hugging Face
Parameters:
  • TextEncoder_1: 5.6B
  • TextEncoder_2: 950M
  • Unet: 3.1B
  • VAE: 271M
Context length: 227
Resolution: Multi-resolution
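
As a rough back-of-the-envelope check, the parameter counts listed above can be turned into an estimate of the memory needed just to hold the weights in bfloat16 (2 bytes per parameter). The sketch below covers weights only; activations, the CUDA context, and any buffers come on top, so actual peak usage will be higher than this figure.

```python
# Parameter counts as listed in the Model Info section.
PARAMS = {
    "TextEncoder_1": 5_600_000_000,
    "TextEncoder_2": 950_000_000,
    "Unet": 3_100_000_000,
    "VAE": 271_000_000,
}

BYTES_PER_PARAM_BF16 = 2  # bfloat16 stores each value in 16 bits

total_params = sum(PARAMS.values())
weight_bytes = total_params * BYTES_PER_PARAM_BF16
weight_gib = weight_bytes / 2**30

if __name__ == "__main__":
    print(f"total parameters: {total_params / 1e9:.2f}B")
    print(f"bf16 weights only: {weight_gib:.2f} GiB")
```

This works out to roughly 9.9B parameters, or about 18.5 GiB of bf16 weights.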

Key Problems and Notes

  • Primarily focuses on text-following ability and basic image quality; it is not a strongly artistic or stylized version, making it suitable for open-source co-construction.
  • Quantization and distillation are still in progress, leaving room for significant speed improvements and GPU memory savings. This work is planned, and volunteers are welcome.
  • The training data went through a relatively thorough filtering and cleaning process, so the model is not suited to pornographic generation; attempts to force such output may produce broken images.
  • Simple descriptions tend to produce images with simple backgrounds and chibi-style illustrations; providing more comprehensive descriptions can add detail.
  • For close-up shots, please use descriptions like "detailed face", "close-up view" etc. to enhance the impact of the output.
  • Adding necessary quality descriptors can sometimes improve the overall quality.
  • The issue with small faces still exists in the Alpha version, but it has been slightly improved; feel free to try it out.
  • It is better to describe a single object in detail than to include many objects in one prompt.
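
The prompting tips above can be sketched as a small prompt builder. `build_prompt` and its arguments are hypothetical names used for illustration; the close-up phrases come from the notes above, while any quality tags (for example "best quality") are assumptions you would need to tune yourself.

```python
def build_prompt(subject, details=(), close_up=False, quality_tags=()):
    """Join one subject with its details into a comma-separated prompt.

    Keeps the prompt focused on a single subject, optionally adds the
    close-up phrases suggested in the notes, then appends quality tags.
    """
    parts = [subject.strip()]
    parts.extend(d.strip() for d in details if d.strip())
    if close_up:
        # Phrases suggested in the notes for close-up shots.
        parts.extend(["detailed face", "close-up view"])
    # Quality tags are user-chosen; none are prescribed by the model card.
    parts.extend(t.strip() for t in quality_tags if t.strip())
    return ", ".join(parts)


if __name__ == "__main__":
    print(build_prompt(
        "a girl in hanfu",
        details=["standing under a plum tree", "snowy courtyard"],
        close_up=True,
        quality_tags=["best quality"],
    ))
```

Since the model accepts both tag lists and natural-language descriptions, the same helper can take full sentences as `details`.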

Limitations

  • Although the model data has undergone extensive cleaning, there may still be potential gender, ethnic, or political biases.
  • The model's open-sourcing is dedicated to enriching the ecosystem of the anime community and benefiting anime fans.
  • The usage of the model shall not infringe upon the legal rights and interests of designers and creators.

Quick Start

1. Install the necessary requirements.

  • Recommended: Python >= 3.10, PyTorch >= 2.3, CUDA >= 12.1.

  • We recommend creating a new Anaconda environment (Python >= 3.10) to run the following example: conda create -n animemory python=3.10 -y

  • Run: pip install git+https://github.com/huggingface/diffusers.git torch==2.3.1 transformers==4.43.0 accelerate==0.31.0 sentencepiece
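
After installing, a quick check like the following can confirm that the environment meets the recommended minimums listed above. This is a minimal sketch: `meets_minimum` and `check_environment` are hypothetical helpers, and the check degrades gracefully if PyTorch is not installed.

```python
import re
import sys


def meets_minimum(version, minimum):
    """Compare a version string such as '2.3.1+cu121' to a tuple like (2, 3)."""
    # Drop any local build suffix, then compare the leading numeric fields.
    numbers = re.findall(r"\d+", version.split("+")[0])
    parsed = tuple(int(n) for n in numbers[:len(minimum)])
    return parsed >= minimum


def check_environment():
    """Return a dict of pass/fail flags for Python, PyTorch and CUDA."""
    report = {"python": sys.version_info[:2] >= (3, 10)}
    try:
        import torch
    except ImportError:
        report["torch"] = False
        report["cuda"] = False
        return report
    report["torch"] = meets_minimum(torch.__version__, (2, 3))
    cuda_version = torch.version.cuda  # None on CPU-only builds
    report["cuda"] = (
        bool(cuda_version)
        and meets_minimum(cuda_version, (12, 1))
        and torch.cuda.is_available()
    )
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'below recommended version or missing'}")
```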

2. ComfyUI inference.

See ComfyUI-Animemory-Loader for ComfyUI configuration.

3. Diffusers inference.

from diffusers import DiffusionPipeline
import torch

# Load the custom pipeline shipped with the model repo (hence trust_remote_code=True).
pipe = DiffusionPipeline.from_pretrained("animEEEmpire/AniMemory-alpha", trust_remote_code=True, torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Chinese prompt: "A ferocious wolf with scarlet eyes, howling at midnight under bright moonlight"
prompt = "一只凶恶的狼,猩红的眼神,在午夜咆哮,月光皎洁"
negative_prompt = "nsfw, worst quality, low quality, normal quality, low resolution, monochrome, blurry, wrong, Mutated hands and fingers, text, ugly faces, twisted, jpeg artifacts, watermark, low contrast, realistic"

# Index 0 of the pipeline output is used as the generated image and saved to disk.
images = pipe(prompt=prompt,
              negative_prompt=negative_prompt,
              num_inference_steps=40,
              height=1024, width=1024,
              guidance_scale=7,
              )[0]
images.save("output.png")
  • Use pipe.enable_sequential_cpu_offload() to offload the model to the CPU and reduce GPU memory usage (about 14.25 G, compared with 25.67 G without CPU offload), at the cost of significantly longer inference time (17.74 s vs. 5.18 s without offload, on an A100 40G).
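
The offload trade-off above can be sketched as a loader that picks a placement from the available GPU memory. `choose_placement` and `load_pipeline` are hypothetical helpers. The thresholds are simply the figures reported above, read as gigabytes (an assumption), and may differ on other hardware or at other resolutions. With sequential offload enabled, the pipeline is not moved to the GPU with `pipe.to("cuda")`.

```python
# Peak GPU memory figures reported above, read as GB (an assumption).
NO_OFFLOAD_PEAK_GB = 25.67
OFFLOAD_PEAK_GB = 14.25


def choose_placement(free_gpu_gb):
    """Return 'cuda' or 'sequential_offload' based on free GPU memory."""
    if free_gpu_gb >= NO_OFFLOAD_PEAK_GB:
        return "cuda"
    if free_gpu_gb >= OFFLOAD_PEAK_GB:
        return "sequential_offload"
    raise RuntimeError(
        f"{free_gpu_gb:.1f} GB free is below the reported offload peak "
        f"of {OFFLOAD_PEAK_GB} GB"
    )


def load_pipeline(free_gpu_gb):
    """Load AniMemory-alpha with the placement chosen for the given memory."""
    # Imported lazily so choose_placement can be used without these packages.
    import torch
    from diffusers import DiffusionPipeline

    placement = choose_placement(free_gpu_gb)  # fail early, before downloading
    pipe = DiffusionPipeline.from_pretrained(
        "animEEEmpire/AniMemory-alpha",
        trust_remote_code=True,
        torch_dtype=torch.bfloat16,
    )
    if placement == "cuda":
        pipe.to("cuda")
    else:
        # Slower, but keeps only the active submodule on the GPU.
        pipe.enable_sequential_cpu_offload()
    return pipe
```

One way to obtain the free-memory figure is `torch.cuda.mem_get_info()`, which returns free and total bytes for the current device.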

4. Faster inference is planned as future work.

License

This repo is released under the Apache 2.0 License.