Abstract
PixelSmile is a diffusion framework that disentangles facial expression semantics through symmetric joint training and contrastive learning, enabling precise, controllable, and fine-grained expression editing with robust identity preservation.
Fine-grained facial expression editing has long been limited by intrinsic semantic overlap. To address this, we construct the Flex Facial Expression (FFE) dataset with continuous affective annotations and establish FFE-Bench to evaluate structural confusion, editing accuracy, linear controllability, and the trade-off between expression editing and identity preservation. We propose PixelSmile, a diffusion framework that disentangles expression semantics via fully symmetric joint training. PixelSmile combines intensity supervision with contrastive learning to produce stronger and more distinguishable expressions, achieving precise and stable linear expression control through textual latent interpolation. Extensive experiments demonstrate that PixelSmile achieves superior disentanglement and robust identity preservation, confirming its effectiveness for continuous, controllable, and fine-grained expression editing, while naturally supporting smooth expression blending.
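The abstract attributes linear expression control to textual latent interpolation but does not give the formula. A minimal sketch of one common reading follows, assuming the edit is driven by moving a text embedding along the straight line from a neutral prompt to an expression prompt, with a scalar intensity alpha. The function names, the toy vectors, and the additive blending rule are all illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence

Vector = List[float]


def interpolate_embedding(
    neutral: Sequence[float], target: Sequence[float], alpha: float
) -> Vector:
    """Move linearly from the neutral embedding toward the target embedding.

    alpha = 0 returns the neutral embedding, alpha = 1 the target embedding.
    (Hypothetical helper; the paper's exact rule is not stated in the abstract.)
    """
    if len(neutral) != len(target):
        raise ValueError("embeddings must have the same dimension")
    return [n + alpha * (t - n) for n, t in zip(neutral, target)]


def blend_expressions(
    neutral: Sequence[float],
    target_a: Sequence[float],
    target_b: Sequence[float],
    alpha_a: float,
    alpha_b: float,
) -> Vector:
    """Add two scaled expression directions to the neutral embedding.

    This additive rule is an assumption used to illustrate blending.
    """
    if not (len(neutral) == len(target_a) == len(target_b)):
        raise ValueError("embeddings must have the same dimension")
    return [
        n + alpha_a * (a - n) + alpha_b * (b - n)
        for n, a, b in zip(neutral, target_a, target_b)
    ]


# Toy 3-dimensional vectors standing in for text-encoder outputs (made-up values).
e_neutral = [0.0, 0.0, 1.0]
e_happy = [1.0, 0.0, 1.0]
e_surprised = [0.0, 2.0, 1.0]

half_happy = interpolate_embedding(e_neutral, e_happy, 0.5)
mixed = blend_expressions(e_neutral, e_happy, e_surprised, 0.5, 0.25)
```

In a real pipeline the resulting vector would replace the prompt embedding passed to the diffusion model. Whether the edited image changes linearly with alpha depends on how well the expression directions are disentangled, which is the property the abstract says FFE-Bench measures.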
Community
Project Page: https://ammmob.github.io/PixelSmile/
GitHub Repo: https://github.com/Ammmob/PixelSmile
Model (Hugging Face): https://huggingface.co/PixelSmile/PixelSmile
Benchmark (FFE-Bench): https://huggingface.co/datasets/PixelSmile/FFE-Bench
Online Demo (Spaces): https://huggingface.co/spaces/PixelSmile/PixelSmile-Demo
Get this paper in your agent:
    hf papers read 2603.25728
Don't have the latest CLI?
    curl -LsSf https://hf.co/cli/install.sh | bash