FLIPNET: Unlocking the Potential of Diffusion Priors in Blind Face Restoration

1University of Warwick, 2Imperial College London, 3University of Sheffield, 4University of Surrey
📣 Accepted at ICCV 2025

Abstract


Although diffusion priors are emerging as a powerful solution for blind face restoration (BFR), the inherent gap between the vanilla diffusion model and the BFR setting hinders seamless adaptation. The gap mainly stems from two discrepancies: 1) between high-quality (HQ) and low-quality (LQ) images, and 2) between synthesized and real-world images. The vanilla diffusion model is trained on images with little or no degradation, whereas BFR handles moderately to severely degraded images. Additionally, the LQ images used for training are synthesized by a naive degradation model with limited degradation patterns, which fails to simulate the complex and unknown degradations of real-world scenarios. In this work, we propose FLIPNET, a unified network that switches between two modes, each resolving a specific gap. In restoration mode, the model gradually integrates BFR-oriented features and face embeddings from LQ images to achieve authentic and faithful face restoration. In degradation mode, it synthesizes realistically degraded images based on the knowledge learned from real-world degradation datasets. Extensive evaluations on benchmark datasets show that our model 1) outperforms previous diffusion-prior-based BFR methods in authenticity and fidelity, and 2) surpasses the naive degradation model in modeling real-world degradations.
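For context, the naive degradation model referred to above typically chains Gaussian blur, downsampling, additive noise, and JPEG compression. Below is a minimal illustrative sketch of that classical pipeline; the function name and parameter values are our own choices, and real pipelines sample the parameters randomly per image:

import cv2
import numpy as np

def naive_degrade(hq, blur_sigma=3.0, scale=4, noise_sigma=10.0, jpeg_q=60):
    """Classical synthetic degradation: blur -> downsample -> noise -> JPEG."""
    # Gaussian blur (kernel size derived from sigma)
    lq = cv2.GaussianBlur(hq, (0, 0), blur_sigma)
    # Downsample, then upsample back to the original resolution
    h, w = lq.shape[:2]
    lq = cv2.resize(lq, (w // scale, h // scale), interpolation=cv2.INTER_LINEAR)
    lq = cv2.resize(lq, (w, h), interpolation=cv2.INTER_LINEAR)
    # Additive Gaussian noise
    lq = lq.astype(np.float32) + np.random.normal(0.0, noise_sigma, lq.shape)
    lq = np.clip(lq, 0, 255).astype(np.uint8)
    # JPEG compression round-trip
    _, buf = cv2.imencode(".jpg", lq, [int(cv2.IMWRITE_JPEG_QUALITY), jpeg_q])
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)

Such a fixed pipeline covers only a narrow slice of real-world degradations, which is the gap that FLIPNET's degradation mode is designed to close.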

Method

Overview of our model structure. FLIPNET takes noisy high-quality (HQ) and low-quality (LQ) image pairs as input and can switch between restoration mode and degradation mode. Taking restoration mode as an example, HQ images serve as the input while LQ images serve as the condition: a BoostHub is placed in parallel with each self-attention layer to selectively integrate BFR-oriented LQ features for denoising. Additionally, LoRA weights are plugged into all self-attention and cross-attention layers to adapt the base model to the face domain. By simply flipping the image-condition order, the model switches to degradation mode and synthesizes degraded data with real-world degradations.
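To make the description concrete, the PyTorch sketch below shows how a BoostHub branch in parallel with self-attention, a LoRA-style adapter, and the mode flip could fit together. This is our own illustrative reconstruction, not the official implementation: module names, shapes, and the exact LoRA placement are assumptions.

import torch
import torch.nn as nn

class BoostHubBlock(nn.Module):
    """Illustrative sketch: self-attention with a parallel BoostHub branch that
    injects condition features, plus a LoRA-style low-rank residual."""

    def __init__(self, dim, num_heads=8, lora_rank=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Parallel branch: attends from the denoising stream to condition tokens
        self.boost_hub = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))  # zero-init: branch starts as a no-op
        # Low-rank adapter; B is zero-init so training starts from the base model
        self.lora_a = nn.Linear(dim, lora_rank, bias=False)
        self.lora_b = nn.Linear(lora_rank, dim, bias=False)
        nn.init.zeros_(self.lora_b.weight)

    def forward(self, x, cond):
        attn_out, _ = self.self_attn(x, x, x)
        attn_out = attn_out + self.lora_b(self.lora_a(attn_out))  # LoRA residual
        boost_out, _ = self.boost_hub(x, cond, cond)              # integrate condition features
        return x + attn_out + self.gate * boost_out

def flipnet_step(block, hq_tokens, lq_tokens, mode="restoration"):
    # Flipping the (input, condition) order switches between the two modes
    if mode == "restoration":                  # denoise HQ tokens, conditioned on LQ
        return block(hq_tokens, cond=lq_tokens)
    return block(lq_tokens, cond=hq_tokens)    # degradation mode: synthesize LQ from HQ

The zero-initialized gate and LoRA output projection mean both added branches start as identity mappings, so fine-tuning begins from the behavior of the pretrained diffusion model.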

Results

We provide the synthetic CelebA-Test set used in our paper, together with our restoration results. Please download them to view more results.

Comparison with state-of-the-art methods on CelebA-Test.

Comparison with state-of-the-art methods on real-world datasets.

BibTeX

@inproceedings{miao2025flipnet,
  author    = {Miao, Yunqi and Qu, Zhiyu and Gao, Mingqi and Chen, Changrui and Song, Jifei and Han, Jungong and Deng, Jiankang},
  title     = {FlipNet: Unlocking the Potential of Diffusion Priors in Blind Face Restoration},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year      = {2025}
}