Stability AI introduces Adversarial Diffusion Distillation (ADD), a pioneering technique to enhance real-time image synthesis

TL;DR:

  • Stability AI’s Adversarial Diffusion Distillation (ADD) revolutionizes image synthesis.
  • ADD reduces the inference steps for pretrained diffusion models to 1-4, enhancing speed and quality.
  • It combines adversarial training and score distillation for superior results.
  • ADD-XL surpasses SDXL-Base at 512² px resolution with four sampling steps.
  • ADD excels in handling complex image compositions with high realism.
  • ADD outperforms LCM, LCM-XL, and single-step GANs, setting new benchmarks.

Main AI News:

In the realm of generative modeling, diffusion models (DMs) have emerged as game-changers, driving advancements in the creation of top-tier images and videos. Their scalability and iterative nature have empowered them to perform intricate tasks, such as generating images from unstructured textual prompts. Yet, the multitude of steps required for this iterative inference process has impeded DMs’ seamless real-time utilization. On the flip side, Generative Adversarial Networks (GANs) offer a swifter, single-step solution but often fall short of delivering the same sample quality as DMs, even with efforts to harness vast datasets.

Enter Stability AI, where a team of researchers has embarked on a mission to meld the rapidity of GANs with the superior sample quality of DMs. Their approach is elegantly simple: Adversarial Diffusion Distillation (ADD), a versatile technique designed to preserve sampling fidelity while potentially elevating overall model performance. ADD achieves this by reducing the number of inference steps required for a pretrained diffusion model to a mere 1-4 sampling steps. The research team unites two training objectives: first, a distillation loss akin to score distillation sampling (SDS), and second, an adversarial loss.

With each forward pass, the adversarial loss propels the model to generate samples that seamlessly align with the manifold of real-world images, eliminating common artifacts like blurriness that plague other distillation methods. To retain the rich compositional capabilities exhibited by larger DMs and maximize the wealth of knowledge from the pretrained DM, the distillation loss leverages another pretrained DM, held fixed as a teacher. Notably, their method minimizes memory demands by forgoing classifier-free guidance during inference. What sets this approach apart from earlier one-step GAN-based techniques is its capacity for iterative model refinement and superior results.
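The interplay of the two objectives can be sketched as a single combined loss for the student. The following is a minimal, hypothetical NumPy sketch, not the paper's implementation: the networks are stand-in toy functions, and names like `student_generate`, `teacher_denoise`, and the weighting `lam` are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the real networks (hypothetical; the paper uses
# large pretrained diffusion models and a learned discriminator).
def student_generate(noise, w):
    # One-step student: a single forward pass, here a linear map + tanh.
    return np.tanh(noise @ w)

def teacher_denoise(x_noisy):
    # Frozen pretrained teacher DM's denoised prediction (toy: shrinkage).
    return 0.9 * x_noisy

def discriminator(x):
    # Returns a "realism" logit per sample (toy: mean activation).
    return x.mean(axis=1)

def add_loss(noise, w, lam=0.1, sigma=0.3):
    """Combined ADD-style objective: adversarial + score-distillation terms."""
    x_hat = student_generate(noise, w)            # student sample in 1 step
    # Adversarial term: push discriminator logits up (non-saturating GAN loss),
    # driving samples toward the real-image manifold.
    l_adv = -np.log(1.0 / (1.0 + np.exp(-discriminator(x_hat)))).mean()
    # Distillation term: re-noise the student sample and match the frozen
    # teacher's denoised reconstruction, preserving the teacher's knowledge.
    x_noisy = x_hat + sigma * rng.standard_normal(x_hat.shape)
    l_distill = ((x_hat - teacher_denoise(x_noisy)) ** 2).mean()
    return l_adv + lam * l_distill

noise = rng.standard_normal((8, 16))
w = rng.standard_normal((16, 16)) * 0.1
loss = add_loss(noise, w)
```

In training, only the student's weights (`w` here) would be updated against this loss; the teacher stays fixed, and the discriminator is trained adversarially against the student.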

In summary, Stability AI introduces ADD, a groundbreaking technique that streamlines pretrained diffusion models into high-fidelity, real-time image generators with a mere 1-4 sampling steps. The research team meticulously considered various design choices for their unique methodology, seamlessly intertwining adversarial training with score distillation.

Here are the key contributions of their research: 

  • The introduction of ADD, a technique that achieves remarkable results with just 1-4 sampling steps, transforming pretrained diffusion models into high-quality, real-time image generators. 
  • ADD-XL’s outstanding performance, surpassing its teacher model SDXL-Base, particularly at a resolution of 512² pixels with four sampling steps. 
  • ADD’s remarkable capability to handle intricate image compositions while preserving a high degree of realism in a single inference step. 
  • ADD’s substantial superiority over formidable benchmarks such as LCM, LCM-XL, and single-step GANs in terms of performance and quality.

Conclusion:

Stability AI’s Adversarial Diffusion Distillation (ADD) presents a game-changing advancement in real-time image synthesis. With its ability to dramatically reduce inference steps while maintaining superior image quality, ADD has the potential to revolutionize markets that rely on high-fidelity, on-the-fly image generation, such as content creation, design, and entertainment. Its superiority over existing methods positions Stability AI as a key player in the evolving landscape of generative modeling.
