Google has introduced two new AI-based diffusion models that improve the quality of low-resolution images: Super-Resolution via Repeated Refinement (SR3) and Cascaded Diffusion Models (CDM). Both use AI to generate high-fidelity images, with applications ranging from restoring old family portraits and improving medical imaging systems to enhancing the performance of downstream models for image classification, segmentation, and more. The SR3 model transforms a low-resolution image into a detailed high-resolution image. According to Google, it also surpasses current deep generative models such as generative adversarial networks (GANs) in human evaluations.
Researchers from Google Research’s Brain Team have detailed both models. SR3 is a super-resolution diffusion model that takes a low-resolution image as input and builds a corresponding high-resolution image from pure noise. The model is trained on an image corruption process that progressively adds noise to a high-resolution image until only pure noise remains. SR3 then learns to reverse this process, “beginning from pure noise and progressively removing noise to reach a target distribution through the guidance of the input low-resolution image.”
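The corruption half of that training process can be illustrated with a few lines of Python. This is only a minimal sketch of a generic diffusion-style noising step, not Google's implementation: the linear noise schedule, the 1,000 steps, the function names and the four-value toy "image" are all assumptions made for illustration.

```python
import math
import random

def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal kept after each step.

    Assumes a linear beta schedule; the real schedule may differ.
    """
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def corrupt(pixels, alpha_bar, rng):
    """Mix clean pixel values with Gaussian noise at one noise level."""
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [signal * p + noise * rng.gauss(0.0, 1.0) for p in pixels]

rng = random.Random(0)
alpha_bars = alpha_bar_schedule(1000)

clean = [0.5, -0.25, 1.0, 0.0]  # toy "image": four pixel values in [-1, 1]
slightly_noisy = corrupt(clean, alpha_bars[0], rng)      # first step: almost unchanged
almost_pure_noise = corrupt(clean, alpha_bars[-1], rng)  # last step: signal nearly gone
```

At the final step almost none of the original signal is left, which is the "pure noise" starting point the model learns to work back from.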
Google's New AI-Based Tech Can Transform Poor-Quality Photos Into High-Res Images
Google has shared a few examples in which SR3 upscales 64×64-pixel images to 1,024×1,024 pixels. The outputs, especially for faces and natural images, are impressive. The tech giant says SR3 achieves strong benchmark results on the super-resolution task for face and natural images when scaling to resolutions 4x to 8x higher than the input.
CDM, on the other hand, is a class-conditional diffusion model trained on ImageNet data to generate high-resolution natural images. Since ImageNet is a difficult, high-entropy dataset, Google built CDM as a cascade of multiple diffusion models. This approach chains together multiple generative models over several spatial resolutions: one diffusion model generates data at a low resolution, followed by a sequence of SR3 super-resolution diffusion models that gradually increase the resolution of the generated image to the highest resolution.
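The chaining idea can be sketched in Python as follows. Everything here is a stand-in: base_model and super_resolve are hypothetical placeholders (random pixels and nearest-neighbour upsampling) rather than real diffusion models, and the toy sizes 4, 8 and 32 are chosen only to keep the example small.

```python
import random

def base_model(class_label, size, rng):
    """Placeholder for the low-resolution class-conditional diffusion model."""
    # Toy behaviour: random pixel values offset by the class label.
    return [[class_label + rng.random() for _ in range(size)] for _ in range(size)]

def super_resolve(image, factor):
    """Placeholder for one SR3 stage: plain nearest-neighbour upsampling."""
    out = []
    for row in image:
        wide = [p for p in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

def cascade(class_label, base_size, factors, rng):
    """Run the base model, then each super-resolution stage in order."""
    image = base_model(class_label, base_size, rng)
    sizes = [len(image)]
    for factor in factors:
        image = super_resolve(image, factor)
        sizes.append(len(image))
    return image, sizes

# Toy cascade: a 4x4 sample is upscaled to 8x8, then to 32x32.
image, sizes = cascade(class_label=3, base_size=4, factors=[2, 4], rng=random.Random(0))
```

Each stage only has to handle one jump in resolution, and its output becomes the input of the next stage.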
Cascading is also known to improve quality and training speed for high-resolution data. Google says it applies Gaussian noise and Gaussian blur to the low-resolution input image of each super-resolution model in the cascading pipeline. It calls this process conditioning augmentation, and says it enables better, higher-resolution sample quality for CDM.
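A rough Python sketch of such an augmentation step is shown below. It is not Google's code: the function names, the blur radius, the blur and noise strengths, and the choice to blur first and add noise second are all assumptions made for illustration.

```python
import math
import random

def gaussian_kernel(sigma, radius):
    """Normalised 1D Gaussian weights covering [-radius, radius]."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(image, kernel):
    """Convolve every row with the kernel, repeating edge pixels at the border."""
    radius = len(kernel) // 2
    width = len(image[0])
    out = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xi = min(max(x + k - radius, 0), width - 1)
                acc += w * row[xi]
            new_row.append(acc)
        out.append(new_row)
    return out

def transpose(image):
    return [list(col) for col in zip(*image)]

def gaussian_blur(image, sigma, radius=2):
    """Separable blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma, radius)
    blurred = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(blurred), kernel))

def add_gaussian_noise(image, std, rng):
    return [[p + rng.gauss(0.0, std) for p in row] for row in image]

def augment_conditioning(low_res, blur_sigma, noise_std, rng):
    """Blur, then add noise to, the low-resolution conditioning image."""
    return add_gaussian_noise(gaussian_blur(low_res, blur_sigma), noise_std, rng)

rng = random.Random(0)
low_res = [[0.0] * 5 for _ in range(5)]
low_res[2][2] = 1.0  # a single bright pixel in a dark 5x5 image
augmented = augment_conditioning(low_res, blur_sigma=1.0, noise_std=0.05, rng=rng)
```

The intuition is that a super-resolution stage trained on slightly degraded inputs is less thrown off by imperfections in the images handed to it by the previous stage of the cascade.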
With SR3 and CDM, Google says it has “pushed the performance of diffusion models to the state-of-the-art on super-resolution and class-conditional ImageNet generation benchmarks.”