Google’s Latest Text-to-Image AI Generates a Diverse Range of Images

Google has long been at the forefront of innovation and creativity in its products and services. The latest trend in the market is text-to-image AI generators, and Google now has one of its own. The user enters any text description, and the program automatically generates images that match it. The results span many different styles: some resemble oil paintings, some resemble watercolors, and some even resemble CGI renders. Google’s latest text-to-image AI, Imagen, is an all-rounder that produces unexpectedly high-quality images.

Until now, DALL-E, created by the commercial AI lab OpenAI, was the leader in the field. According to Google, Imagen has overtaken DALL-E in output quality, and human raters also score Imagen higher than competing models.

According to Google, Imagen combines an unprecedented degree of photorealism with a deep level of language understanding. Imagen is an AI system that generates photorealistic images from text supplied by the user. It uses a large frozen T5-XXL encoder to encode the input text into embeddings; Google reports that large pretrained frozen text encoders are very effective for the text-to-image task. A conditional diffusion model maps the text embedding to a 64×64 image, and text-conditional super-resolution diffusion models then upsample it to 256×256 and finally 1024×1024. Google’s view is that scaling the size of the text encoder matters more than scaling the size of the diffusion model. It has introduced a new thresholding diffusion sampler that enables the use of very large classifier-free guidance weights, along with a new Efficient U-Net architecture that is more compute-efficient, more memory-efficient, and converges faster. Imagen achieves a new state-of-the-art COCO FID of 7.27, and human raters find its samples to be on par with the reference images in terms of image-text alignment.
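To make the guidance and thresholding ideas more concrete, here is a minimal sketch in plain Python. It is not Google's implementation: the function names are hypothetical, flat lists of floats stand in for image tensors, and the 99.5 percentile default is an illustrative assumption. The guidance step blends a text-conditioned and an unconditioned noise prediction with a weight `w`, and the thresholding step rescales a predicted image back into the range [-1, 1] when large guidance weights push pixel values outside it.

```python
import math


def classifier_free_guidance(eps_cond, eps_uncond, w):
    """Blend conditional and unconditional noise predictions.

    w = 1 returns the conditional prediction unchanged; w > 1 pushes the
    result further toward the text-conditioned direction.
    """
    return [w * c + (1.0 - w) * u for c, u in zip(eps_cond, eps_uncond)]


def percentile_abs(values, p):
    """Nearest-rank p-th percentile (0 < p <= 100) of the absolute values."""
    ordered = sorted(abs(v) for v in values)
    rank = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]


def dynamic_threshold(x0_pred, p=99.5):
    """Clip a predicted image to [-s, s] and divide by s.

    s is the p-th percentile of absolute pixel values, floored at 1.0 so
    that predictions already inside [-1, 1] pass through unchanged.
    The default p is an assumption chosen for illustration.
    """
    s = max(1.0, percentile_abs(x0_pred, p))
    return [max(-s, min(s, v)) / s for v in x0_pred]
```

In a real sampler these two steps would run at every denoising iteration on full image tensors; the sketch only shows the arithmetic on a handful of values.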

Google is not making Imagen public yet. It believes text-to-image technology has creative potential but could also cause harm in society, for example by spreading fake news or enabling harassment through generated images. It remains to be seen when Google will release it after addressing these shortcomings.

Fizza Atique

Fizza Atique is a Tech writer specializing in the intersection of tech and culture. She likes photography, VR, electronic music, coffee, and baking.