Whisk, Google’s newest artificial intelligence tool, lets users upload photographs and receive a single AI-generated image that merges them, with no text prompt required.
Users can submit photographs that represent a subject, a scene, and a style, which Whisk then combines into one image.
Google’s blog post describes Whisk as a “creative tool” that offers instant inspiration, not a “traditional image editor.” Whisk is meant to be a fun AI tool rather than a polished professional one.
Large technology companies such as Google and OpenAI are racing to release consumer products that show off the capabilities of their newest technology, even as critics warn that the lack of safeguards around AI development poses risks to humankind.
Since OpenAI debuted its text-to-image generator, DALL-E, in 2021, AI-generated artwork has become a focal point of consumer products and has dominated social media. Whisk, an image-to-image generator, is Google’s variation on the widely used text-to-image concept.
Whisk users can “remix” the final image by changing their inputs and combining the categories to generate a variety of distinctive images, such as a plush toy, enamel pin, or sticker. Text is not required to generate an image, but users may add it if they want to steer specific details.
“Whisk allows users to remix a subject, scene, and style in new and creative ways,” Thomas Iljic, a director of product management at Google Labs, said in a statement. “Whisk offers rapid visual exploration rather than pixel-perfect edits,” Iljic said.
Whisk is built on generative AI technology developed by DeepMind, the AI lab that Google acquired in 2014.
The tool combines Gemini, Google’s core AI service introduced in December 2023, with Imagen 3, the most recent text-to-image generator that DeepMind published in December.
Gemini writes a caption for each image a user uploads, and those captions are then passed to Imagen 3. This method captures the “essence” of the subject, which makes remixing possible, but the resulting image may deviate from the prompt.
For instance, a Google blog post states that the generated image may differ from the photographs used as prompts in terms of height, hairstyle, or skin tone.
Google faced criticism when it first released Gemini’s text-to-image generator in February because the tool repeatedly produced historically inaccurate images.
According to the company, Whisk is still in its early stages of development and will initially be available to users in the United States through a website hosted on Google Labs.
OpenAI, meanwhile, recently released a text-to-video generator called Sora, underscoring the competition for consumer products.
Dan Ives, managing director and senior equity analyst at Wedbush Securities, said this morning that Whisk is another significant milestone in Google’s pursuit of artificial intelligence and technology.
“DeepMind is a key asset for Google,” Ives said, adding that AI products are part of Google’s “treasure chest” of new products for 2025. Those products include a new Android operating system that Google developed with Samsung and Qualcomm.