Google’s latest open source AI model, Gemma 3, is not the only big news from the Alphabet subsidiary today.
No, in fact, the spotlight may have been stolen by Google’s Gemini 2.0 Flash with native image generation, a new experimental model available for free to Google AI Studio users and to developers via the Google Gemini API.
It marks the first time a major U.S. tech company has shipped multimodal image generation directly within a model to consumers. Most other AI image generation tools have been diffusion models (image-specific) hooked up to large language models (LLMs), requiring a layer of interpretation between the two models to derive the image the user asked for in a text prompt. That was the case both for Google’s previous Gemini LLMs connected to its Imagen diffusion models, and for ChatGPT’s previous (and current, as far as is known) setup connecting its underlying LLMs to the DALL-E 3 diffusion model.
Gemini 2.0 Flash, by contrast, can generate images natively within the same model where the user types text prompts, theoretically allowing for greater accuracy and more capabilities, and early indications suggest this is largely true.
Gemini 2.0 Flash, first unveiled in December 2024 but until now without native image generation switched on for users, incorporates multimodal input, reasoning and natural language understanding to generate images alongside text.
The newly available experimental version, gemini-2.0-flash-exp, allows developers to create illustrations, refine images through conversation and generate detailed visuals based on world knowledge.
How Gemini 2.0 Flash improves AI-generated images
In a developer-facing blog post published earlier today, Google highlighted several key capabilities of Gemini 2.0 Flash’s native image generation:
• Text and image storytelling: Developers can use Gemini 2.0 Flash to generate illustrated stories while maintaining consistency in characters and settings. The model also responds to feedback, allowing users to adjust the story or change the artistic style.
• Conversational image editing: The AI supports multi-turn editing, meaning users can refine an image by giving instructions through natural language prompts. This feature allows for real-time collaboration and creative exploration (a minimal API sketch follows this list).
• World knowledge-based image generation: Unlike many other image generation models, Gemini 2.0 Flash draws on broader reasoning capabilities to produce more contextually relevant images. For example, it can illustrate recipes with detailed visuals that align with real-world ingredients and cooking methods.
• Improved text rendering: Many AI image models struggle to generate legible text within images, often producing misspellings or distorted characters. Google reports that Gemini 2.0 Flash outperforms leading competitors at text rendering, making it particularly useful for advertisements, social media posts and invitations.
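To make the multi-turn editing flow concrete, here is a minimal sketch using the chat interface of Google’s google-genai Python SDK; the model name matches Google’s published example, while the prompts and the assumption that the session carries generated images forward as context are illustrative:

from google import genai
from google.genai import types

client = genai.Client(api_key="GEMINI_API_KEY")

# A chat session keeps earlier turns (including generated images) as context
chat = client.chats.create(
    model="gemini-2.0-flash-exp",
    config=types.GenerateContentConfig(response_modalities=["Text", "Image"]),
)

# First turn: generate an initial image
chat.send_message("Draw a watercolor lighthouse on a rocky coast.")

# Follow-up turns refine the same image through natural language
chat.send_message("Make it sunset, with warm orange light.")
chat.send_message("Add a small sailboat near the horizon.")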
Early examples show incredible potential and promise
Googlers and some AI power users have shared examples of the new image generation and editing capabilities offered via Gemini 2.0 Flash Experimental, and they are undoubtedly impressive.
AI and tech educator Paul Couvert pointed out that “you can basically edit any image in natural language [fire emoji]. Not only the ones you generate with Gemini 2.0 Flash but also existing ones,” showing how he uploaded photos and altered them using only text prompts.
Users @apolinario and @fofr showed how you could upload a headshot and modify it into totally different takes with new props like a bowl of spaghetti, or change the direction the subject was looking in while preserving their likeness with incredible accuracy, or even zoom out and generate a full body image based on nothing other than a headshot.

Google DeepMind researcher Robert Riachi showcased how the model can generate images in a pixel-art style and then create new ones in the same style based on text prompts.


AI news account TestingCatalog News reported on the rollout of Gemini 2.0 Flash Experimental’s multimodal capabilities, noting that Google is the first major lab to deploy this feature.

User @Angaisb_ aka “Angel” showed in a compelling example how a prompt to “add chocolate drizzle” modified an existing image of croissants in seconds — revealing Gemini 2.0 Flash’s fast and accurate image editing capabilities via simply chatting back and forth with the model.
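For developers, an edit like this maps to passing an existing image alongside a text instruction in a single request. A minimal sketch, assuming the google-genai SDK accepts a Pillow image in the contents list and that a local croissants.png file exists:

from PIL import Image
from google import genai
from google.genai import types

client = genai.Client(api_key="GEMINI_API_KEY")

# Load the existing photo to edit (hypothetical local file)
source = Image.open("croissants.png")

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    # The edit instruction and the input image travel together
    contents=["Add chocolate drizzle to the croissants.", source],
    config=types.GenerateContentConfig(response_modalities=["Text", "Image"]),
)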

YouTuber Theoretically Media pointed out that this incremental image editing without full regeneration is something the AI industry has long anticipated, demonstrating how easy it was to ask Gemini 2.0 Flash to edit an image to raise a character’s arm while preserving the entire rest of the image.

Former Googler turned AI YouTuber Bilawal Sidhu showed how the model colorizes black-and-white images, hinting at potential historical restoration or creative enhancement applications.

These early reactions suggest that developers and AI enthusiasts see Gemini 2.0 Flash as a highly flexible tool for iterative design, creative storytelling, and AI-assisted visual editing.
The swift rollout also contrasts with OpenAI’s GPT-4o, which previewed native image generation capabilities in May 2024 (nearly a year ago) but has yet to release the feature publicly, allowing Google to seize the opportunity to lead in multimodal AI deployment.
As user @chatgpt21 aka “Chris” pointed out on X, OpenAI has in this case “los[t] the year+ lead” it had on this capability, for unknown reasons. The user invited anyone from OpenAI to comment on why.

My own tests revealed some limits with aspect ratio: it seemed stuck at 1:1 for me, despite asking in text to change it, but it was able to swap the direction of characters in an image within seconds.

While much of the early discussion around Gemini 2.0 Flash’s native image generation has focused on individual users and creative applications, its implications for enterprise teams, developers and software architects are significant.
AI-powered design and marketing at scale: For marketing teams and content creators, Gemini 2.0 Flash could serve as a cost-effective alternative to traditional graphic design workflows, automating the creation of branded content, advertisements and social media visuals. Since it supports text rendering within images, it could streamline ad creation, packaging design and promotional graphics, reducing reliance on manual editing.
Enhanced developer tools and AI workflows: For CTOs, CIOs and software engineers, native image generation could simplify the integration of AI into applications and services. By combining text and image outputs in a single model, Gemini 2.0 Flash allows developers to build:
- AI-powered design assistants that generate UI/UX mockups or app designs.
- Automated documentation tools that illustrate concepts in real time.
- AI-driven dynamic storytelling platforms for media and education.
Since the model also supports conversational image editing, teams could develop AI-driven interfaces where users refine designs through natural dialogue, lowering the barrier to entry for non-technical users.
New possibilities for AI-driven productivity software: For enterprise teams building AI-powered productivity tools, Gemini 2.0 Flash could support applications such as:
- Automated presentation generation with AI-created slides and visuals.
- Legal and business document annotation with AI-generated infographics.
- E-commerce visualization, dynamically generating product mockups based on descriptions.
How to deploy and experiment with this capability
Developers can start testing Gemini 2.0 Flash’s image generation using the Gemini API. Google provides a sample API request to demonstrate how developers can generate illustrated stories with text and images in a single response:
from google import genai
from google.genai import types

# Create a client authenticated with your Gemini API key
client = genai.Client(api_key="GEMINI_API_KEY")

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents=(
        "Generate a story about a cute baby turtle in a 3D digital art style. "
        "For each scene, generate an image."
    ),
    # Ask for both text and image outputs in a single response
    config=types.GenerateContentConfig(
        response_modalities=["Text", "Image"]
    ),
)
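The response interleaves text and image parts. Here is a minimal sketch of how they might be unpacked, assuming the SDK’s inline_data fields carry raw image bytes and using Pillow to save each image (the file names are illustrative):

from io import BytesIO
from PIL import Image

# Walk the interleaved text and image parts of the first candidate
for i, part in enumerate(response.candidates[0].content.parts):
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        # Decode the returned image bytes and save them to disk
        image = Image.open(BytesIO(part.inline_data.data))
        image.save(f"scene_{i}.png")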
By simplifying AI-powered image generation, Gemini 2.0 Flash offers developers new ways to create illustrated content, design AI-assisted applications and experiment with visual storytelling.