Google's Gemini 2.5 Flash Image, Codenamed 'Nano Banana', Enhances AI Image Editing and Generation

Google has rolled out significant updates to its Gemini platform, introducing advanced native image editing and generation capabilities powered by the Gemini 2.5 Flash Image model, internally codenamed 'Nano Banana'. This enhancement aims to deliver superior consistency and adherence to user instructions, integrating directly into the Gemini app and developer tools. Tim Brooks, VP of Engineering for Google DeepMind, highlighted the new features, stating, "It has great consistency and adherence to instructions, and is a lot of fun to play with."

The Gemini 2.5 Flash Image model introduces several key capabilities. Users can now edit images with natural-language prompts, for example changing a background or removing an object. A core innovation is its character consistency feature, which ensures that subjects such as people or pets maintain their likeness across multiple generated images or edits, a property crucial for storytelling and branding.
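For developers, that editing flow is a single request: one instruction plus the source image. The sketch below is a minimal illustration, assuming the google-genai Python SDK, an API key in the environment, and the preview model name "gemini-2.5-flash-image-preview" from Google's developer documentation; the file names and prompt are hypothetical.

```python
# Minimal sketch of prompt-based image editing. Assumptions: the
# google-genai Python SDK (pip install google-genai pillow), an API key
# in the GEMINI_API_KEY environment variable, and the preview model
# name "gemini-2.5-flash-image-preview". File names are hypothetical.
from io import BytesIO

from google import genai
from PIL import Image

client = genai.Client()  # picks up the API key from the environment

source = Image.open("portrait.jpg")  # hypothetical input image

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=[
        "Remove the background and replace it with a beach at sunset",
        source,
    ],
)

# Generated images come back as inline-data parts alongside any text.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save("edited.png")
```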

Google's updated image generation also includes multi-image fusion, which combines elements from several input images into a single, cohesive scene. This capability, alongside the model's native world knowledge, allows Gemini to interpret complex instructions and produce contextually accurate visuals. All images created or edited within the Gemini app carry a visible watermark as well as an invisible SynthID digital watermark, marking their AI origin.
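Fusion works the same way at the API level, with several images passed alongside one instruction. The sketch below reuses the same assumed SDK and model name as the previous example; the inputs are again hypothetical.

```python
# Multi-image fusion under the same SDK and model-name assumptions:
# several input images plus one instruction in a single request.
from io import BytesIO

from google import genai
from PIL import Image

client = genai.Client()

product = Image.open("sneaker.png")       # hypothetical inputs
backdrop = Image.open("city_street.jpg")

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=[
        "Place the sneaker from the first image on the sidewalk in the"
        " second image, matching its lighting and perspective",
        product,
        backdrop,
    ],
)

for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save("fused.png")
```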

The 'Nano Banana' model is accessible to users globally through the Gemini app, and to developers via the Gemini API, Google AI Studio, and Vertex AI. This strategic integration positions Gemini as a versatile tool for creators and businesses, streamlining workflows that traditionally required extensive manual editing. Google aims to offer a balanced solution across speed, quality, and memory efficiency, competing directly with established models like DALL-E 3, Midjourney, and Stable Diffusion.

For developers, the model is priced at $30 per 1 million output tokens; each generated image consumes approximately 1,290 output tokens, which works out to about $0.039 per image. The update underscores Google's commitment to advancing multimodal AI and to making sophisticated image creation and manipulation more accessible and intuitive for a broad user base.
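The per-image figure follows directly from the token price, as the quick check below shows.

```python
# Back-of-the-envelope check of the published per-image cost.
price_per_million_output_tokens = 30.00  # USD, per Google's pricing
tokens_per_image = 1_290                 # approximate, per Google

cost = tokens_per_image / 1_000_000 * price_per_million_output_tokens
print(f"${cost:.4f} per image")  # -> $0.0387, i.e. about $0.039
```

At that rate, a batch of 1,000 images costs roughly $39 in output tokens.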