@tanaikech
Created June 16, 2025 01:38
Growing Image Generation with Gemini API: Python and Node.js Now Supported

Abstract

This article announces that the Gemini API's Python client library now supports "growing image" generation, a feature previously unavailable. Sample scripts for Python and Node.js are provided to demonstrate this new capability.

Introduction

I recently published an article, "Generate Growing Images using Gemini API," which detailed a method for progressively generating images. At the time of publication, the official Python client library for the Gemini API lacked the necessary functionality to fully implement this feature, preventing Python users from easily replicating the "growing image" effect.

However, a recent update to the Gemini API's Python client library has addressed this limitation, allowing these evolving images to be generated programmatically within Python environments. The same capability is also available to Node.js developers.

Given these crucial updates, I am pleased to share sample scripts for both Python and Node.js. These scripts demonstrate how to leverage the newly added functionalities to generate "growing images" using the Gemini API, providing developers with practical examples for their own projects.

Flow

The prompts are sent to Gemini sequentially within a single chat session. Because the chat retains the conversation history, the response to the first prompt becomes part of the context for the second, the second response in turn informs the third, and so on. This iterative process enables the generation of evolving images.

Sample Scripts

Before testing the following scripts, please obtain an API key for the Gemini API.
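
As an alternative to hardcoding the key in the scripts below, you can read it from an environment variable. The following minimal sketch (shown in Python) assumes the key has been exported as GEMINI_API_KEY; that variable name is only an example.

import os

api_key = os.environ["GEMINI_API_KEY"]  # Read the Gemini API key from the environment.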

Python

from google import genai
from google.genai import types

api_key = "###" # Please set your API key.
questions = [
    "Create an image of a clean whiteboard.",
    "Add a colored illustration drawn with a whiteboard marker of an apple to the upper left on that whiteboard. Don't stick it out from the whiteboard.",
    "Add a colored illustration drawn with a whiteboard marker of an orange to the upper right on that whiteboard. Don't stick it out from the whiteboard.",
    "Add a colored illustration drawn with a whiteboard marker of a banana to the bottom left on that whiteboard. Don't stick it out from the whiteboard.",
    "Add a colored illustration drawn with a whiteboard marker of a kiwi to the bottom right on that whiteboard. Don't stick it out from the whiteboard."
]
client = genai.Client(api_key=api_key)

# Create a chat session so that each prompt is answered with the full
# conversation history (previous prompts and generated images) as context.
chat = client.chats.create(
    model="models/gemini-2.0-flash-exp",
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)

for i, q in enumerate(questions):
    response = chat.send_message(q)
    for e in response.candidates[0].content.parts:
        if e.text:
            print("Output as text.")
            print(e.text.strip())
        elif e.inline_data:
            # Image parts are returned as raw bytes in inline_data.
            print("Output as an image.")
            with open(f"./{i + 1}_{q}.png", "wb") as f:
                f.write(e.inline_data.data)
print("Done.")

Node.js

import fs from "fs";
import { GoogleGenAI } from "@google/genai";

const apiKey = "###"; // Please set your API key.
const questions = [
  "Create an image of a clean whiteboard.",
  "Add a colored illustration drawn with a whiteboard marker of an apple to the upper left on that whiteboard. Don't stick it out from the whiteboard.",
  "Add a colored illustration drawn with a whiteboard marker of an orange to the upper right on that whiteboard. Don't stick it out from the whiteboard.",
  "Add a colored illustration drawn with a whiteboard marker of a banana to the bottom left on that whiteboard. Don't stick it out from the whiteboard.",
  "Add a colored illustration drawn with a whiteboard marker of a kiwi to the bottom right on that whiteboard. Don't stick it out from the whiteboard.",
];

async function main() {
  const ai = new GoogleGenAI({ apiKey });

  // Create a chat session so that each prompt is answered with the full
  // conversation history (previous prompts and generated images) as context.
  const chat = ai.chats.create({
    model: "gemini-2.0-flash-exp",
    config: {
      responseModalities: ["TEXT", "IMAGE"],
    },
  });

  for (let i = 0; i < questions.length; i++) {
    const message = questions[i];
    const res = await chat.sendMessage({ message });
    const parts = res.candidates[0].content.parts;
    for (const o of parts) {
      if (o.text) {
        console.log("Output as text.");
        console.log(o.text.trim());
      } else if (o.inlineData) {
        // Image parts are returned as base64-encoded strings in inlineData.
        console.log("Output as an image.");
        await fs.promises.writeFile(
          `${i + 1}_node_${message}.png`,
          o.inlineData.data,
          { encoding: "base64" }
        );
      }
    }
  }
  console.log("Done.");
}

main();
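
This script requires the @google/genai package and, because it uses ES module import syntax, should be saved with a .mjs extension or run in a project whose package.json contains "type": "module". A typical setup is shown below; the file name growing_image.mjs is only an example.

npm install @google/genai
node growing_image.mjs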

Testing

Both scripts above produce similar results, although the generated text and images may differ slightly between runs.

With Chat

This image demonstrates that each generated image is correlated and evolves based on the preceding prompts. This is because the Gemini API's chat functionality uses the conversation history.
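
For reference, the accumulated context can be inspected from the Python script above with a minimal sketch like the following, assuming the chat object from that script is still in scope; get_history() returns the list of Content turns kept by the google-genai chat session.

# Inspect the conversation history accumulated by the chat session.
for content in chat.get_history():
    kinds = ["image" if p.inline_data else "text" for p in content.parts]
    print(content.role, kinds)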

Without Chat

Conversely, when images are generated using the same prompts but without leveraging the chat functionality, the resulting images are not correlated, as shown above. Each image is a standalone generation without context from previous prompts.
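For comparison, the "without chat" case can be reproduced with a simple loop in which each prompt is sent as an independent request. This is only a minimal sketch, reusing the client, questions, and config values from the Python script above.

for i, q in enumerate(questions):
    # Each request is independent; no conversation history is carried over.
    response = client.models.generate_content(
        model="models/gemini-2.0-flash-exp",
        contents=q,
        config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
    )
    for e in response.candidates[0].content.parts:
        if e.inline_data:
            with open(f"./no_chat_{i + 1}.png", "wb") as f:
                f.write(e.inline_data.data)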

Summary

This report concludes that using the chat functionality is highly beneficial for generating images that evolve with the conversation. Furthermore, this capability is now fully supported by the official Python and Node.js client libraries for the Gemini API.

Additional Information

When conversational history becomes extensive, it may exceed the maximum token limit of the model. To address this, a truncation strategy can be implemented, where only the most recent and relevant portions of the history are retained. By doing so, the system can still generate coherent and contextually appropriate content, integrating both the latest chat message and the image associated with the most recent interactions. This approach allows for the continuous and potentially infinite growth of an image-based conversation through chat, ensuring that the model always has the necessary context without being overwhelmed by past data.
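
A minimal sketch of such a truncation strategy in Python, continuing from the script above, might look like the following. It assumes that get_history() and the history argument of chats.create are available in the installed google-genai version; the MAX_TURNS value and the final prompt text are only illustrative.

MAX_TURNS = 6  # Keep only the most recent turns; the value is arbitrary.

history = chat.get_history()
if len(history) > MAX_TURNS:
    # Rebuild the chat with a truncated history so the token limit is not exceeded.
    chat = client.chats.create(
        model="models/gemini-2.0-flash-exp",
        config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
        history=history[-MAX_TURNS:],
    )
response = chat.send_message("Add a colored illustration of a grape to the center of that whiteboard.")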
