Creating a Python Guide for Analyzing Multiple Images with the Gemini API

Introduction

As the digital landscape evolves, so does the need for advanced image analysis tools. Whether you’re developing a photo editing application, an e-commerce platform, or even an educational resource, the ability to understand and analyze multiple images simultaneously can significantly enhance user experience and functionality. In this guide, we will explore how to utilize the Gemini API to analyze multiple images in Python, allowing you to compare images, discover differences and similarities, and even process batches of images efficiently.

Creating Comparison Images

This snippet uses the Pillow library (imported as PIL) to create two simple test images that are identical except for the color of one shape, which gives the model a known difference to find.

from PIL import Image, ImageDraw

# Image 1: blue square and red circle on a light gray background
img1 = Image.new('RGB', (300, 300), color='lightgray')
draw1 = ImageDraw.Draw(img1)
draw1.rectangle([50, 50, 150, 150], fill='blue', outline='black', width=3)
draw1.ellipse([150, 150, 250, 250], fill='red', outline='black', width=3)
img1.save('image1.png')  # written to the current working directory

# Image 2: same layout, but the square is green instead of blue
img2 = Image.new('RGB', (300, 300), color='lightgray')
draw2 = ImageDraw.Draw(img2)
draw2.rectangle([50, 50, 150, 150], fill='green', outline='black', width=3)  # Different color
draw2.ellipse([150, 150, 250, 250], fill='red', outline='black', width=3)
img2.save('image2.png')

Prerequisites and Setup

Before diving into the implementation, make sure you have the following prerequisites:

  • Python 3.9 or higher: The google-genai SDK does not run on older interpreters, and newer SDK releases may raise this minimum, so check the package's PyPI page.
  • Install the necessary libraries: You will need the Pillow library for image handling and google-genai for API interaction. Install these using pip:
    • pip install Pillow
    • pip install google-genai
  • Gemini API credentials: You’ll need a Gemini API key, which you can create in Google AI Studio. Alternatively, set up a Google Cloud project, enable the Gemini API, and obtain the necessary authentication credentials there.

Once you have the prerequisites in place, you are ready to start implementing the solution.

Basic Multi-Image Analysis

This function sends two images to the model in a single request and prints its description of both, the simplest form of multi-image analysis.

from google.genai import types

def basic_multi_image(client, img1_path, img2_path):
    # Read both images from disk as raw bytes
    with open(img1_path, 'rb') as f:
        img1_bytes = f.read()
    with open(img2_path, 'rb') as f:
        img2_bytes = f.read()

    # Wrap the bytes as inline image parts; the MIME type must match the file format
    img1_part = types.Part.from_bytes(data=img1_bytes, mime_type='image/png')
    img2_part = types.Part.from_bytes(data=img2_bytes, mime_type='image/png')

    # One request carries the prompt and both images
    prompt = "Describe both of these images."
    response = client.models.generate_content(
        model='gemini-2.5-flash',
        contents=[prompt, img1_part, img2_part]
    )
    print(f"Response:\n{response.text}")
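
As a rough sketch of the credentials step, the helper below looks up an API key from the environment before any client is created. The google-genai client can read `GEMINI_API_KEY` or `GOOGLE_API_KEY` on its own; the helper name `find_api_key` and the order in which it checks the two variables are assumptions made for illustration, and its only purpose is to fail early with a readable message.

```python
import os

# Environment variable names this helper checks, in this order.
# The order is a choice made for this sketch, not SDK behavior.
KEY_VARIABLES = ("GEMINI_API_KEY", "GOOGLE_API_KEY")


def find_api_key(environ=None):
    """Return the first non-empty API key found, or raise a readable error."""
    environ = os.environ if environ is None else environ
    for name in KEY_VARIABLES:
        value = environ.get(name, "").strip()
        if value:
            return value
    raise RuntimeError(
        "No API key found; set one of: " + ", ".join(KEY_VARIABLES)
    )


# Typical use (requires `pip install google-genai` and a real key):
#   from google import genai
#   client = genai.Client(api_key=find_api_key())
```

Keeping the key in an environment variable rather than in the script also keeps it out of version control.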

Core Concepts Explanation

Understanding how the Gemini API works and the key concepts behind image analysis is crucial before we jump into coding. In this section, we will break down the fundamental components:

Image Comparison

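This snippet compares two images by sending the same pair with several different questions about their differences and similarities.

from pathlib import Path
from google.genai import types

def image_comparison(client, img1_path, img2_path):
    # Build the image parts once; they are reused for every prompt
    img1_part = types.Part.from_bytes(data=Path(img1_path).read_bytes(), mime_type='image/png')
    img2_part = types.Part.from_bytes(data=Path(img2_path).read_bytes(), mime_type='image/png')

    prompts = [
        "What are the differences between these two images?",
        "What do these images have in common?",
        "Which image has a blue shape?"
    ]

    for prompt in prompts:
        # Each question is a separate request that carries both images
        response = client.models.generate_content(
            model='gemini-2.5-flash',
            contents=[prompt, img1_part, img2_part]
        )
        print(f"\nQuestion: {prompt}")
        print(f"Answer: {response.text}")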

Image Creation and Manipulation

The first step in our analysis involves creating images for comparison. Using the Pillow library, we can draw shapes and fill them with colors to generate two images that are similar yet distinct. This step is essential for demonstrating the capabilities of the Gemini API.

Multi-Image Analysis

Once we have our images ready, the next phase is to analyze them together. The Gemini API allows us to send multiple images in a single request, which can be highly efficient for tasks like comparing images, recognizing patterns, or gaining insights into visual data.
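
To make the "multiple images in a single request" idea concrete, here is a minimal sketch that assembles the `contents` list for any number of images and labels each one so the prompt can refer to "Image 1", "Image 2", and so on. The function name `build_contents` and the `make_part` parameter are hypothetical; injecting the part factory keeps the sketch runnable without the SDK installed.

```python
def build_contents(prompt, images, make_part):
    """Return a contents list: the prompt, then a label and a part per image.

    images    -- iterable of (image_bytes, mime_type) pairs
    make_part -- callable turning (image_bytes, mime_type) into an API part
    """
    contents = [prompt]
    for index, (data, mime_type) in enumerate(images, start=1):
        contents.append(f"Image {index}:")  # text label the model can refer to
        contents.append(make_part(data, mime_type))
    return contents


# With google-genai installed, make_part would typically be:
#   lambda data, mime: types.Part.from_bytes(data=data, mime_type=mime)
# and the result is passed as `contents=` to client.models.generate_content.
```

The labels are optional, but they make answers such as "Image 2 has the green square" easier to map back to files.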

Batch Processing

Handling many images quickly becomes cumbersome without a structured approach. The batch pattern used in this guide finds every image in a folder, sends one request per file inside a loop, and collects the answers in a list, so a whole directory can be processed in a single run of the script.
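
If you would rather send several images per request instead of one request per file, the file list first has to be split into fixed-size groups. The sketch below shows one way to do that; the helper name `chunked` and the group size are illustrative choices, not something the API prescribes.

```python
def chunked(items, size):
    """Split items into consecutive lists of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    items = list(items)
    return [items[start:start + size] for start in range(0, len(items), size)]


# Example: group five file names into requests of two images each.
groups = chunked(["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"], 2)
print(groups)  # [['a.jpg', 'b.jpg'], ['c.jpg', 'd.jpg'], ['e.jpg']]
```

Each group can then be turned into one multi-image request, which trades fewer API calls for larger payloads per call.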

Step-by-Step Implementation Walkthrough

Now that we’ve covered the core concepts, let’s walk through the implementation of the image analysis script. Our script will consist of several functions that cater to different tasks, from image creation to analysis.

Batch Processing Pattern

This snippet illustrates how to efficiently process multiple images in a batch, demonstrating the use of file handling and API calls for categorization tasks.

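import glob
from pathlib import Path

from google import genai
from google.genai import types

# Assumes the API key is available in the environment (e.g. GEMINI_API_KEY)
client = genai.Client()

image_files = glob.glob('images/*.jpg')
results = []

for image_file in image_files:
    # Read the JPEG from disk as raw bytes
    with open(image_file, 'rb') as f:
        image_bytes = f.read()

    image_part = types.Part.from_bytes(
        data=image_bytes,
        mime_type='image/jpeg'
    )

    # One request per image
    response = client.models.generate_content(
        model='gemini-2.5-flash',
        contents=[image_part, "Categorize this image (person/place/thing)."]
    )

    # Keep the file name together with the model's answer
    results.append({
        'file': Path(image_file).name,
        'category': response.text.strip()
    })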
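
The loop above leaves its output in memory only. As a small follow-up sketch, the `results` list of `file`/`category` dictionaries can be written out as CSV with the standard library; the function name `results_to_csv` and the output file name in the comment are hypothetical.

```python
import csv
import io


def results_to_csv(results, stream):
    """Write a list of {'file', 'category'} dictionaries to a text stream as CSV."""
    writer = csv.DictWriter(stream, fieldnames=["file", "category"])
    writer.writeheader()
    writer.writerows(results)


# Demonstration with an in-memory buffer and made-up rows.
buffer = io.StringIO()
results_to_csv(
    [{"file": "cat.jpg", "category": "thing"},
     {"file": "beach.jpg", "category": "place"}],
    buffer,
)
print(buffer.getvalue())

# To write a real file instead:
#   with open("categories.csv", "w", newline="", encoding="utf-8") as f:
#       results_to_csv(results, f)
```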

Creating Comparison Images

First, we need to create two images that we can compare. This is done using the Pillow library, where we generate shapes and colors to distinguish the images. As shown in the implementation, we create a simple rectangle and ellipse in each image, with one key difference in color to facilitate analysis.
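
Because the two test images are generated by code, the expected difference is known in advance, which is handy for sanity-checking the model's answers. The sketch below rebuilds both images in memory and uses Pillow's `ImageChops.difference` to locate the region that differs; the helper name `make_test_image` is hypothetical.

```python
from PIL import Image, ImageChops, ImageDraw


def make_test_image(square_color):
    """Build the 300x300 test image with a square of the given color and a red circle."""
    image = Image.new('RGB', (300, 300), color='lightgray')
    draw = ImageDraw.Draw(image)
    draw.rectangle([50, 50, 150, 150], fill=square_color, outline='black', width=3)
    draw.ellipse([150, 150, 250, 250], fill='red', outline='black', width=3)
    return image


img1 = make_test_image('blue')
img2 = make_test_image('green')

# Bounding box (left, upper, right, lower) of all differing pixels,
# or None if the two images are identical.
diff_box = ImageChops.difference(img1, img2).getbbox()
print(diff_box)
```

The box should fall inside the square's area, since the fill color of the square is the only thing that changes between the two images.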

Basic Multi-Image Analysis

Once we have our images, we’ll write a function to read them from disk and send them to the Gemini API for analysis. This function handles the file I/O and sends each image as inline bytes tagged with its MIME type, which is one of the input forms the API accepts. The response text contains the model’s description of the images and any relationships it notes between them.
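
The snippets in this guide hard-code the MIME type, which breaks as soon as a PNG and a JPEG are mixed. A minimal sketch of a loader that derives the MIME type from the file extension follows. The names `MIME_TYPES`, `mime_type_for`, and `load_image` are hypothetical, and the extension table is an assumption that should be checked against the formats the API currently documents.

```python
from pathlib import Path

# Extension-to-MIME table; assumed (not verified here) to be accepted by the API.
MIME_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
}


def mime_type_for(path):
    """Return the MIME type for a path based on its extension (case-insensitive)."""
    suffix = Path(path).suffix.lower()
    try:
        return MIME_TYPES[suffix]
    except KeyError:
        raise ValueError(f"Unsupported image extension: {suffix or '(none)'}") from None


def load_image(path):
    """Return (image_bytes, mime_type) for an image file on disk."""
    path = Path(path)
    return path.read_bytes(), mime_type_for(path)
```

The returned pair can be passed straight to `types.Part.from_bytes(data=..., mime_type=...)`.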

Image Comparison

To gain deeper insights, we can ask specific questions regarding the images, such as their differences and similarities. This functionality is implemented in a dedicated function that sends these inquiries to the Gemini API, allowing us to extract meaningful comparisons based on the visual content of the images.

Batch Processing Pattern

Finally, to analyze multiple images located in a directory, we implement a batch processing pattern. Using the glob module, we can dynamically find and process all images in a specified folder. This approach is efficient and scalable, especially when dealing with large datasets of images.
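
A single glob pattern such as `images/*.jpg` misses PNG files and upper-case extensions. As a sketch under that observation, the helper below scans a folder once and keeps every file whose extension is in an allowed set; the name `find_images` and the chosen extensions are illustrative.

```python
from pathlib import Path

# Extensions treated as images in this sketch; adjust to your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_images(folder):
    """Return the image files directly inside `folder`, sorted by name."""
    folder = Path(folder)
    return sorted(
        path for path in folder.iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )
```

Sorting makes the processing order, and therefore the order of the results, reproducible between runs.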

Advanced Features or Optimizations

Once you’ve grasped the basic implementation, consider exploring advanced features and optimizations:

  • Error Handling: Robust error handling can improve the reliability of your application. Implement try-except blocks to catch potential issues, such as file not found errors or API request failures.
  • Asynchronous Processing: For applications requiring real-time analysis, consider implementing asynchronous calls to the Gemini API, allowing you to handle multiple requests concurrently.
  • Image Preprocessing: Before sending images for analysis, you might want to preprocess them (resize, normalize, etc.) to ensure consistent input for the API.

Multi-Image Q&A

This function asks several specific questions about the same two images, one request per question, showing how to query the model for more detailed analysis.

from pathlib import Path
from google.genai import types

def multi_image_qa(client, img1_path, img2_path):
    # Build the image parts once; they are reused for every question
    img1_part = types.Part.from_bytes(data=Path(img1_path).read_bytes(), mime_type='image/png')
    img2_part = types.Part.from_bytes(data=Path(img2_path).read_bytes(), mime_type='image/png')

    questions = [
        "How many total shapes are in both images combined?",
        "Which image has more variety in colors?",
        "If these were buttons in an app, which would you click first and why?"
    ]

    for question in questions:
        response = client.models.generate_content(
            model='gemini-2.5-flash',
            contents=[question, img1_part, img2_part]
        )
        print(f"\nQuestion: {question}")
        print(f"Answer: {response.text}")
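
The asynchronous-processing suggestion can be sketched with the standard library alone. The helper below runs one coroutine per image while capping how many are in flight at once. The name `analyze_all` and the `analyze_one` parameter are hypothetical; in a real script `analyze_one` would wrap an asynchronous API call (the google-genai SDK exposes one under `client.aio`, which is assumed here rather than demonstrated).

```python
import asyncio


async def analyze_all(paths, analyze_one, max_concurrency=4):
    """Run analyze_one(path) for every path, at most max_concurrency at a time.

    Results are returned in the same order as `paths`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(path):
        # The semaphore limits how many calls run concurrently.
        async with semaphore:
            return await analyze_one(path)

    return await asyncio.gather(*(guarded(path) for path in paths))


# Stand-in for a real API call, so the sketch runs offline.
async def fake_analyze(path):
    await asyncio.sleep(0)
    return f"analyzed {path}"


print(asyncio.run(analyze_all(["a.jpg", "b.jpg"], fake_analyze)))
```

Capping concurrency matters because firing every request at once is an easy way to run into rate limits.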

Practical Applications

The ability to analyze multiple images has numerous practical applications:

  • Photo Editing Software: Enhance user experiences by providing side-by-side comparisons of edited and original images.
  • E-commerce: Automatically categorize products by analyzing images and providing insights into similarities and differences.
  • Educational Tools: Create applications that help students compare and contrast visual data, enhancing their learning experience.

Common Pitfalls and Solutions

While working with image analysis, you may encounter several pitfalls. Here are some common issues and their solutions:

  • File Path Errors: Ensure that your image paths are correct and accessible. Consider using absolute paths or checking for file existence before processing.
  • API Rate Limiting: Be aware of the API limits set by Google. If you’re making numerous requests in a short period, you may hit these limits. Implement a back-off strategy to handle this gracefully.
  • Image Format Issues: Ensure that the images you are processing are in a format supported by both the Pillow library and the Gemini API.
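
The back-off strategy mentioned for rate limiting can be sketched as a small retry wrapper with exponentially growing delays plus jitter. The name `call_with_backoff` and its default values are illustrative. For brevity the sketch retries on any exception, whereas real code should narrow this to the errors that actually signal a temporary failure.

```python
import random
import time


def call_with_backoff(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `call()` and retry on failure with exponential back-off and jitter.

    The delay before retry n (0-based) is base_delay * 2**n plus a random
    extra of up to base_delay. The last failure is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:  # assumption: narrow this to rate-limit errors in practice
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)


# Typical use, wrapping a request in a zero-argument callable:
#   response = call_with_backoff(
#       lambda: client.models.generate_content(model=..., contents=...)
#   )
```

The `sleep` parameter exists only so the waiting can be replaced in tests; in normal use the default `time.sleep` is fine.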

Conclusion

In this guide, we’ve explored the powerful capabilities of the Gemini API for analyzing multiple images using Python. By understanding the core concepts and following the step-by-step implementation, you are now equipped to leverage image analysis in your projects. From creating comparison images to batch processing for large datasets, the tools and techniques discussed here can significantly enhance your applications.

As a next step, consider experimenting with the advanced features mentioned, or explore additional functionalities offered by the Gemini API to further expand your image analysis capabilities. With continuous learning and experimentation, you’ll uncover even more possibilities in the realm of image processing.


About This Tutorial: This code tutorial is designed to help you learn Python programming through practical examples. Always test code in a development environment first and adapt it to your specific needs.
