Building a Python Token Counting Tool: A Guide to Optimizing API Usage

In today’s data-driven world, APIs have become essential for integrating services and accessing external resources. However, every API call has associated costs, particularly when using models that rely on tokens for processing requests. This blog post will guide you through building a Python token counting tool using the Gemini API, focusing on optimizing your API usage to control costs effectively.

Introduction

Imagine you’re developing an application that leverages an AI model for text generation. Each request to the model consumes tokens, which can quickly add up, leading to unexpected costs. Understanding token usage is crucial not only for budget management but also for optimizing performance. This tutorial will introduce you to a comprehensive token tracking tool that counts tokens before making requests, tracks usage, estimates costs, and provides insights into your API utilization.

Token Counting Function

This function counts the number of tokens in a given text using a specified model, which is crucial for understanding how much of the API’s resources will be consumed.

def count_tokens(self, text, model="gemini-2.5-flash"):
    """
    Count tokens in text.
    
    Args:
        text: Text to count
        model: Model name
        
    Returns:
        int: Token count
    """
    result = self.client.models.count_tokens(
        model=model,
        contents=text
    )
    
    return result.total_tokens

Use Case

Our token counting tool is ideal for developers working with AI models, especially those concerned about managing API costs. By implementing this tool, you can ensure that your application runs efficiently without overspending on token usage. Whether you’re developing chatbots, content generation tools, or intelligent assistants, having a clear understanding of token consumption is vital.

Prerequisites and Setup

Before diving into the implementation, make sure you have the following prerequisites:

  • Python 3.7 or higher: Ensure you have a compatible Python version installed on your machine.
  • Gemini API Access: Sign up for the Gemini API and obtain your API key.
  • Required Libraries: Install the necessary libraries, particularly the Google GenAI library, which you can do using pip:

pip install google-genai

Once you have these prerequisites in place, you’re ready to start building your token counting tool!

Tracking Token Usage

This method tracks the token usage for a specific API request by counting both input and output tokens, which helps in monitoring and managing API costs effectively.

from datetime import datetime  # module-level import needed by the timestamp field below

def track_request(self, prompt, response, model="gemini-2.5-flash"):
    """Track a request's token usage."""
    input_tokens = self.count_tokens(prompt, model)
    output_tokens = self.count_tokens(response, model)
    
    record = {
        "timestamp": datetime.now().isoformat(),
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens
    }
    
    self.history.append(record)
    return record

Core Concepts Explanation

Understanding how tokens work is key to optimizing your API usage. Here are some core concepts you’ll need to grasp:

  • Tokens: Tokens are the units of measurement for processing text in AI models. Each word, character, or symbol often counts as one or more tokens, depending on the underlying tokenization algorithm.
  • API Requests: Each request to an AI model consumes tokens based on the input and output text. This consumption directly affects your billing.
  • Tracking and Analytics: Keeping a record of how many tokens are used in each request allows for better budgeting and usage analysis.

Usage Statistics Calculation

This function calculates and returns statistics about token usage, providing insights into total and average token consumption, which is essential for effective resource management.

def get_stats(self):
    """Get usage statistics."""
    if not self.history:
        return {"message": "No requests tracked yet"}
    
    total_input = sum(r["input_tokens"] for r in self.history)
    total_output = sum(r["output_tokens"] for r in self.history)
    
    return {
        "total_requests": len(self.history),
        "total_input_tokens": total_input,
        "total_output_tokens": total_output,
        "total_tokens": total_input + total_output,
        "avg_input_tokens": total_input / len(self.history),
        "avg_output_tokens": total_output / len(self.history)
    }
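Before reaching for the API at all, it can be handy to ballpark token counts locally. The ~4 characters-per-token figure used below is a common rule-of-thumb assumption for English prose, not a documented Gemini value, so treat this helper as a rough estimate only:

```python
def rough_token_estimate(text, chars_per_token=4.0):
    """Ballpark token estimate from character count.

    The ~4 chars/token ratio is an assumption for English prose;
    authoritative counts must come from the API's count_tokens call.
    """
    if not text:
        return 0
    # Never estimate zero tokens for a non-empty string.
    return max(1, round(len(text) / chars_per_token))
```

This is useful for cheap pre-filtering; the exact ratio varies with the model's tokenizer and the language of the text.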

Step-by-Step Implementation Walkthrough

Let’s walk through the implementation of the token counting tool, broken down into manageable steps:

Demonstrating Token Counting

This function demonstrates how to count tokens for various text inputs, showcasing the relationship between characters and tokens, which is vital for understanding tokenization in API usage.

import os

from google import genai

def demo_token_counting():
    """Demonstrate token counting."""
    print("\n" + "=" * 70)
    print("  TOKEN COUNTING AND TRACKING DEMO")
    print("=" * 70)
    
    api_key = os.getenv("GEMINI_API_KEY")
    if not api_key:
        print("\n Error: GEMINI_API_KEY not set")
        return
    
    client = genai.Client(api_key=api_key)
    tracker = TokenTracker(client)
    
    # Test texts
    texts = [
        "Hello",
        "Hello, world!",
        "The quick brown fox jumps over the lazy dog",
        "This is a longer sentence that will use more tokens."
    ]
    
    print("\n Token Counting Examples:")
    print("-" * 70)
    
    for text in texts:
        tokens = tracker.count_tokens(text)
        chars = len(text)
        ratio = chars / tokens if tokens > 0 else 0
        
        print(f"\nText: {text}")
        print(f"  Characters: {chars}")
        print(f"  Tokens: {tokens}")
        print(f"  Ratio: {ratio:.2f} chars/token")

1. Initializing the Token Tracker

The first step is to create a class that will handle token counting and tracking. This class will be initialized with a client that can communicate with the Gemini API. The class will contain methods for counting tokens and tracking requests, as seen in the implementation.
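The excerpts in this post show the individual methods but not the class scaffolding around them. A minimal sketch of what that class might look like follows; the attribute names self.client and self.history are taken from the snippets, while everything else is assumed:

```python
class TokenTracker:
    """Counts and records token usage for Gemini API requests (a sketch)."""

    def __init__(self, client):
        self.client = client   # an initialized genai.Client instance
        self.history = []      # one record dict per tracked request

    def count_tokens(self, text, model="gemini-2.5-flash"):
        """Ask the API how many tokens the given text consumes."""
        result = self.client.models.count_tokens(model=model, contents=text)
        return result.total_tokens
```

The track_request and get_stats methods shown earlier slot directly into this class alongside count_tokens.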

2. Counting Tokens

Next, implement a method to count tokens in a given text. This method will utilize the Gemini API’s token counting capabilities. By understanding how many tokens are in your input and output, you’ll gain insights into your API consumption.

3. Tracking Requests

After counting tokens, you’ll want to track each API request’s input and output token usage. This involves creating a record for every request, which will be invaluable for future analysis.

4. Calculating Usage Statistics

To make the tool even more informative, implement a method that calculates overall usage statistics. This will help you understand trends in your token consumption over time, informing better budgeting strategies.

5. Demonstrating Token Counting

Finally, create a demonstration function that showcases the token counting tool in action. This will allow you to test and validate the functionality of your implementation.

Advanced Features or Optimizations

Once the basic functionality is in place, consider implementing advanced features:

Tracking an Actual Request

This snippet tracks an actual API request by generating content based on a prompt and recording the token usage, illustrating practical application of the token tracking system.

prompt = "Write a haiku about AI"
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=prompt
)

record = tracker.track_request(prompt, response.text)

print(f"\nPrompt: {prompt}")
print(f"Response: {response.text}")
print(f"\nToken Usage:")
print(f"  Input: {record['input_tokens']} tokens")
print(f"  Output: {record['output_tokens']} tokens")
print(f"  Total: {record['total_tokens']} tokens")
  • Cost Estimation: Integrate a cost estimation feature based on your token usage. This can help you plan your budget more effectively.
  • Real-time Monitoring: Add real-time monitoring capabilities to alert you when token usage exceeds certain thresholds.
  • Usage Reports: Generate weekly or monthly usage reports to analyze trends and adjust your API usage accordingly.
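The cost estimation idea can be sketched directly on top of the tracker's history records. The per-million-token rates below are placeholders for illustration, not real Gemini pricing; always take current numbers from the official pricing page:

```python
# Illustrative USD rates per one million tokens -- NOT real pricing.
HYPOTHETICAL_RATES = {
    "gemini-2.5-flash": {"input": 0.30, "output": 2.50},
}

def estimate_cost(history, rates=HYPOTHETICAL_RATES):
    """Estimate total USD cost from a list of tracked request records."""
    total = 0.0
    for record in history:
        rate = rates.get(record["model"])
        if rate is None:
            continue  # unknown model: skip rather than guess a price
        total += record["input_tokens"] / 1_000_000 * rate["input"]
        total += record["output_tokens"] / 1_000_000 * rate["output"]
    return total
```

Because it only reads the record dicts produced by track_request, this function needs no API access and can run over saved history files as well.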

Practical Applications

This token counting tool has various practical applications:

  • Budget Management: By tracking your token usage, you can manage your API budget more effectively, avoiding unexpected charges.
  • Performance Optimization: Use token analytics to optimize the input text sent to the API, ensuring that you get the most value from each request.
  • Data-Driven Decision Making: Leverage usage statistics to make informed decisions about scaling your application or adjusting your API usage strategy.

Common Pitfalls and Solutions

While implementing this token counting tool, you may encounter some common pitfalls:

  • Ignoring Token Limits: Be aware of the token limits imposed by the API and adjust your requests accordingly to avoid errors.
  • Overlooking Cost Implications: Always keep track of how many tokens you are using and their associated costs to prevent budget overruns.
  • Neglecting Documentation: Familiarize yourself with the API documentation to understand the nuances of token counting and request limitations.
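To guard against the first pitfall, a pre-flight check can refuse oversized prompts before any paid request is made. This sketch takes the counting function as a parameter so it works with tracker.count_tokens or any other counter; the default limit is an arbitrary example, not an actual model limit:

```python
def within_budget(count_fn, prompt, max_input_tokens=8_000):
    """Return (ok, tokens): ok is False when the prompt exceeds the budget.

    count_fn is any callable mapping text -> token count, e.g. a
    TokenTracker's count_tokens method.
    """
    tokens = count_fn(prompt)
    return tokens <= max_input_tokens, tokens
```

A caller can then truncate, split, or summarize the prompt when ok comes back False, instead of burning tokens on a request that may fail or overspend.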

Conclusion

In this blog post, we’ve explored building a Python token counting tool that helps track and optimize your API usage. By implementing this tool, you gain insights into your token consumption, allowing for better management of costs associated with API requests. As you build and refine your tool, consider extending its functionality with advanced features for even greater utility.

Next steps could include exploring more sophisticated token optimization strategies, integrating machine learning for predictive analytics, or building a user interface to visualize your token usage data. Happy coding!


About This Tutorial: This code tutorial is designed to help you learn Python programming through practical examples. Always test code in a development environment first and adapt it to your specific needs.

Want to accelerate your Python learning? Check out our premium Python resources including flashcards, cheat sheets, interview preparation guides, certification guides, and a range of tutorials on various technical areas.
