Extracting Business Leads into Excel with APIs

In today’s digital landscape, businesses are constantly looking for ways to acquire leads effectively. One of the most powerful tools at a developer’s disposal is the ability to harness APIs for data extraction. In this tutorial, we will explore a practical implementation of extracting business leads from search results and organizing them into an Excel spreadsheet using Python. We will utilize the Google Custom Search API to fetch relevant search results and OpenAI’s API to extract structured information from unstructured text. Let’s dive in!

Introduction: Use Case

Imagine you’re a marketing professional tasked with gathering contact information for local businesses in your area. Manually searching for businesses on platforms like Facebook, LinkedIn, and Instagram can be tedious and time-consuming. However, by automating this process through APIs, you can quickly extract valuable business leads and organize them for outreach. This tutorial will guide you through a complete implementation using Python, where we will query Google for business information and then refine the results using OpenAI to extract key contact details.

Google Search Function

This function demonstrates how to make a GET request to the Google Custom Search API, allowing users to retrieve search results based on a query.

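import requests

def google_search(query, start, api_key, cse_id):
    """Fetch one page of up to 10 results from the Google Custom Search JSON API."""
    url = "https://www.googleapis.com/customsearch/v1"
    params = {
        "key": api_key,
        "cx": cse_id,       # Programmable Search Engine ID
        "q": query,
        "start": start,     # 1-based index of the first result on this page
        "num": 10,          # Maximum number of results the API returns per request
        "gl": "in",         # Boost results whose country of origin is India
        "cr": "countryIN"   # Restrict results to documents originating in India
    }
    response = requests.get(url, params=params, timeout=30)
    # Failures (invalid key, exhausted quota, paging past the first 100 results)
    # come back as JSON with an "error" object and no "items", so check for both
    return response.json()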
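google_search() hands back the decoded JSON as a Python dictionary. Results are listed under "items", each with fields such as "title", "link", and "snippet", while a failed call carries an "error" object instead. A small helper along these lines can pull out those fields and surface API errors rather than silently returning an empty list. This is a sketch; the name summarize_results is our own.

```python
def summarize_results(results):
    """Flatten one Custom Search JSON response into simple title/link/snippet dicts."""
    if "error" in results:
        # Google API errors look like {"error": {"code": ..., "message": ...}}
        raise RuntimeError(results["error"].get("message", "Custom Search API error"))
    return [
        {
            "title": item.get("title", ""),
            "link": item.get("link", ""),
            "snippet": item.get("snippet", ""),
        }
        for item in results.get("items", [])  # "items" is absent when nothing matched
    ]
```

For example, summarize_results(google_search("Dentist site:facebook.com", 1, api_key, cse_id)) gives a list you can loop over directly.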

Prerequisites and Setup

Before we start coding, ensure you have the following:

  • Python 3.x installed on your machine.
  • Access to the Google Custom Search JSON API: an API key and a Programmable Search Engine ID (the cx parameter).
  • An OpenAI API key, which you can obtain by signing up on their platform.
  • Basic knowledge of Python programming, especially functions and API interaction.
  • The required libraries installed: requests, openai, and openpyxl. You can install them using pip:
pip install requests openai openpyxl

Extracting Information with OpenAI

This snippet shows how to use OpenAI’s API to extract structured information from unstructured text, which is crucial for processing search results effectively.

import openai

# openai>=1.0 reads the OPENAI_API_KEY environment variable automatically
def extract_info_from_openai(text):
    # Ask for one "Label: value" line per field so the reply is easy to parse
    prompt = f"""Extract the following details from the text:
- Website or Business Name
- Email
- Phone
- Summary

Text:
{text}

Return in this format:
Website: ...
Email: ...
Phone: ...
Summary: ...
"""
    try:
        response = openai.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.3  # Low temperature keeps the output format consistent
        )
        return response.choices[0].message.content.strip()
    except Exception as e:
        # Return the error as text so one failed call does not stop the run;
        # parse_openai_response() then leaves every field as "N/A"
        return f"Error: {e}"

Core Concepts Explanation

Throughout this tutorial, we will be working with several core concepts:

  • APIs (Application Programming Interfaces): APIs are interfaces that allow different software applications to communicate with each other. We will use the Google Custom Search API to fetch search results and OpenAI’s API to process those results.
  • JSON (JavaScript Object Notation): The Google Custom Search API returns its results as JSON, which requests’ response.json() turns into ordinary Python dictionaries and lists.
  • Excel File Manipulation: We will use the openpyxl library to create and manipulate Excel files, allowing us to store our extracted leads in a structured manner.

Parsing OpenAI Response

This function parses the line-based response from OpenAI, extracting the website, email, phone, and summary into a dictionary, which is essential for organizing the data.

def parse_openai_response(text):
    # Default every field to "N/A" so a missing detail still yields a complete row
    data = {"Website": "N/A", "Email": "N/A", "Phone": "N/A", "Summary": "N/A"}
    for line in text.splitlines():
        for key in data:
            # Match labels such as "Email:" case-insensitively at the start of a line
            if line.lower().startswith(f"{key.lower()}:"):
                # Split on the first colon only, so URLs like https://... stay intact
                data[key] = line.split(":", 1)[-1].strip()
    return data

Step-by-Step Implementation Walkthrough

Step 1: Setting Up API Credentials

Your first step is to set up the necessary API credentials. Set the GOOGLE_API_KEY and SEARCH_ENGINE_ID values that main() passes to google_search() to your Google API key and your Programmable Search Engine ID, and store your OpenAI key in the OPENAI_API_KEY environment variable, which the openai library reads automatically. Without these credentials, neither service will accept your script’s requests.
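If you would rather not hard-code the Google key in the script, all three credentials can come from environment variables instead. The sketch below assumes two variable names of our own choosing, GOOGLE_API_KEY and GOOGLE_CSE_ID, alongside OPENAI_API_KEY (the name the openai library looks for by default), and stops with a clear message if any of them is missing.

```python
import os

# Hypothetical names for the Google settings; OPENAI_API_KEY is the
# variable the openai library itself reads.
ENV_VARS = {
    "google_api_key": "GOOGLE_API_KEY",
    "search_engine_id": "GOOGLE_CSE_ID",
    "openai_api_key": "OPENAI_API_KEY",
}


def load_credentials():
    """Read API credentials from the environment, failing early if any are missing."""
    creds = {field: os.environ.get(var, "").strip() for field, var in ENV_VARS.items()}
    missing = [ENV_VARS[field] for field, value in creds.items() if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return creds
```

With this in place, creds = load_credentials() followed by google_search(full_query, start, creds["google_api_key"], creds["search_engine_id"]) replaces the hard-coded values.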

Main Driver Function

This main function orchestrates the entire process, from user input to saving search results in an Excel file, demonstrating how to integrate various components of the code into a cohesive workflow.

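import time
import openpyxl

# google_search, extract_info_from_openai and parse_openai_response are the
# functions defined earlier in this tutorial and must be in the same script.

GOOGLE_API_KEY = "YOUR_GOOGLE_API_KEY"        # Placeholder: replace with your key
SEARCH_ENGINE_ID = "YOUR_SEARCH_ENGINE_ID"    # Placeholder: your search engine ID (cx)
# Assumed example list: each entry is appended to the query as a site: filter
SITES = ["site:facebook.com", "site:linkedin.com", "site:instagram.com"]


def main():
    search_term = input("Enter your search term (e.g., Dentist): ").strip()
    total_records = int(input("How many records do you want (e.g., 30, 50, 100): ").strip())
    # The count applies per site; the API never returns more than 100 results per query
    total_records = min(total_records, 100)

    wb = openpyxl.Workbook()
    ws = wb.active
    ws.title = "Search Results"
    ws.append(["Website Source", "Query", "Title", "Website", "Email", "Phone", "Summary"])

    for site in SITES:
        site_name = site.replace("site:", "")
        full_query = f"{search_term} {site}"
        print(f"\n🔍 Searching: {full_query}")

        # Request pages of 10 results: start = 1, 11, 21, ...
        for start in range(1, total_records + 1, 10):
            results = google_search(full_query, start, GOOGLE_API_KEY, SEARCH_ENGINE_ID)
            if "error" in results:
                print(f"❌ API error: {results['error'].get('message')}")
                break
            # Trim the last page so no more than total_records rows are written
            items = results.get("items", [])[:total_records - (start - 1)]

            if not items:
                print(f"⚠️ No more results after {start - 1}")
                break

            for item in items:
                title = item.get("title", "")
                snippet = item.get("snippet", "")
                combined = f"{title}\n{snippet}"
                response_text = extract_info_from_openai(combined)
                parsed = parse_openai_response(response_text)
                ws.append([
                    site_name,
                    search_term,
                    title,
                    parsed["Website"],
                    parsed["Email"],
                    parsed["Phone"],
                    parsed["Summary"]
                ])
                print(f"✅ Added: {title}")
                time.sleep(1)  # Pause between OpenAI calls to stay under rate limits

    # Replace characters such as "/" that are not allowed in file names
    safe_term = "".join(c if c.isalnum() or c in " -_" else "_" for c in search_term)
    filename = f"search_results_{safe_term}.xlsx"
    wb.save(filename)
    print(f"\n📁 Saved to: {filename}")


if __name__ == "__main__":
    main()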

Step 2: Querying Google for Business Leads

In our implementation, google_search() queries the Custom Search JSON API. It takes a search term, a 1-based start index for pagination, and your API credentials as arguments. Each request returns at most 10 results, so main() steps the start index through 1, 11, 21, and so on, and the API never serves more than the first 100 results for a query. To target specific platforms, main() appends a site: operator (for example site:facebook.com) to the search term, which restricts the results to that domain.
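The paging and site targeting described above can be factored into two small helpers. This is a sketch rather than the tutorial’s own code: page_starts and build_site_queries are hypothetical names, and the defaults of 10 results per page and 100 results per query mirror the Custom Search JSON API’s limits.

```python
def page_starts(total_records, page_size=10, max_results=100):
    """Return the 1-based `start` values needed to fetch `total_records` results.

    The API returns at most `page_size` results per request and never more
    than `max_results` for a single query, so larger requests are capped.
    """
    limit = max(0, min(total_records, max_results))
    return list(range(1, limit + 1, page_size))


def build_site_queries(search_term, domains):
    """Pair each domain with a query restricted to it via the site: operator."""
    return [(domain, f"{search_term} site:{domain}") for domain in domains]


# Plan the requests for 25 results per platform (prints one line per API call)
for domain, query in build_site_queries("Dentist", ["facebook.com", "linkedin.com"]):
    for start in page_starts(25):
        print(domain, query, start)
```

Planning the calls up front like this also makes it easy to count how many queries a run will use before spending any quota.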

Step 3: Extracting Information with OpenAI

Once we have our search results, the next step is to process them with OpenAI’s API to extract structured information. For each result, main() sends the title and snippet to the model with a prompt asking it to identify the website or business name, email, phone number, and a short summary. Only the title and snippet are sent, not the full web page, so contact details that do not appear in the snippet cannot be extracted. This step is still crucial because raw search results are unstructured text that is difficult to use without proper extraction.
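Asking for free-form "Label: value" lines works, but the reply format can drift. One alternative, which is not what the tutorial’s code does, is OpenAI’s JSON mode, where the request asks the model to reply with a single JSON object. The sketch below assumes the openai>=1.0 client interface and a model that supports response_format={"type": "json_object"}; build_json_prompt and extract_info_as_json are hypothetical names. The client is passed in as a parameter so the function can be exercised without network access.

```python
import json

FIELDS = ("Website", "Email", "Phone", "Summary")


def build_json_prompt(text):
    """Prompt asking for a JSON object with the same four fields as the tutorial."""
    return (
        "Extract the website or business name, email, phone number and a short summary "
        "from the text below. Reply with a JSON object whose keys are exactly "
        + ", ".join(FIELDS)
        + "; use null for any detail that is not present in the text.\n\nText:\n"
        + text
    )


def extract_info_as_json(text, client, model="gpt-3.5-turbo"):
    """Ask the model for JSON and normalise the reply to the four tutorial fields."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": build_json_prompt(text)}],
        response_format={"type": "json_object"},  # JSON mode: reply is one JSON object
        temperature=0,
    )
    raw = response.choices[0].message.content or ""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    # Missing, null or blank values become "N/A", as in parse_openai_response()
    result = {}
    for field in FIELDS:
        value = data.get(field)
        value = "" if value is None else str(value).strip()
        result[field] = value or "N/A"
    return result
```

In main(), this would replace the extract_info_from_openai() and parse_openai_response() pair: create a client once with from openai import OpenAI and client = OpenAI(), then call extract_info_as_json(combined, client) for each result.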

Step 4: Parsing the OpenAI Response

After receiving the structured response from OpenAI, we will implement logic to parse this response and store the information in a dictionary format. This organization makes it easier to access and manipulate the data later on.
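parse_openai_response() only matches lines that begin exactly with a label such as "Email:", so a reply like "**Email:** ..." or "- Phone: ..." leaves that field as N/A. A slightly more tolerant variant is sketched below. It assumes the model keeps the four labels but may add Markdown bold or bullet markers, strips those markers, and then splits each line on its first colon. The name parse_openai_response_tolerant is hypothetical.

```python
def parse_openai_response_tolerant(text):
    """Parse 'Label: value' lines, ignoring Markdown bold and bullet markers."""
    data = {"Website": "N/A", "Email": "N/A", "Phone": "N/A", "Summary": "N/A"}
    labels = {key.lower(): key for key in data}  # "email" -> "Email", etc.
    for raw_line in text.splitlines():
        # "**Email:** x" -> "Email: x" and "- Phone: y" -> "Phone: y"
        line = raw_line.replace("**", "").strip().lstrip("-*• ").strip()
        label, sep, value = line.partition(":")  # Split on the first colon only
        key = labels.get(label.strip().lower())
        if sep and key and value.strip():
            data[key] = value.strip()
    return data
```

Because it returns the same dictionary shape, it can be swapped in for parse_openai_response() in main() without other changes.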

Step 5: Saving to Excel

Finally, we will create an Excel workbook and save our extracted leads into a neatly organized spreadsheet. This step not only allows for easy access to your leads but also facilitates sharing and collaboration with your team.

Setting Up Excel Workbook

This snippet, taken from main(), illustrates how to create and set up an Excel workbook using the `openpyxl` library, which is essential for organizing and saving the search results in a structured format.

import openpyxl

wb = openpyxl.Workbook()
ws = wb.active  # A new workbook starts with one active worksheet
ws.title = "Search Results"
# Header row; each lead is later appended below it as one row
ws.append(["Website Source", "Query", "Title", "Website", "Email", "Phone", "Summary"])

Advanced Features or Optimizations

Once you’ve implemented the basic functionality, consider enhancing your project with the following advanced features:

  • Error Handling: Implement robust error handling to manage API rate limits and connectivity issues, for example by retrying failed requests after a delay. This prevents a single failure from crashing the script.
  • Multi-threading: The OpenAI calls are network-bound, so running several at once (for example with concurrent.futures.ThreadPoolExecutor) can cut the total runtime when you extract a large number of leads. Keep the number of workers small so you stay within rate limits.
  • Data Validation: Check that extracted email addresses and phone numbers look plausible before saving them to Excel, since the model can return incomplete or incorrect values.
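Two of these enhancements can be sketched as small helpers. call_with_retries retries a function that raises (for example on a timeout or a rate-limit error), waiting 1, 2, 4, ... seconds between attempts, and clean_lead applies simple plausibility checks to a parsed lead. Both names are hypothetical, and the checks are heuristics chosen for illustration: a basic email pattern and a phone number of 7 to 15 digits (15 being the E.164 maximum; the minimum of 7 is an assumption), not a complete validation scheme.

```python
import re
import time

# Heuristic pattern: something@domain.tld (not a full RFC 5322 validator)
EMAIL_PATTERN = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")


def call_with_retries(func, *args, retries=3, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying with exponential backoff if it raises."""
    for attempt in range(retries + 1):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == retries:
                raise  # Out of retries: let the caller see the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults


def clean_lead(parsed):
    """Return a copy of a parsed lead with implausible Email/Phone values set to "N/A"."""
    lead = dict(parsed)
    email = lead.get("Email", "").strip()
    lead["Email"] = email if EMAIL_PATTERN.match(email) else "N/A"
    # Count digits only, ignoring spaces, "+", dashes and brackets
    digits = re.sub(r"\D", "", lead.get("Phone", ""))
    if not 7 <= len(digits) <= 15:
        lead["Phone"] = "N/A"
    return lead
```

Retries only help for calls that raise. extract_info_from_openai() catches every exception itself, so you would wrap the inner call, for example call_with_retries(openai.chat.completions.create, model="gpt-3.5-turbo", messages=messages). Likewise, google_search() returns error JSON instead of raising, so it would first need something like response.raise_for_status() to benefit. clean_lead(parsed) can be applied just before each ws.append(...).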

Practical Applications

This project has numerous practical applications:

  • Marketing teams can use it to gather leads for targeted outreach campaigns.
  • Sales teams can automate the process of prospecting businesses in specific niches.
  • Researchers can gather information on market trends by analyzing the data collected.

Common Pitfalls and Solutions

As you embark on this project, be aware of common pitfalls:

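  • API Rate Limiting: Google and OpenAI both enforce quotas and rate limits. The Custom Search JSON API’s free tier allows 100 queries per day, and each page of 10 results counts as one query, while the script makes one OpenAI request per result. Handle quota and rate-limit errors gracefully, for example by pausing and retrying, to avoid disruptions.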
  • Incomplete Data Extraction: Ensure that your prompts for OpenAI are clear and concise. Ambiguous instructions may lead to incomplete or incorrect data extraction.
  • Excel Formatting Issues: Pay attention to how data is written to Excel to avoid formatting issues that could make the spreadsheet difficult to read.
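One way to keep the spreadsheet readable is to format the sheet just before saving it. This sketch uses standard openpyxl features: a bold, frozen header row, filter drop-downs, and column widths sized to the longest value in each column. The format_sheet name and the 60-character width cap are our own choices.

```python
from openpyxl.styles import Font
from openpyxl.utils import get_column_letter


def format_sheet(ws, max_width=60):
    """Bold and freeze the header row, add filters, and size columns to their content."""
    for cell in ws[1]:
        cell.font = Font(bold=True)
    ws.freeze_panes = "A2"              # Keep the header visible while scrolling
    ws.auto_filter.ref = ws.dimensions  # Filter drop-downs over the used range
    for index, column in enumerate(ws.iter_cols(), start=1):
        longest = max((len(str(cell.value)) for cell in column if cell.value is not None), default=0)
        # Widths are roughly in characters; cap them so long summaries stay manageable
        ws.column_dimensions[get_column_letter(index)].width = min(longest + 2, max_width)
```

In main(), calling format_sheet(ws) right before wb.save(filename) is enough.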

Conclusion: Next Steps

In this tutorial, we have explored a comprehensive approach to extracting business leads using Python, Google Custom Search API, and OpenAI. By following the outlined steps, you can automate a significant portion of your lead generation process. As a next step, consider experimenting with different search queries, enhancing the script with additional features, or integrating it with other tools and platforms you use.

As you continue to explore the world of APIs and data extraction, remember that the possibilities are endless. With Python and the right APIs in hand, you can unlock powerful automation capabilities that will enhance your productivity and effectiveness in any project.

Happy coding!


About This Tutorial: This code tutorial is designed to help you learn Python programming through practical examples. Always test code in a development environment first and adapt it to your specific needs.
