If you’ve ever wanted to extract useful data from websites — like article titles, product prices, or research summaries — you’ve touched the world of web scraping.
Python makes this process remarkably easy with two powerful libraries: Requests and BeautifulSoup.
In this tutorial, we’ll build a simple yet functional scraper that can pull article titles and links from any website that allows scraping.
🧩 Step 1: Setting Up Your Environment
Before you start, make sure you’ve installed the required libraries:
pip install requests beautifulsoup4
These two libraries are all you need. requests handles the network calls to fetch HTML pages, while BeautifulSoup parses that HTML so you can extract the exact data you want.
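Before moving on, it's worth a quick sanity check that both packages imported correctly — a minimal sketch:

```python
# Quick sanity check that both libraries installed correctly.
import requests
import bs4

print("requests", requests.__version__)
print("beautifulsoup4", bs4.__version__)
```

If either import fails, re-run the pip command above inside the same environment you're running Python from.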
⚙️ Step 2: Fetching the Web Page
Our first step is to grab the raw HTML from a given URL. Here’s the function that does it:
import requests

def get_html(url):
    """Fetch the raw HTML for a URL, or return None on failure."""
    try:
        headers = {
            # Identify as a regular browser; some sites block Python's default client string.
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
        }
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()  # raise on 4xx/5xx status codes
        return response.text
    except requests.RequestException as e:
        print(f"Error fetching {url}: {e}")
        return None
The User-Agent header makes your request look like an ordinary browser visit; many sites reject or throttle requests that arrive with Python's default client string. The timeout keeps the script from hanging indefinitely on an unresponsive server, and raise_for_status() turns HTTP error codes (like 404 or 500) into exceptions we can catch.
🔍 Step 3: Parsing and Extracting Data
Once we have the HTML, we’ll extract the article titles and links using BeautifulSoup.
from bs4 import BeautifulSoup

def parse_articles(html):
    """Extract {title, link} pairs from <article> elements."""
    soup = BeautifulSoup(html, "html.parser")
    articles = []
    for article in soup.find_all("article"):
        title_tag = article.find("h2")
        if not title_tag:
            continue  # skip articles without a headline
        link_tag = title_tag.find("a")
        title = link_tag.get_text(strip=True) if link_tag else title_tag.get_text(strip=True)
        link = link_tag["href"] if link_tag and link_tag.has_attr("href") else None
        articles.append({"title": title, "link": link})
    return articles
This function looks for <article> tags and extracts each article’s title and link.
You can easily modify the tag names to match the structure of the website you’re scraping.
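If the site you're targeting doesn't wrap its posts in <article> tags, BeautifulSoup's CSS-selector interface is a handy alternative. Here's a minimal sketch, assuming a hypothetical page whose posts are `<div class="post">` elements with relative links (urljoin converts those into absolute URLs):

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a page that doesn't use <article> tags.
html = """
<div class="post"><h3><a href="/posts/1">First post</a></h3></div>
<div class="post"><h3><a href="/posts/2">Second post</a></h3></div>
"""

soup = BeautifulSoup(html, "html.parser")
articles = []
for link_tag in soup.select("div.post h3 a"):  # CSS selector instead of find_all
    articles.append({
        "title": link_tag.get_text(strip=True),
        # urljoin resolves relative hrefs against the page's base URL.
        "link": urljoin("https://example.com/", link_tag["href"]),
    })

print(articles)
```

The base URL here is a placeholder; in practice you'd pass in the same URL you fetched the page from.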
🖨️ Step 4: Displaying the Results
Let’s make our scraper print what it finds in a clean, readable format.
def display_results(articles):
    if not articles:
        print("No articles found.")
        return
    print(f"\nFound {len(articles)} articles:\n")
    for i, article in enumerate(articles, 1):
        print(f"{i}. {article['title']}")
        print(f"   {article['link']}\n")
When you run the program, you’ll see a neat list of titles and links, ready to use in research, analysis, or automation tasks.
🚀 Step 5: Putting It All Together
Finally, let’s tie it all up with a main function that defines the target URL and orchestrates the scraping.
def main():
    url = "https://realpython.com/"
    print(f"Scraping articles from: {url}")
    html = get_html(url)
    if not html:
        print("Failed to retrieve HTML content.")
        return
    articles = parse_articles(html)
    display_results(articles)

if __name__ == "__main__":
    main()
Run this program and you’ll see real article titles printed right in your terminal.
🧠 Final Thoughts
Web scraping is one of the most practical and exciting skills in Python.
Once you’ve mastered the basics, you can take it further — save data into CSV files, schedule scrapes automatically, or feed that data into AI models for training and analysis.
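The CSV step needs nothing beyond the standard library. A minimal sketch, assuming the list-of-dicts shape that parse_articles returns (the sample rows below are illustrative):

```python
import csv

# Sample data in the shape parse_articles returns.
articles = [
    {"title": "Intro to Scraping", "link": "https://example.com/scraping"},
    {"title": "Parsing HTML", "link": "https://example.com/parsing"},
]

with open("articles.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "link"])
    writer.writeheader()       # writes the header row: title,link
    writer.writerows(articles) # one row per scraped article
```

The newline="" argument matters on Windows: without it, the csv module writes blank lines between rows.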
At ProjectPy.com, we explore exactly this kind of synergy between Python and AI — building small, powerful tools that automate the web, extract insights, and make your projects smarter.
If you enjoyed this tutorial, consider subscribing to our newsletter for weekly Python + AI coding projects!

