
Python Code to Perform Web Scraping (e.g., Scraping Wikipedia)

Web scraping is the process of extracting data from websites. It allows you to collect and use information from web pages. This technique is useful for data analysis, research, and more.

Python simplifies web scraping with its powerful libraries. The `requests` library fetches web page content easily. `BeautifulSoup` then parses the HTML and extracts the data you need.
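To see what BeautifulSoup does without touching the network, here is a minimal sketch that parses a small hand-written HTML string. The string is invented for illustration and only loosely modeled on a Wikipedia article. The names HTML and extract_page_parts are hypothetical. The sketch shows three things:

  • find() returns the first matching tag.
  • get_text() collects the text inside a tag.
  • select() with a CSS selector picks out only the links inside the article body, and a['href'] reads each link's attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical, hand-written HTML used only for illustration
# (loosely modeled on a Wikipedia article; not real page markup).
HTML = """
<html>
  <body>
    <div id="nav"><a href="/wiki/Main_Page">Main page</a></div>
    <h1 id="firstHeading">Python (programming language)</h1>
    <div class="mw-parser-output">
      <p><b>Python</b> is a <a href="/wiki/High-level_programming_language">high-level</a>, <a href="/wiki/General-purpose_programming_language">general-purpose</a> programming language.</p>
    </div>
  </body>
</html>
"""


def extract_page_parts(html):
    """Return the page heading, the first paragraph's text, and the links in the article body."""
    soup = BeautifulSoup(html, "html.parser")

    # find() returns the first matching tag; get_text() joins all text inside it
    title = soup.find("h1").get_text().strip()
    lead = soup.find("p").get_text().strip()

    # select() takes a CSS selector, so the link in the nav bar is skipped
    links = [
        (a.get_text().strip(), a["href"])
        for a in soup.select("div.mw-parser-output a[href]")
    ]
    return title, lead, links


title, lead, links = extract_page_parts(HTML)
print(title)
print(lead)
print(links)
```

In a real script, the html argument would be response.text from a requests.get() call rather than a hard-coded string.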

Together, these libraries cover the two core steps of scraping: downloading a page and pulling the data you need out of its HTML. This makes them a practical choice for tasks that require collecting data from online resources.

Example: Scraping Wikipedia

In this example, we’ll scrape the summary of the Wikipedia page for “Python (programming language)”. We’ll use the `requests` library to fetch the page content and `BeautifulSoup` to parse and extract the desired information.

Prerequisites

Ensure you have the required libraries installed. You can install them using pip:

pip install requests beautifulsoup4

Python Code Example

Here is a Python script that performs the web scraping:

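import requests
from bs4 import BeautifulSoup

# URL of the Wikipedia page to scrape
url = 'https://en.wikipedia.org/wiki/Python_(programming_language)'

# Wikipedia asks automated clients to identify themselves with a descriptive
# User-Agent; the value below is a placeholder, so replace it with your own details
headers = {'User-Agent': 'ExampleScraper/1.0 (you@example.com)'}

# Send a GET request to the URL (the timeout stops the script from hanging indefinitely)
response = requests.get(url, headers=headers, timeout=10)

# Check if the request was successful
if response.status_code == 200:
    # Parse the content of the page
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find the first non-empty <p> tag; on many Wikipedia pages the very first
    # <p> is an empty placeholder, so soup.find('p') alone can return blank text
    summary = 'No summary paragraph found.'
    for paragraph in soup.find_all('p'):
        text = paragraph.get_text().strip()
        if text:
            summary = text
            break

    # Print the summary
    print(summary)
else:
    print(f"Failed to retrieve page. Status code: {response.status_code}")

Explanation of the Code

  • import requests and from bs4 import BeautifulSoup: Import the necessary libraries.
  • url: The URL of the Wikipedia page to scrape.
  • headers: A placeholder User-Agent string that identifies the script to Wikipedia; replace it with your own details.
  • requests.get(url, headers=headers, timeout=10): Send a GET request to the URL to retrieve the page content, giving up after 10 seconds.
  • BeautifulSoup(response.text, 'html.parser'): Parse the page content using BeautifulSoup.
  • soup.find_all('p') loop: Take the text of the first <p> tag that is not empty, which typically contains the summary.
  • print(summary): Output the extracted summary.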

Output Example

The output of the script will be the first paragraph (the summary) of the Wikipedia article. Because the article is edited over time, the exact text will vary, but it will look something like this:

Python is an interpreted high-level general-purpose programming language. Python's design philosophy emphasizes code readability with its notable use of significant whitespace. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.

Web Scraping with Scrapy: A More Detailed Example

In this example, we’ll use Scrapy to scrape quotes from the “Quotes to Scrape” website. This example demonstrates how to set up a Scrapy spider, configure it to scrape specific data, and handle the extracted information.

Prerequisites

Make sure you have Scrapy installed. You can install it using pip:

pip install scrapy

Creating the Scrapy Project and Spider

Follow these steps to set up a Scrapy project and create a spider:

1. Create a new Scrapy project:

scrapy startproject quotes_scraper

2. Navigate to the project directory:

cd quotes_scraper

3. Create a new spider file in the quotes_scraper/spiders directory. Name it quotes_spider.py and add the following code:

import scrapy

class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = ['http://quotes.toscrape.com/']

    def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').get(),
                'author': quote.css('span small.author::text').get(),
                'tags': quote.css('div.tags a.tag::text').getall(),
            }

Running the Spider

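To run the spider and scrape the data, use the following command from the project directory:

scrapy crawl quotes -o quotes.json

The -o option appends to an existing file, so running the command twice leaves quotes.json as invalid JSON. Recent Scrapy versions also accept -O (capital O), which overwrites the file instead.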

Output Example

After running the spider, the data will be saved in a file named quotes.json. Here’s a sample of what the output might look like:

[
    {
        "text": "The world is full of magic things, patiently waiting for our senses to grow sharper.",
        "author": "W.B. Yeats",
        "tags": ["inspirational", "magic"]
    },
    {
        "text": "The greatest glory in living lies not in never falling, but in rising every time we fall.",
        "author": "Nelson Mandela",
        "tags": ["inspirational", "life"]
    },
    {
        "text": "Life is what happens when you're busy making other plans.",
        "author": "John Lennon",
        "tags": ["life", "humor"]
    }
]
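Once quotes.json exists, the standard library is enough to work with the extracted data. The sketch below assumes the file holds a single JSON list of quote objects like the sample above, which is what -o quotes.json produces on a fresh run. count_tags is a hypothetical helper name.

```python
import json
from collections import Counter


def count_tags(path):
    """Load a Scrapy JSON feed (a list of quote dicts) and count how often each tag appears."""
    with open(path, encoding="utf-8") as f:
        quotes = json.load(f)

    counts = Counter()
    for quote in quotes:
        # 'tags' may be missing or empty for some items, so fall back to an empty list
        counts.update(quote.get("tags") or [])
    return counts
```

For example, count_tags('quotes.json').most_common(3) returns the three most frequent tags together with their counts.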

Scrapy provides a powerful and efficient way to scrape data from websites. By setting up a spider and configuring it to extract specific information, you can automate data collection tasks. This method is particularly useful for larger and more complex scraping projects.

Web scraping is a powerful technique for data extraction and automation. With Python’s libraries, you can efficiently gather information from web pages and use it for various applications. Always ensure that your scraping activities comply with the website’s terms of service.
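Terms of service have to be read by a person. Many sites also publish machine-readable crawl rules in a robots.txt file, and Python's standard library can evaluate those rules. In the sketch below, is_allowed is a hypothetical helper. It takes the rules as a list of lines so it runs offline. robots.txt complements a site's terms of service; it does not replace them.

```python
from urllib import robotparser


def is_allowed(robots_lines, url, user_agent="*"):
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = robotparser.RobotFileParser()
    # parse() accepts the robots.txt content as an iterable of lines
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


rules = [
    "User-agent: *",
    "Disallow: /private/",
]
print(is_allowed(rules, "https://example.com/public/page"))
print(is_allowed(rules, "https://example.com/private/page"))
```

To check a live site instead, call parser.set_url('https://example.com/robots.txt') and then parser.read() before can_fetch(). The URL here is a placeholder.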

  • Copyright 2025 Happiom. All Rights Reserved.