Happiom

R Code for Doing Web Scraping with Examples

Web scraping is the process of extracting data from websites.

In R, you can use the rvest package for web scraping. It provides easy-to-use functions for reading HTML pages and extracting information from them.

Simple Example

Let’s start with a simple example of scraping data from a website using R and the rvest package.

# Install rvest only if it is missing, then load it
if (!requireNamespace("rvest", quietly = TRUE)) {
  install.packages("rvest")
}
library(rvest)

# Specify the URL of the website you want to scrape
url <- "https://example.com"

# Read the HTML content of the webpage
webpage <- read_html(url)

# Extract specific information from the webpage using CSS selectors.
# For example, suppose you want the titles of articles on a news website
# and the CSS selector for those titles is ".article-title".
# The SelectorGadget browser extension can help you find the selector.
# Adjust it to match the structure of the webpage you are scraping.

# Extract article titles
article_titles <- webpage %>%
  html_nodes(".article-title") %>%
  html_text()

# Print the extracted article titles
print(article_titles)

Here is how the code above works.

  • First, you need to install and load the rvest package, which provides functions for web scraping.
  • Set the URL of the website you want to scrape.
  • Use the read_html() function to read the HTML content of the webpage specified by the URL.
  • Use CSS selectors to specify the elements from which you want to extract information. You can use the html_nodes() function to select nodes based on CSS selectors. In the example, we use the CSS selector “.article-title” to select article titles.
  • Use the html_text() function to extract the text content of the selected HTML nodes.
  • Process the extracted information as needed. In this example, we print the extracted article titles.

Remember to adjust the CSS selectors according to the structure of the webpage you are scraping. You can use browser extensions like SelectorGadget to easily find CSS selectors for the elements you want to scrape.
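
To try these steps without depending on a live site, the same pipeline can be run on a small HTML string. This is only a minimal sketch: the HTML snippet and its headlines are made up for illustration, and ".article-title" is the same assumed selector as in the example above.

```r
library(rvest)

# Hypothetical HTML standing in for a downloaded news page
html <- '<html><body>
  <h2 class="article-title">First headline</h2>
  <p class="summary">Some text.</p>
  <h2 class="article-title">Second headline</h2>
</body></html>'

# read_html() also accepts a string of HTML, so no network call is needed
webpage <- read_html(html)

# Select the nodes matching the selector and pull out their text
article_titles <- webpage %>%
  html_nodes(".article-title") %>%
  html_text()

print(article_titles)
```

Only the two elements carrying the article-title class are returned. The paragraph with the summary class is ignored because it does not match the selector.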

Detailed Example

In this detailed example, I’ll scrape data from a hypothetical website that lists the top 10 movies of all time along with their ratings and release years.

Let’s extract this information and store it in a data frame.

# Install rvest only if it is missing, then load it
if (!requireNamespace("rvest", quietly = TRUE)) {
  install.packages("rvest")
}
library(rvest)

# Specify the URL of the website you want to scrape
url <- "https://example-movies.com/top-10-movies"

# Read the HTML content of the webpage
webpage <- read_html(url)

# Extract movie titles
movie_titles <- webpage %>%
  html_nodes(".movie-title") %>%
  html_text()

# Extract ratings
ratings <- webpage %>%
  html_nodes(".rating") %>%
  html_text()

# Extract release years
release_years <- webpage %>%
  html_nodes(".release-year") %>%
  html_text()

# Create a data frame to store the extracted information
movies_data <- data.frame(
  Title = movie_titles,
  Rating = ratings,
  Release_Year = release_years,
  stringsAsFactors = FALSE  # keep text as character in R versions before 4.0
)

# Print the extracted information
print(movies_data)

Here are the steps in detail.

  • Install and load the rvest package, which is necessary for web scraping.
  • Set the URL of the website from which we want to scrape data.
  • Using read_html(), we fetch the HTML content of the webpage.
  • Identify CSS selectors for movie titles, ratings, and release years. We use html_nodes() to select nodes based on these selectors and html_text() to extract the text content of these nodes.
  • Create a data frame to store the extracted information. Each column of the data frame corresponds to the information we extracted (title, rating, release year).
  • Print the data frame to see the extracted information.
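
One thing the steps above leave open is what happens when a movie is missing a rating or a year: the three vectors would no longer line up row by row. A possible way around this, sketched below, is to select each movie's container first and then look up the fields inside it. The ".movie" container class, the field() helper, and the HTML are all assumptions made for illustration, not the structure of any real site.

```r
library(rvest)

# Hypothetical markup: each movie sits in its own ".movie" container,
# and the second movie has no rating
html <- '<ul>
  <li class="movie"><span class="movie-title">Movie A</span>
    <span class="rating">9.2</span><span class="release-year">1994</span></li>
  <li class="movie"><span class="movie-title">Movie B</span>
    <span class="release-year">1972</span></li>
</ul>'

movies <- read_html(html) %>% html_nodes(".movie")

# field() is a hypothetical helper. html_node() (singular) returns exactly
# one result per movie, and html_text() gives NA where the node is absent.
field <- function(css) {
  movies %>% html_node(css) %>% html_text()
}

movies_data <- data.frame(
  Title = field(".movie-title"),
  Rating = as.numeric(field(".rating")),
  Release_Year = as.integer(field(".release-year")),
  stringsAsFactors = FALSE
)

print(movies_data)
```

Because every column is built from the same set of containers, each row stays tied to one movie, and the missing rating shows up as NA instead of shifting later values up. Converting Rating and Release_Year to numbers assumes the text is a plain number, which may not hold on a real page.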

You can also use browser tools like Chrome DevTools or Firefox Developer Tools to inspect the HTML structure of the webpage and find appropriate CSS selectors.
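
Once a candidate selector has been picked out in the browser, a quick check in R is to count how many nodes it matches before building anything on top of it. The sketch below uses made-up HTML, and count_matches() is a hypothetical helper written for this example, not an rvest function.

```r
library(rvest)

# Hypothetical page fragment used only to demonstrate the check
page <- read_html('<div>
  <span class="movie-title">Movie A</span>
  <span class="movie-title">Movie B</span>
  <span class="rating">9.2</span>
</div>')

# count_matches() is a hypothetical helper, not part of rvest.
# It returns the number of nodes matched by each CSS selector.
count_matches <- function(doc, selectors) {
  sapply(selectors, function(css) length(html_nodes(doc, css)))
}

counts <- count_matches(page, c(".movie-title", ".rating", ".release-year"))
print(counts)
```

A count of zero suggests the selector is wrong or the content is not present in the HTML that was downloaded. Counts that differ between selectors are a sign that the extracted vectors will not line up in a data frame.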

Copyright 2025 Happiom. All Rights Reserved.