
Datasets in R

Datasets are essential for data analysis and statistical modeling in R. They come in three main forms: built-in datasets, external datasets loaded from files, and datasets created programmatically.

Let me explain each type of dataset in detail and show how to work with it in R.

1. Built-in Datasets

R comes with several built-in datasets that are often used for practice and demonstration purposes.

Here are some commonly used built-in datasets.

  • iris – Sepal and petal measurements for 150 iris flowers from three species.
  • mtcars – Fuel consumption and ten other design and performance attributes for 32 car models.
  • airquality – Daily air quality measurements taken in New York from May to September 1973.

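Built-in datasets are available directly by name. Calling `data()` explicitly loads a copy into your workspace, which is optional for datasets such as `iris` but makes the intent clear.

Here is an example.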

# Load the iris dataset
data(iris)

# Display the first few rows of the iris dataset
head(iris)
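
Before analyzing a built-in dataset, it usually helps to check its size, column types, and missing values. The following is a minimal sketch using only base R functions on the three datasets listed above; the variable names `iris_dims` and `na_counts` are introduced here purely for illustration.

```r
# Built-in datasets live in the 'datasets' package, which is attached by default

# Number of rows and columns in iris
iris_dims <- dim(iris)
print(iris_dims)

# Column names and types of mtcars
str(mtcars)

# Column names of airquality
print(names(airquality))

# Count of missing values (NA) in each airquality column
na_counts <- colSums(is.na(airquality))
print(na_counts)
```

Calling `data()` with no arguments lists the datasets available in the packages currently attached to your session.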

2. External Datasets

You can import datasets from external sources such as CSV files, Excel workbooks, or databases.

R provides functions like `read.csv()`, `read.table()`, `read.csv2()`, `read.delim()`, etc., to import data into your program.

Let’s see how to use these functions.

# Import a CSV file
my_data <- read.csv("my_data.csv")

# Import an Excel file (requires the readxl package: install.packages("readxl"))
library(readxl)
my_data <- read_excel("my_data.xlsx")

# Import a tab-delimited file
my_data <- read.delim("my_data.txt", header = TRUE, sep = "\t")
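
The file names above only work if those files exist in your working directory. To try the import functions without any files of your own, you can write a small data frame to a temporary file and read it back. This is a sketch under that assumption; `sample_df` and the temporary paths are made up for illustration.

```r
# A small, made-up data frame to export and re-import
sample_df <- data.frame(
  ID = 1:3,
  Name = c("John", "Alice", "Bob"),
  Score = c(88.5, 92.0, 79.5)
)

# Write it to a temporary CSV file (comma separator, '.' decimal mark)
csv_path <- tempfile(fileext = ".csv")
write.csv(sample_df, csv_path, row.names = FALSE)

# stringsAsFactors = FALSE keeps text columns as character vectors
# (this is already the default from R 4.0.0 onward)
imported <- read.csv(csv_path, stringsAsFactors = FALSE)
str(imported)

# read.csv2() expects ';' as the separator and ',' as the decimal mark,
# which matches what write.csv2() produces
csv2_path <- tempfile(fileext = ".csv")
write.csv2(sample_df, csv2_path, row.names = FALSE)
imported2 <- read.csv2(csv2_path, stringsAsFactors = FALSE)
str(imported2)
```

Choosing between `read.csv()` and `read.csv2()` depends on how the file was written: the second variant is meant for locales that use a comma as the decimal mark.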

3. Creating Datasets Programmatically

You can also create datasets programmatically using functions like `data.frame()` or by combining existing datasets.

# Create a data frame
my_dataframe <- data.frame(
  ID = c(1, 2, 3),
  Name = c("John", "Alice", "Bob"),
  Age = c(25, 30, 35)
)
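
Combining existing datasets is mentioned above but not shown, so here is a minimal sketch of one way to do it with base R. The data frames `first_batch` and `second_batch`, and the `City` values, are made up for illustration.

```r
# Two made-up data frames with the same columns
first_batch <- data.frame(ID = c(1, 2), Name = c("John", "Alice"), Age = c(25, 30))
second_batch <- data.frame(ID = 3, Name = "Bob", Age = 35)

# rbind() stacks rows; both data frames must have the same column names
people <- rbind(first_batch, second_batch)

# cbind() adds columns; the new column needs one value per row
people <- cbind(people, City = c("Pune", "Chennai", "Delhi"))

# A new column can also be derived from an existing one
people$AgeGroup <- ifelse(people$Age >= 30, "30+", "under 30")

print(people)
```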

4. Manipulating Datasets

Once you have a dataset loaded, you can perform various operations on it. The most common ones are:

  • Subsetting – Extracting specific rows or columns.
  • Filtering – Selecting rows based on certain conditions.
  • Aggregation – Computing summary statistics.
  • Joining – Combining multiple datasets based on common keys.

Now let’s write some code that uses each of these operations.

# Subset the iris dataset
subset_iris <- iris[1:5, ]

# Filter the mtcars dataset
filter_mtcars <- mtcars[mtcars$mpg > 20, ]

# Compute summary statistics
summary(iris)

# Join datasets (dataset1 and dataset2 are placeholders for two data frames
# that both contain an ID column)
merged_data <- merge(dataset1, dataset2, by = "ID")
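
Because `dataset1` and `dataset2` are not defined in the snippet above, the join cannot be run as written. The following sketch defines two small made-up data frames so that you can try it, and also shows a grouped aggregation with `aggregate()`. All values are illustrative.

```r
# Two small, made-up data frames that share an ID column
dataset1 <- data.frame(ID = c(1, 2, 3), Name = c("John", "Alice", "Bob"))
dataset2 <- data.frame(ID = c(2, 3, 4), Score = c(90, 75, 60))

# Inner join (the default): keeps only IDs present in both data frames
inner <- merge(dataset1, dataset2, by = "ID")

# Left join: keeps every row of dataset1 and fills missing scores with NA
left <- merge(dataset1, dataset2, by = "ID", all.x = TRUE)

# Filter rows and select specific columns in one step
efficient_cars <- mtcars[mtcars$mpg > 20, c("mpg", "cyl", "hp")]

# Aggregation: mean petal length for each species
petal_means <- aggregate(Petal.Length ~ Species, data = iris, FUN = mean)

print(inner)
print(left)
print(petal_means)
```

Setting `all.x = TRUE`, `all.y = TRUE`, or `all = TRUE` in `merge()` gives a left, right, or full outer join respectively.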

Example

Let’s do a quick analysis on the iris dataset.

# Load the iris dataset
data(iris)

# Summary statistics
summary(iris)

# Plotting
plot(iris$Petal.Length, iris$Petal.Width, col = iris$Species, pch = 19,
     xlab = "Petal Length", ylab = "Petal Width", main = "Iris Dataset")

# Boxplot
boxplot(Sepal.Length ~ Species, data = iris, main = "Sepal Length by Species")

The code above loads the iris dataset, displays summary statistics, and creates a scatter plot of petal length against petal width, colored by species. It also draws a boxplot of sepal length by species.
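
If you want numbers to go with the two plots, the following sketch computes them with base R. The variable names `petal_cor` and `sepal_medians` are introduced here for illustration only.

```r
# Correlation between the two variables shown in the scatter plot
petal_cor <- cor(iris$Petal.Length, iris$Petal.Width)

# Median sepal length per species, which is the center line of each box
sepal_medians <- tapply(iris$Sepal.Length, iris$Species, median)

print(petal_cor)
print(sepal_medians)
```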

Datasets are the foundation of data analysis in R. Whether a dataset is built-in, imported, or created programmatically, knowing how to work with it is essential for any data analysis task.

Example Dataset of Customer Transactions for a Retail Store

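Let’s consider a realistic use case: a dataset of customer transactions for a retail store. I’ll perform several operations on it to analyze customer behavior and generate insights, which should make the dataset concepts easier to follow. The code assumes that `transactions.csv` has `CustomerID`, `Category`, `Amount`, and `Date` columns, and that `customer_data.csv` has `CustomerID` and `Gender` columns.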

# Load the dataset
transactions <- read.csv("transactions.csv")

# Display the structure of the dataset
str(transactions)

# Summary statistics
summary(transactions)

# Filter transactions for a specific product category
electronics_transactions <- subset(transactions, Category == "Electronics")

# Compute total sales for each product category
sales_by_category <- aggregate(Amount ~ Category, data = transactions, FUN = sum)

# Identify the ten largest transactions by amount (requires the dplyr package)
library(dplyr)
top_products <- head(arrange(transactions, desc(Amount)), n = 10)

# Merge with customer data to analyze demographics
customer_data <- read.csv("customer_data.csv")
merged_data <- merge(transactions, customer_data, by = "CustomerID")

# Compute average transaction amount by gender
avg_transaction_by_gender <- aggregate(Amount ~ Gender, data = merged_data, FUN = mean)

# Visualize transaction distribution
hist(transactions$Amount, main = "Transaction Amount Distribution", xlab = "Amount")

# Plot the number of transactions per day (assumes dates are stored as YYYY-MM-DD)
transactions$Date <- as.Date(transactions$Date)
daily_counts <- as.data.frame(table(Date = transactions$Date))
daily_counts$Date <- as.Date(as.character(daily_counts$Date))
plot(daily_counts$Date, daily_counts$Freq, type = "l",
     main = "Transaction Count Over Time", xlab = "Date", ylab = "Transaction Count")

Here is a step-by-step summary of the code above.

  • Start by loading the transaction data from a CSV file.
  • Inspect the structure of the dataset using `str()` and generate summary statistics using `summary()`.
  • Filter transactions for a specific product category (in this case, “Electronics”) using `subset()`.
  • Compute total sales for each product category using `aggregate()`.
  • Identify the top-selling entries by sorting the dataset by transaction amount in descending order with `arrange()` and `desc()` from the `dplyr` package, then keeping the first ten rows with `head()`.
  • Merge transaction data with customer data based on the common column “CustomerID” using `merge()`.
  • Compute the average transaction amount by gender using `aggregate()`.
  • Visualize the distribution of transaction amounts using a histogram and plot the transaction count over time using a time series plot.
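
The walkthrough above depends on two CSV files that are not provided. If you want to run it, one option is to generate made-up sample data first. The sketch below does that under the assumption that the files use the column names seen in the code; every value is randomly generated and purely illustrative, and the files are written to a temporary directory rather than your working directory.

```r
# Made-up data for illustration only; column names follow the walkthrough
set.seed(42)  # make the random sample reproducible
n <- 200

all_days <- seq(as.Date("2024-01-01"), as.Date("2024-03-31"), by = "day")

transactions <- data.frame(
  TransactionID = seq_len(n),
  CustomerID = sample(1:20, n, replace = TRUE),
  Category = sample(c("Electronics", "Clothing", "Grocery"), n, replace = TRUE),
  Amount = round(runif(n, min = 5, max = 500), 2),
  Date = as.character(sample(all_days, n, replace = TRUE))
)

customer_data <- data.frame(
  CustomerID = 1:20,
  Gender = sample(c("Female", "Male"), 20, replace = TRUE)
)

# Write both data frames to temporary CSV files so read.csv() can load them
transactions_path <- file.path(tempdir(), "transactions.csv")
customers_path <- file.path(tempdir(), "customer_data.csv")
write.csv(transactions, transactions_path, row.names = FALSE)
write.csv(customer_data, customers_path, row.names = FALSE)

# Quick check that the first steps of the walkthrough work on the sample
loaded <- read.csv(transactions_path)
sales_by_category <- aggregate(Amount ~ Category, data = loaded, FUN = sum)
print(sales_by_category)
```

To use these files with the walkthrough, pass `transactions_path` and `customers_path` to `read.csv()` in place of the bare file names.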

Once you are comfortable with the basic R functions and libraries for manipulating, analyzing, and visualizing datasets, you will be well prepared to handle most data analysis work in R.

Copyright 2025 Happiom. All Rights Reserved.