Creating Data Frames
Data frames are a core data structure in R. They organize data in a table format with rows and columns. Each column holds a single type of data, but different columns can hold different types, such as numbers in one column and character strings in another.
Creating a data frame is simple using the `data.frame()` function. This function lets you combine vectors into a structured table. Each vector becomes a column in the data frame.
Data frames are highly versatile. They support a range of operations, like subsetting, adding, and removing columns. These features make data frames essential for data manipulation and analysis in R.
    # Create a data frame
    df <- data.frame(
      Name = c("Alice", "Bob", "Charlie"),
      Age = c(25, 30, 35),
      Salary = c(50000, 55000, 60000)
    )
    df
Output:
         Name Age Salary
    1   Alice  25  50000
    2     Bob  30  55000
    3 Charlie  35  60000
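To confirm that each vector became a column with its own type, the structure of a data frame can be inspected after it is created. The following is a minimal sketch; the name `people` is a hypothetical copy of the example data, and `stringsAsFactors = FALSE` is set explicitly as an assumption so that the `Name` column stays a character vector regardless of the R version in use.

```r
# Hypothetical copy of the example data, named `people` so it does not overwrite `df`
people <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Salary = c(50000, 55000, 60000),
  stringsAsFactors = FALSE  # assumption: keep text as character, not factor
)

# Number of rows and number of columns
dim(people)

# Class of each column
col_types <- sapply(people, class)
col_types

# Compact overview of column names, types, and first values
str(people)
```

`str()` is often the quickest check after building or importing a data frame, because it shows the type of every column at a glance.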
Accessing Data Frame Elements
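Access specific rows, columns, or elements in a data frame using square brackets, or use the `$` operator to pull out a single column by name. Inside the brackets, the row index goes before the comma and the column index after it; leaving one side empty selects everything along that dimension.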
    # Access the first column
    df$Name

    # Access the second row
    df[2, ]

    # Access the element in the second row, third column
    df[2, 3]
Output for the first column:
    [1] "Alice"   "Bob"     "Charlie"
Output for the second row:
      Name Age Salary
    2  Bob  30  55000
Output for the specific element:
    [1] 55000
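Square brackets also accept column names and logical conditions, not only numeric positions. The sketch below illustrates a few common forms; the data frame name `staff` and the result variable names are hypothetical, chosen so the example does not modify `df`.

```r
# Hypothetical copy of the example data
staff <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Salary = c(50000, 55000, 60000),
  stringsAsFactors = FALSE
)

# Select one column by name; both forms return a plain vector
ages <- staff[["Age"]]
same_ages <- staff[, "Age"]

# Select several columns by name; the result is still a data frame
name_salary <- staff[, c("Name", "Salary")]

# Select rows with a logical condition (rows go before the comma)
over_28 <- staff[staff$Age > 28, ]

# Keep a single column as a data frame instead of a vector
age_df <- staff[, "Age", drop = FALSE]
```

Selecting by name rather than by position tends to be more robust, since the code keeps working if columns are later reordered.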
Adding and Removing Columns
To add a column, assign values to a new column name. To remove a column, use the `subset()` function or assign `NULL` to the column.
    # Add a new column
    df$Department <- c("HR", "Finance", "IT")
    df

    # Remove a column
    df$Salary <- NULL
    df
Output after adding a column:
         Name Age Salary Department
    1   Alice  25  50000         HR
    2     Bob  30  55000    Finance
    3 Charlie  35  60000         IT
Output after removing a column:
         Name Age Department
    1   Alice  25         HR
    2     Bob  30    Finance
    3 Charlie  35         IT
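The `subset()` route mentioned above can be sketched as follows. Unlike `NULL` assignment, it returns a new data frame and leaves the original untouched. The names `staff`, `no_salary`, and `name_only` are hypothetical and used only for illustration.

```r
# Hypothetical copy of the example data
staff <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Salary = c(50000, 55000, 60000),
  stringsAsFactors = FALSE
)

# Drop one column by giving its name with a minus sign in `select`
no_salary <- subset(staff, select = -Salary)

# Drop several columns at once
name_only <- subset(staff, select = -c(Age, Salary))

# The original data frame still has all three columns
names(staff)
```

This form is convenient when the original data should be kept, for example when comparing a reduced table against the full one.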
Manipulating Data Frames
Use functions like `subset()`, `merge()`, and `order()` to manipulate data frames. These functions filter rows, combine data frames, and sort rows, respectively.
    # Filter rows where Age is greater than 28
    subset(df, Age > 28)

    # Order by Age
    df_sorted <- df[order(df$Age), ]
    df_sorted
Output after filtering:
         Name Age Department
    2     Bob  30    Finance
    3 Charlie  35         IT
Output after sorting:
         Name Age Department
    1   Alice  25         HR
    2     Bob  30    Finance
    3 Charlie  35         IT
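The example above covers filtering and sorting but not `merge()`. As a rough illustration, the sketch below joins the employee data with a second table on a shared `Department` column and then sorts in descending order. The `locations` table and its `Floor` values are invented for this example, as are the variable names.

```r
# Hypothetical copy of the example data after the column changes above
staff <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Department = c("HR", "Finance", "IT"),
  stringsAsFactors = FALSE
)

# Invented lookup table: one row per department
locations <- data.frame(
  Department = c("Finance", "HR", "IT"),
  Floor = c(2, 1, 3),
  stringsAsFactors = FALSE
)

# Combine the two tables on the shared Department column
combined <- merge(staff, locations, by = "Department")
combined

# Sort from oldest to youngest
combined_sorted <- combined[order(combined$Age, decreasing = TRUE), ]
combined_sorted
```

By default `merge()` keeps only rows whose key appears in both tables and places the key column first; its `all`, `all.x`, and `all.y` arguments control whether unmatched rows are kept.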
Uses of Data Frames in R
- Data Organization: Data frames help organize data in a tabular format. Each column can have a different data type, making it easy to manage complex datasets.
- Data Manipulation: They allow for easy data manipulation. You can add, remove, or modify columns and rows with simple commands.
- Data Analysis: Data frames are used in many analytical operations. Functions like `summary()` and `aggregate()` help summarize and analyze the data.
- Data Cleaning: They facilitate data cleaning tasks. You can filter rows, handle missing values, and correct data errors efficiently.
- Data Subsetting: Data frames enable subsetting of data. You can select specific rows or columns based on conditions using functions like `subset()`.
- Data Visualization: They integrate well with visualization packages. Functions in packages like `ggplot2` use data frames to create plots and charts.
- Data Import and Export: Data frames support importing and exporting data. You can read from and write to various file formats such as CSV and, through add-on packages such as `readxl`, Excel.
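Several of the uses listed above can be sketched together in a few lines of base R: summarizing, handling a missing value, aggregating by group, and a CSV round trip. The data here is invented for illustration (including the extra row for "Dana" with a missing salary), and the file is written to a temporary path so nothing is left behind.

```r
# Invented example data with one missing salary
staff <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "Dana"),
  Department = c("HR", "Finance", "IT", "IT"),
  Salary = c(50000, 55000, 60000, NA),
  stringsAsFactors = FALSE
)

# Analysis: quick numeric summary, which also reports the number of NAs
summary(staff$Salary)

# Cleaning: keep only rows where Salary is present
clean <- staff[!is.na(staff$Salary), ]

# Analysis: mean salary per department
dept_means <- aggregate(Salary ~ Department, data = clean, FUN = mean)
dept_means

# Export and import: write to a temporary CSV file and read it back
csv_path <- tempfile(fileext = ".csv")
write.csv(clean, csv_path, row.names = FALSE)
restored <- read.csv(csv_path, stringsAsFactors = FALSE)
restored
```

Passing `row.names = FALSE` to `write.csv()` avoids writing the row labels as an extra unnamed column, which would otherwise reappear as a column when the file is read back.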