`tidyr` is an R package for tidying and reshaping data. It is built around the tidy-data principle: each variable is a column, each observation is a row, and each value is a cell. Data in this layout is easier to manipulate and analyze.
With `tidyr`, you can reshape data, convert between wide and long formats, and separate or unite columns.
For example, it can gather several columns into a single pair of key and value columns (`pivot_longer()`), or spread such a pair back across multiple columns (`pivot_wider()`). These transformations make data preparation more straightforward.
Using `tidyr` effectively can streamline your data analysis workflow. By learning functions like `pivot_longer()`, `pivot_wider()`, and `separate()`, you can manage your datasets more efficiently.
This preparation is crucial for performing accurate and meaningful analyses.
Example Code
```r
library(tidyr)
library(dplyr)  # for the %>% pipe

# Sample data
data <- tibble(
  id = 1:3,
  name = c("John", "Jane", "Doe"),
  math = c(90, 80, 70),
  english = c(85, 90, 75)
)

# 1. Pivot longer
data_long <- data %>%
  pivot_longer(cols = c(math, english), names_to = "subject", values_to = "score")
print(data_long)
# Output:
# # A tibble: 6 × 4
#      id name  subject score
#   <int> <chr> <chr>   <dbl>
# 1     1 John  math       90
# 2     1 John  english    85
# 3     2 Jane  math       80
# 4     2 Jane  english    90
# 5     3 Doe   math       70
# 6     3 Doe   english    75

# 2. Pivot wider
data_wide <- data_long %>%
  pivot_wider(names_from = subject, values_from = score)
print(data_wide)
# Output:
# # A tibble: 3 × 4
#      id name   math english
#   <int> <chr> <dbl>   <dbl>
# 1     1 John     90      85
# 2     2 Jane     80      90
# 3     3 Doe      70      75

# 3. Separate columns
# The sample names contain no space, so last_name is filled with NA
# (tidyr also emits a warning about the missing pieces).
data_separated <- data_long %>%
  separate(name, into = c("first_name", "last_name"), sep = " ")
print(data_separated)
# Output:
# # A tibble: 6 × 5
#      id first_name last_name subject score
#   <int> <chr>      <chr>     <chr>   <dbl>
# 1     1 John       NA        math       90
# 2     1 John       NA        english    85
# 3     2 Jane       NA        math       80
# 4     2 Jane       NA        english    90
# 5     3 Doe        NA        math       70
# 6     3 Doe        NA        english    75
```
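As a quick sanity check on steps 1 and 2, the sketch below pivots a table to long format and back, then compares the result with the starting table. The object names `scores`, `scores_long`, and `round_trip` are illustrative and not part of the original example; the data simply mirrors the sample table.

```r
library(tidyr)
library(tibble)

# Illustrative copy of the sample data (names here are assumptions)
scores <- tibble(
  id = 1:3,
  name = c("John", "Jane", "Doe"),
  math = c(90, 80, 70),
  english = c(85, 90, 75)
)

# Wide -> long: one row per student and subject
scores_long <- pivot_longer(
  scores,
  cols = c(math, english),
  names_to = "subject",
  values_to = "score"
)

# Long -> wide: one column per subject again
round_trip <- pivot_wider(
  scores_long,
  names_from = subject,
  values_from = score
)

# Compare as plain data frames so only columns and values are checked
same <- isTRUE(all.equal(as.data.frame(scores), as.data.frame(round_trip)))
print(same)
```

If the two pivots are true inverses for this table, `same` should be `TRUE`. A check like this is a cheap way to confirm that a reshape has not dropped or duplicated rows.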
Detailed Explanation
- Pivot Longer: The `pivot_longer()` function transforms data from a wide format to a long format. In the example, it takes the `math` and `english` columns and combines them into a single `subject` column, with the corresponding values in a `score` column.
- Pivot Wider: The `pivot_wider()` function converts data from a long format back to a wide format. Here, it spreads the `subject` column into individual columns, `math` and `english`, each holding its respective `score` values.
- Separate Columns: The `separate()` function splits a single column into multiple columns based on a delimiter. In this case, it splits the `name` column into `first_name` and `last_name` columns. Because the sample names contain no space, `last_name` is `NA` in every row.
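The sample names above contain no space, so `separate()` has nothing to split. The sketch below uses made-up two-part names to show the split actually happening, plus one single-word name to show the `fill` argument. The `people` table and its values are hypothetical and not part of the original example.

```r
library(tidyr)
library(tibble)

# Hypothetical names; two contain a surname, one does not
people <- tibble(
  id = 1:3,
  name = c("John Smith", "Jane Doe", "Cher")
)

people_split <- separate(
  people,
  name,
  into = c("first_name", "last_name"),
  sep = " ",
  fill = "right"  # pad missing pieces on the right with NA, without a warning
)
print(people_split)
```

Here `last_name` is `NA` only for the single-word name. Note that recent tidyr releases mark `separate()` as superseded in favor of `separate_wider_delim()` and related functions, although `separate()` still works.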
Conclusion
Using `tidyr` effectively can greatly enhance your data manipulation skills in R.
Here are the 5 key points for beginners:
- Wide to Long: Use `pivot_longer()` to convert wide data into a long format.
- Long to Wide: Use `pivot_wider()` to convert long data back into a wide format.
- Separate Columns: Use `separate()` to split a single column into multiple columns.
- Combine Columns: Use `unite()` to merge multiple columns into one.
- Data Cleaning: Use these tools to tidy data, making it ready for analysis.
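The walkthrough does not include a `unite()` example, so here is a minimal sketch of it as the inverse of `separate()`. The `parts` table and its values are hypothetical, chosen only for illustration.

```r
library(tidyr)
library(tibble)

# Hypothetical table with the name already split into two columns
parts <- tibble(
  id = 1:2,
  first_name = c("John", "Jane"),
  last_name = c("Smith", "Doe")
)

# Merge first_name and last_name into a single "name" column.
# By default unite() drops the input columns (remove = TRUE).
full <- unite(parts, "name", first_name, last_name, sep = " ")
print(full)
```

The result has two columns, `id` and `name`. Passing `remove = FALSE` would keep `first_name` and `last_name` alongside the new column.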