In this article I’ll showcase an exploration of demographic and economic data using R programming.
A few key points to note first:
- Utilizing the dplyr package, we filter the dataset based on multiple conditions to extract specific subsets of data.
- The example code demonstrates filtering on age, gender, and city simultaneously to extract relevant subsets for analysis.
- Users can adapt the code to explore various combinations of conditions tailored to their analytical needs.
- By filtering the data, users can derive insights into specific demographic and economic trends within selected parameters.
All the R code is provided in full, so you can reproduce the analysis and adapt it to your own datasets.
Here is a first example that filters a dataset based on multiple conditions using the dplyr package.
# Load the dplyr package
library(dplyr)

# Sample dataset
data <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Name = c("John", "Sona", "Alice", "Bob", "Venky"),
  Age = c(25, 30, 35, 40, 45),
  Gender = c("Male", "Female", "Female", "Male", "Male")
)

# Filter the dataset based on multiple conditions
filtered_data <- data %>%
  filter(Age > 30, Gender == "Male")

# View the filtered dataset
print(filtered_data)
In this example, I have used the filter() function from the dplyr package to filter the dataset data.
The filter() function takes multiple conditions separated by commas and keeps only the rows that satisfy all of them.
In this case, it filters the dataset to include only rows where the age is greater than 30 and the gender is male. Finally, the filtered dataset is printed to the console; with this sample data it contains the rows for Bob and Venky.
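To make the "all conditions must hold" behaviour concrete, here is a minimal sketch that reuses the sample data from the example above. The variable names with_comma, with_and, and either are only illustrative. It compares comma-separated conditions with the & operator and shows | for cases where matching either condition is enough.

```r
# Load the dplyr package
library(dplyr)

# Same sample data as in the example above
data <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Name = c("John", "Sona", "Alice", "Bob", "Venky"),
  Age = c(25, 30, 35, 40, 45),
  Gender = c("Male", "Female", "Female", "Male", "Male")
)

# Comma-separated conditions are combined with AND
with_comma <- data %>%
  filter(Age > 30, Gender == "Male")

# Equivalent form using the & operator
with_and <- data %>%
  filter(Age > 30 & Gender == "Male")

# Use | when a row should match at least one condition (OR)
either <- data %>%
  filter(Age > 30 | Gender == "Male")

print(with_comma)
print(either)
```

The first two filters return the same two rows, while the OR version also keeps John and Alice because each of them satisfies one of the two conditions.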
Let’s look at another example that filters a larger dataset on multiple conditions using the dplyr package in R.
# Load the dplyr package
library(dplyr)

# Sample dataset
data <- data.frame(
  ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  Name = c("John", "Sona", "Mona", "Bob", "Charlie", "Venky", "Emily", "Frank", "Grace", "Chottu"),
  Age = c(25, 30, 35, 40, 45, 50, 55, 60, 65, 70),
  Gender = c("Male", "Female", "Female", "Male", "Male", "Male", "Female", "Male", "Female", "Male"),
  City = c("New Delhi", "Los Angeles", "Bengaluru", "Houston", "Phoenix", "Chennai", "San Antonio", "Mumbai", "Dallas", "San Jose"),
  Salary = c(50000, 60000, 70000, 80000, 90000, 100000, 110000, 120000, 130000, 140000)
)

# Filter the dataset based on multiple conditions
filtered_data <- data %>%
  filter(Age > 40, Gender == "Male", City %in% c("New Delhi", "Los Angeles", "Bengaluru"))

# View the filtered dataset
print(filtered_data)
In this example, the data dataframe contains information about individuals including their ID, Name, Age, Gender, City, and Salary.
Again I used the filter() function to select rows where the Age is greater than 40, Gender is Male, and City is either “New Delhi”, “Los Angeles”, or “Bengaluru”.
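The filtered dataset is printed to the console. Note that with this sample data no row satisfies all three conditions (the people in New Delhi, Los Angeles, and Bengaluru are all 35 or younger), so the result is an empty data frame with zero rows.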
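As a variation on the filter above, the following sketch swaps in cities that do occur among the men over 40 in the sample data and adds a Salary threshold. The city list and the 100000 cutoff are assumptions chosen purely for illustration, not part of the original analysis.

```r
# Load the dplyr package
library(dplyr)

# Same sample data as in the example above
data <- data.frame(
  ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  Name = c("John", "Sona", "Mona", "Bob", "Charlie", "Venky", "Emily", "Frank", "Grace", "Chottu"),
  Age = c(25, 30, 35, 40, 45, 50, 55, 60, 65, 70),
  Gender = c("Male", "Female", "Female", "Male", "Male", "Male", "Female", "Male", "Female", "Male"),
  City = c("New Delhi", "Los Angeles", "Bengaluru", "Houston", "Phoenix", "Chennai", "San Antonio", "Mumbai", "Dallas", "San Jose"),
  Salary = c(50000, 60000, 70000, 80000, 90000, 100000, 110000, 120000, 130000, 140000)
)

# Illustrative variation: different cities plus a salary threshold
filtered_data <- data %>%
  filter(
    Age > 40,
    Gender == "Male",
    City %in% c("Phoenix", "Chennai", "Mumbai"),
    Salary >= 100000
  )

# View the filtered dataset
print(filtered_data)
```

Here Charlie in Phoenix passes the age, gender, and city checks but is dropped by the salary condition, leaving Venky and Frank. Adjusting one condition at a time like this is a simple way to see how each one narrows the result.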
To conclude, the dplyr package is a key tool for data manipulation in R. With filter(), you can efficiently extract the subsets you need based on multiple conditions.