In R, you can run a linear regression with the lm() function, which fits a linear model to the given data.
Linear regression lets you quantify the association between variables and predict new values from the fitted model.
Let’s see an example of how you can perform linear regression in R.
```r
# Sample data
x <- c(1, 2, 3, 4, 5)
y <- c(2, 3, 4, 5, 6)

# Perform linear regression
model <- lm(y ~ x)

# Summary of regression results
# (y equals x + 1 exactly here, so R warns about an essentially perfect fit)
summary(model)

# Plot the data and regression line
plot(x, y, main = "Linear Regression", xlab = "X", ylab = "Y")
abline(model, col = "red")
```
Let me explain the steps.
- Create some sample data x and y.
- Fit a linear regression model using the lm() function.
- Summarize the regression results using summary().
- Plot the data points and the regression line using plot() and abline() functions.
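Beyond printing the summary, it is often useful to pull individual numbers out of the fitted model. The sketch below is one way to do that with coef(), residuals(), and the r.squared element of the summary object. The y values are illustrative and deliberately not perfectly linear; they are an assumption made for this example, not data from the article.

```r
# Illustrative data (an assumption for this sketch): roughly y = x + 1
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 2.9, 4.2, 4.8, 6.1)

model <- lm(y ~ x)

# Estimated intercept and slope, returned as a named numeric vector
coefs <- coef(model)
print(coefs)

# Residuals: observed y minus fitted y, one value per data point
res <- residuals(model)
print(res)

# R-squared is stored as an element of the summary object
r2 <- summary(model)$r.squared
print(r2)
```

The names of the coefficient vector are "(Intercept)" and the predictor name from the formula, so coefs[["x"]] gives the slope.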
You can replace x and y with your own data. The lm() function takes a formula as its first argument: the response variable goes on the left of the ~ and the predictor goes on the right.
For example, lm(y ~ x) specifies that you want to predict y based on x.
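Once a model has been fitted with such a formula, predict() can apply it to new x values. This is a minimal sketch, assuming the same kind of illustrative, slightly noisy data; the values and the name new_points are made up for the example.

```r
# Illustrative data (an assumption for this sketch)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 2.9, 4.2, 4.8, 6.1)

model <- lm(y ~ x)

# newdata must be a data frame whose column name matches
# the predictor name used in the formula (here: x)
new_points <- data.frame(x = c(6, 7))

# Predicted y for x = 6 and x = 7
predicted <- predict(model, newdata = new_points)
print(predicted)
```

If the column in newdata is named anything other than x, predict() cannot match it to the formula, so keeping the names aligned is the main thing to check.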
Here is another example that uses lm(), this time with the data stored in a data frame.
```r
# Sample data
x <- c(1, 2, 3, 4, 5)
y <- c(2, 3, 4, 5, 6)
data <- data.frame(x, y)

# Perform linear regression
model <- lm(y ~ x, data = data)

# Summary of regression results
summary(model)

# Plot the data and regression line
plot(data$x, data$y, main = "Linear Regression", xlab = "X", ylab = "Y")
abline(model, col = "red")
```
Let’s understand the code.
- The sample data is stored in a data frame data, which makes it easier to handle if you have more variables.
- The lm() function is used to fit the linear regression model directly on the data frame, specifying the formula and the data frame where the variables reside.
With the data argument, the formula refers to the columns of the data frame by name, which is more convenient when working with larger datasets.
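To illustrate why the data frame approach scales to more variables, here is a hedged sketch using mtcars, a dataset that ships with base R. It goes slightly beyond the single-predictor examples above by putting two predictors (wt and hp) in the formula; the choice of columns and the new_car values are assumptions made purely for illustration.

```r
# mtcars is a built-in data frame (datasets package); no download needed
# Model fuel economy (mpg) from weight (wt) and horsepower (hp)
model_cars <- lm(mpg ~ wt + hp, data = mtcars)

# One coefficient per predictor, plus the intercept
cars_coefs <- coef(model_cars)
print(cars_coefs)

# Predict mpg for a hypothetical car; column names must match the formula
new_car <- data.frame(wt = 3.0, hp = 110)
predicted_mpg <- predict(model_cars, newdata = new_car)
print(predicted_mpg)
```

Because the formula looks up mpg, wt, and hp inside mtcars, there is no need to create separate vectors for each variable first.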