3  Linear models

# Setup

## Load packages
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(easystats)
# Attaching packages: easystats 0.7.2
✔ bayestestR  0.13.2   ✔ correlation 0.8.5 
✔ datawizard  0.11.0   ✔ effectsize  0.8.9 
✔ insight     0.20.1   ✔ modelbased  0.8.8 
✔ performance 0.12.0   ✔ parameters  0.22.0
✔ report      0.5.8    ✔ see         0.8.4 
library(ggeffects)

Attaching package: 'ggeffects'

The following object is masked from 'package:easystats':

    install_latest
## Read data from file
polysim <- read_csv("polysim.csv")
Rows: 100 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (4): x, y1, y2, y3

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
polysim
# A tibble: 100 × 4
       x     y1      y2     y3
   <dbl>  <dbl>   <dbl>  <dbl>
 1  24.7  0.464   0.660  -3.24
 2  23.4 -2.87    2.27    7.32
 3  21.6 -4.27  -11.9     4.69
 4  24.0  0.137   0.263  11.7 
 5  20.3 -3.85  -17.7   -51.1 
 6  21.2 -2.07   -9.68   -2.43
 7  24.3 -2.75   -0.193   6.20
 8  26.2  1.21    0.923 -13.1 
 9  22.6 -2.34   -3.93    7.80
10  24.8  0.612   3.07   -1.46
# ℹ 90 more rows
## Plot x-y1 relationship
ggplot(data = polysim, mapping = aes(x = x, y = y1)) + geom_point()

# ==============================================================================

# LM for a linear relationship example

## Fit linear model
fit1 <- lm(
  formula = y1 ~ x,
  data = polysim
)
model_parameters(fit1) |> print_md()
Parameter Coefficient SE 95% CI t(98) p
(Intercept) -27.89 0.92 (-29.72, -26.06) -30.25 < .001
x 1.13 0.04 (1.06, 1.20) 31.13 < .001
predict_response(fit1, terms = "x") |> plot(show_data = TRUE)
Data points may overlap. Use the `jitter` argument to add some amount of
  random variation to the location of data points and avoid overplotting.

## Check linearity assumption
check_model(fit1, check = "linearity") # good enough

# ==============================================================================

# LM with raw polynomials example

## Fit raw polynomial model
fit1b <- lm(
  formula = y1 ~ x + I(x^2),
  data = polysim
)
model_parameters(fit1b) |> print_md() # both are nonsignificant
Parameter Coefficient SE 95% CI t(97) p
(Intercept) -25.97 9.37 (-44.57, -7.38) -2.77 0.007
x 0.97 0.75 (-0.51, 2.46) 1.30 0.197
x^2 3.05e-03 0.01 (-0.03, 0.03) 0.21 0.838
predict_response(fit1b, terms = "x") |> plot(show_data = TRUE) # unchanged
Data points may overlap. Use the `jitter` argument to add some amount of
  random variation to the location of data points and avoid overplotting.

## Check linearity and collinearity assumptions
check_model(fit1b, check = "linearity") # still acceptable

check_collinearity(fit1b) # problematically high
# Check for Multicollinearity

High Correlation

   Term    VIF       VIF 95% CI Increased SE Tolerance Tolerance 95% CI
      x 423.33 [292.35, 613.19]        20.57  2.36e-03     [0.00, 0.00]
 I(x^2) 423.33 [292.35, 613.19]        20.57  2.36e-03     [0.00, 0.00]
# ==============================================================================

# LM with orthogonal polynomials example

## Fit orthogonal polynomial model
fit1c <- lm(
  formula = y1 ~ poly(x, degree = 2),
  data = polysim
)
model_parameters(fit1c) |> print_md() # first degree is significant
Parameter Coefficient SE 95% CI t(97) p
(Intercept) 0.65 0.10 (0.45, 0.85) 6.54 < .001
x (1st degree) 30.78 0.99 (28.80, 32.75) 30.98 < .001
x (2nd degree) 0.20 0.99 (-1.77, 2.18) 0.21 0.838
predict_response(fit1c, terms = "x") |> plot(show_data = TRUE) # unchanged
Data points may overlap. Use the `jitter` argument to add some amount of
  random variation to the location of data points and avoid overplotting.

## Check linearity assumption
check_model(fit1c, check = "linearity") # still acceptable