Multiple linear regression (MLR)

Prof. Maria Tackett

Sep 27, 2023

Computational setup

# load packages
library(tidyverse)
library(tidymodels)
library(openintro)
library(patchwork)
library(knitr)
library(kableExtra)
library(colorblindr)

# set default theme and larger font size for ggplot2
ggplot2::theme_set(ggplot2::theme_minimal(base_size = 16))

min	median	max	iqr
5.31	9.93	26.3	5.755

term	estimate	std.error	statistic	p.value
(Intercept)	10.726	1.507	7.116	0.000
debt_to_income	0.671	0.676	0.993	0.326
verified_incomeSource Verified	2.211	1.399	1.581	0.121
verified_incomeVerified	6.880	1.801	3.820	0.000
annual_income_th	-0.021	0.011	-1.804	0.078

Interpreting ${\hat{β}}_{j}$

The estimated coefficient ${\hat{β}}_{j}$ is the expected change in the mean of $y$ when $x_{j}$ increases by one unit, holding the values of all other predictor variables constant.

Example: The estimated coefficient for debt_to_income is 0.671. This means that for each one-point increase in a borrower’s debt-to-income ratio, the interest rate on the loan is expected to be 0.671 percentage points higher, holding annual income and income verification constant.
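To make "holding all other predictors constant" concrete, the sketch below plugs the rounded estimates from the coefficient table into the fitted equation for two hypothetical borrowers who differ only by one unit of debt_to_income. The helper predict_rate() and the two borrower profiles are illustrative assumptions, not part of the original slides.

```r
# Rounded estimates copied from the coefficient table
b <- c(
  intercept        = 10.726,
  debt_to_income   = 0.671,
  source_verified  = 2.211,
  verified         = 6.880,
  annual_income_th = -0.021
)

# Hypothetical helper: evaluate the fitted equation by hand.
# source_verified and verified are 0/1 indicators (both 0 = Not Verified).
predict_rate <- function(debt_to_income, source_verified, verified, annual_income_th) {
  unname(
    b["intercept"] +
      b["debt_to_income"] * debt_to_income +
      b["source_verified"] * source_verified +
      b["verified"] * verified +
      b["annual_income_th"] * annual_income_th
  )
}

# Two borrowers identical in everything except debt_to_income (one unit apart)
rate_a <- predict_rate(debt_to_income = 0.558, source_verified = 0, verified = 0, annual_income_th = 59)
rate_b <- predict_rate(debt_to_income = 1.558, source_verified = 0, verified = 0, annual_income_th = 59)

# The difference in predicted rates is the debt_to_income coefficient
rate_b - rate_a
```

Because the other predictors are held fixed, their terms cancel in the difference, leaving only the coefficient on debt_to_income.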

Prediction interval for $\hat{y}$

Calculate a 90% prediction interval for the interest rate of an individual applicant with a debt-to-income ratio of 0.558, whose income is not verified, and who has an annual income of $59,000.

predict(int_fit, new_borrower, type = "pred_int", level = 0.90)

# A tibble: 1 × 2
  .pred_lower .pred_upper
        <dbl>       <dbl>
1        2.18        17.6
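The call above relies on two objects, int_fit and new_borrower, whose definitions do not appear in this section. A minimal sketch of how they might be set up, assuming the loan50 data from openintro, income rescaled to thousands of dollars, and a model with the same terms as the coefficient table:

```r
library(tidyverse)
library(tidymodels)
library(openintro)

# Assumption: annual_income_th is annual income in thousands of dollars
loan50 <- loan50 |>
  mutate(annual_income_th = annual_income / 1000)

# Assumed model specification, inferred from the terms in the coefficient table
int_fit <- linear_reg() |>
  fit(interest_rate ~ debt_to_income + verified_income + annual_income_th,
      data = loan50)

# One new borrower; factor levels are taken from the training data
new_borrower <- tibble(
  debt_to_income   = 0.558,
  verified_income  = factor("Not Verified", levels = levels(loan50$verified_income)),
  annual_income_th = 59
)

# 90% prediction interval for an individual borrower's interest rate
pred_int_90 <- predict(int_fit, new_borrower, type = "pred_int", level = 0.90)

# For comparison: 90% confidence interval for the mean interest rate
# among borrowers with these characteristics
conf_int_90 <- predict(int_fit, new_borrower, type = "conf_int", level = 0.90)
```

The prediction interval is wider than the confidence interval because it accounts for the variability of an individual observation around the mean, in addition to the uncertainty in the estimated mean.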

term	estimate	std.error	statistic	p.value	conf.low	conf.high
(Intercept)	10.726	1.507	7.116	0.000	7.690	13.762
debt_to_income	0.671	0.676	0.993	0.326	-0.690	2.033
verified_incomeSource Verified	2.211	1.399	1.581	0.121	-0.606	5.028
verified_incomeVerified	6.880	1.801	3.820	0.000	3.253	10.508
annual_income_th	-0.021	0.011	-1.804	0.078	-0.043	0.002
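The conf.low and conf.high columns appear consistent with 95% confidence intervals of the form estimate ± t* × std.error. The sketch below reproduces the debt_to_income interval by hand from the rounded table values; the sample size of 50 (and therefore 45 degrees of freedom) is an assumption based on the data set's name.

```r
# Values copied from the debt_to_income row of the table
estimate  <- 0.671
std_error <- 0.676

# Assumption: n = 50 observations and 5 estimated coefficients
df <- 50 - 5

# Critical value for a 95% interval from the t distribution
t_star <- qt(0.975, df)

# Lower and upper bounds of the interval
ci <- estimate + c(-1, 1) * t_star * std_error
round(ci, 3)
```

The result should agree with the table up to the rounding of the inputs. Since the interval contains 0, the data do not provide strong evidence of a linear relationship between debt_to_income and interest rate after accounting for the other predictors.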

term	estimate	std.error	statistic	p.value	conf.low	conf.high
(Intercept)	9.444	0.977	9.663	0.000	7.476	11.413
debt_inc_cent	0.671	0.676	0.993	0.326	-0.690	2.033
verified_incomeSource Verified	2.211	1.399	1.581	0.121	-0.606	5.028
verified_incomeVerified	6.880	1.801	3.820	0.000	3.253	10.508
annual_income_th_cent	-0.021	0.011	-1.804	0.078	-0.043	0.002

term	estimate
(Intercept)	10.726
debt_to_income	0.671
verified_incomeSource Verified	2.211
verified_incomeVerified	6.880
annual_income_th	-0.021

term	estimate
(Intercept)	9.444
debt_inc_cent	0.671
verified_incomeSource Verified	2.211
verified_incomeVerified	6.880
annual_income_th_cent	-0.021
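The two tables above show that mean-centering changes only the intercept; the slopes are unchanged. A minimal sketch of how this comparison might be reproduced, assuming the loan50 data from openintro and that the centered variables are computed by subtracting the sample mean (the object names fit_orig, fit_cent, and comparison are illustrative):

```r
library(tidyverse)
library(tidymodels)
library(openintro)

# Rescale income to thousands, then mean-center the numeric predictors
loan50 <- loan50 |>
  mutate(
    annual_income_th      = annual_income / 1000,
    debt_inc_cent         = debt_to_income - mean(debt_to_income),
    annual_income_th_cent = annual_income_th - mean(annual_income_th)
  )

# Model with the original predictors
fit_orig <- linear_reg() |>
  fit(interest_rate ~ debt_to_income + verified_income + annual_income_th,
      data = loan50)

# Model with the mean-centered predictors
fit_cent <- linear_reg() |>
  fit(interest_rate ~ debt_inc_cent + verified_income + annual_income_th_cent,
      data = loan50)

# Side-by-side estimates: only the intercept row should differ
comparison <- bind_cols(
  tidy(fit_orig) |> select(term_orig = term, estimate_orig = estimate),
  tidy(fit_cent) |> select(term_cent = term, estimate_cent = estimate)
)
comparison
```

With centered predictors, the intercept is the expected interest rate for a borrower in the baseline category (Not Verified) with average debt-to-income ratio and average annual income, which is usually easier to interpret than a borrower with values of 0.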

Indicators in the model

We will use $K - 1$ of the indicator variables in the model.
The baseline is the category that doesn’t have a term in the model.
The coefficients of the indicator variables in the model are interpreted as the expected change in the response compared to the baseline, holding all other variables constant.
This approach is also called dummy coding.

loan50 |>
  select(verified_income, source_verified, verified) |>
  slice(1, 3, 6)

# A tibble: 3 × 3
  verified_income source_verified verified
  <fct>                     <dbl>    <dbl>
1 Not Verified                  0        0
2 Verified                      0        1
3 Source Verified               1        0
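The source_verified and verified columns shown above are not part of the raw data. One way they could be created, as a sketch assuming the loan50 data from openintro with "Not Verified" as the baseline category:

```r
library(tidyverse)
library(openintro)

# Create K - 1 = 2 indicators for the 3-level verified_income variable;
# "Not Verified" is the baseline, so it gets no indicator of its own
loan50 <- loan50 |>
  mutate(
    source_verified = if_else(verified_income == "Source Verified", 1, 0),
    verified        = if_else(verified_income == "Verified", 1, 0)
  )

# Each row has at most one indicator equal to 1
loan50 |>
  count(verified_income, source_verified, verified)
```

In practice these columns rarely need to be built by hand: when verified_income is a factor, the model-fitting function creates the indicators automatically, which is why the coefficient tables list terms such as verified_incomeVerified.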

term	estimate	std.error	statistic	p.value	conf.low	conf.high
(Intercept)	9.444	0.977	9.663	0.000	7.476	11.413
debt_inc_cent	0.671	0.676	0.993	0.326	-0.690	2.033
verified_incomeSource Verified	2.211	1.399	1.581	0.121	-0.606	5.028
verified_incomeVerified	6.880	1.801	3.820	0.000	3.253	10.508
annual_income_th_cent	-0.021	0.011	-1.804	0.078	-0.043	0.002

term	estimate	std.error	statistic	p.value
(Intercept)	9.484	0.989	9.586	0.000
debt_inc_cent	0.691	0.685	1.009	0.319
verified_incomeSource Verified	2.157	1.418	1.522	0.135
verified_incomeVerified	7.181	1.870	3.840	0.000
annual_income_th_cent	-0.007	0.020	-0.341	0.735
verified_incomeSource Verified:annual_income_th_cent	-0.016	0.026	-0.643	0.523
verified_incomeVerified:annual_income_th_cent	-0.032	0.033	-0.979	0.333
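With interaction terms in the model, the slope of annual income is allowed to differ by income verification status. The sketch below combines the rounded estimates from the table above to get the income slope for each group; the variable names are illustrative.

```r
# Rounded estimates copied from the interaction model table
b_income          <- -0.007  # annual_income_th_cent (baseline: Not Verified)
b_income_source   <- -0.016  # verified_incomeSource Verified:annual_income_th_cent
b_income_verified <- -0.032  # verified_incomeVerified:annual_income_th_cent

# Slope of annual income (in thousands) for each verification status:
# baseline slope plus the group's interaction coefficient
income_slopes <- c(
  "Not Verified"    = b_income,
  "Source Verified" = b_income + b_income_source,
  "Verified"        = b_income + b_income_verified
)
income_slopes
```

Each value is the expected change in interest rate for each additional thousand dollars of annual income within that group, holding debt-to-income ratio constant. Note that both interaction terms have large p-values in the table (0.523 and 0.333), so the data offer little evidence that these slopes actually differ.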

